Show simple item record

dc.contributor.author: Salem, Nema
dc.contributor.author: Peracha, Fahad Khalil
dc.contributor.author: Irfan Khattak, Muhammad
dc.contributor.author: Saleem, Nasir
dc.date.accessioned: 2023-06-10T06:29:38Z
dc.date.available: 2023-06-10T06:29:38Z
dc.date.issued: 2023-05-11
dc.identifier.doi: https://doi.org/10.1371/journal.pone.0285629
dc.identifier.uri: http://hdl.handle.net/20.500.14131/908
dc.description.abstract: Speech enhancement (SE) reduces background noise in target speech and is applied at the front end of various real-world applications, including robust ASR and real-time processing in mobile-phone communications. SE systems are commonly integrated into mobile phones to increase quality and intelligibility; as a result, a low-latency system is required for real-world operation, and these systems also need efficient optimization. This research focuses on single-microphone SE operating in real-time systems with better optimization. We propose a causal data-driven model that uses an attention encoder-decoder long short-term memory (LSTM) network to estimate a time-frequency mask from noisy speech and recover clean speech for real-time applications that need low-latency causal processing. The proposed model combines an encoder-decoder LSTM with a causal attention mechanism. Furthermore, a dynamical-weighted (DW) loss function is proposed to improve model learning by varying the loss-weight values. Experiments demonstrated that the proposed model consistently improves voice quality, intelligibility, and noise suppression. In the causal processing mode, the LSTM-based estimated suppression time-frequency mask outperforms the baseline model for unseen noise types. The proposed SE improved STOI by 2.64% (baseline LSTM-IRM), 6.6% (LSTM-KF), 4.18% (DeepXi-KF), and 3.58% (DeepResGRU-KF). In addition, we examine word error rates (WERs) using Google's Automatic Speech Recognition (ASR). The ASR results show that error rates decreased from 46.33% (noisy signals) to 13.11% (proposed), 15.73% (LSTM), and 14.97% (LSTM-KF).
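The abstract describes mask-based speech enhancement: a network predicts a time-frequency mask that is multiplied with the noisy spectrum to suppress noise-dominated bins. A minimal NumPy sketch of that masking step is below; an oracle ideal ratio mask (IRM) stands in for the LSTM's prediction, and all function names and signal parameters are illustrative, not taken from the paper.

```python
import numpy as np

def stft(x, n_fft=256, hop=128):
    # Frame the signal and take a windowed FFT of each frame.
    win = np.hanning(n_fft)
    frames = [x[i:i + n_fft] * win
              for i in range(0, len(x) - n_fft + 1, hop)]
    return np.fft.rfft(np.array(frames), axis=1)

def ideal_ratio_mask(clean_spec, noise_spec):
    # IRM: sqrt of clean energy over total energy, per T-F bin.
    c = np.abs(clean_spec) ** 2
    n = np.abs(noise_spec) ** 2
    return np.sqrt(c / (c + n + 1e-12))

# Toy signals: a sinusoid standing in for speech, plus white noise.
rng = np.random.default_rng(0)
t = np.arange(4096) / 8000.0
clean = np.sin(2 * np.pi * 440 * t)
noise = 0.5 * rng.standard_normal(len(t))
noisy = clean + noise

S_clean, S_noise, S_noisy = stft(clean), stft(noise), stft(noisy)
mask = ideal_ratio_mask(S_clean, S_noise)  # in the paper: LSTM output
enhanced = mask * S_noisy                  # apply mask to noisy spectrum

# The masked spectrum should be closer to the clean spectrum
# than the unprocessed noisy spectrum is.
err_noisy = np.mean(np.abs(S_noisy - S_clean) ** 2)
err_enh = np.mean(np.abs(enhanced - S_clean) ** 2)
```

In the proposed system the mask is predicted causally (from past frames only) by the attention encoder-decoder LSTM rather than computed from the oracle clean/noise spectra as done here.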
dc.subject: Speech signal processing, speech, deep learning, background noise
dc.title: Causal speech enhancement using dynamical-weighted loss and attention encoder-decoder recurrent neural network
dc.source.journal: PLOS-ONE
dc.source.volume: 18
dc.source.issue: 5
refterms.dateFOA: 2023-06-10T06:29:38Z
dc.contributor.researcher: External Collaboration
dc.contributor.lab: NA
dc.subject.KSA: HEALTH
dc.contributor.ugstudent: 0
dc.contributor.alumnae: 0
dc.source.index: Scopus
dc.source.index: WoS
dc.contributor.department: Electrical and Computer Engineering
dc.contributor.pgstudent: Muhammad Irfan Khattak
dc.contributor.firstauthor: Peracha, Fahad Khalil


Files in this item

Name: journal.pone.0285629.pdf
Size: 4.301 MB
Format: PDF
