Concatenated Identical DNN (CI-DNN) to Reduce Noise-Type Dependence in DNN-Based Speech Enhancement

Concatenated Identical DNN (CI-DNN) to Reduce Noise-Type Dependence in DNN-Based Speech Enhancement. Xu, Z., Strake, M., & Fingscheidt, T. In 2019 27th European Signal Processing Conference (EUSIPCO), pages 1-5, Sep. 2019.


Estimating time-frequency domain masks for speech enhancement using deep learning approaches has recently become a popular field of research. In this paper, we propose a mask-based speech enhancement framework using concatenated identical deep neural networks (CI-DNNs). The idea is that a single DNN is trained under multiple input and output signal-to-noise power ratio (SNR) conditions, using targets that provide a moderate SNR gain with respect to the input and therefore achieve a balance between speech component quality and noise suppression. We concatenate this single DNN several times without any retraining to provide sufficient noise attenuation. Simulation results show that our proposed CI-DNN outperforms enhancement methods using classical spectral weighting rules with respect to total speech quality and speech intelligibility. Moreover, our approach shows similar or even slightly better performance with far fewer trainable parameters than a noisy-target single-DNN approach of the same size. A comparison with the conventional clean-target single-DNN approach shows that our proposed CI-DNN is better in speech component quality and much better in residual noise component quality. Most importantly, our new CI-DNN generalizes best to an unseen noise type among the tested deep learning approaches.

@InProceedings{8903066,
  author = {Z. Xu and M. Strake and T. Fingscheidt},
  booktitle = {2019 27th European Signal Processing Conference (EUSIPCO)},
  title = {Concatenated Identical {DNN} ({CI-DNN}) to Reduce Noise-Type Dependence in {DNN}-Based Speech Enhancement},
  year = {2019},
  pages = {1--5},
  abstract = {Estimating time-frequency domain masks for speech enhancement using deep learning approaches has recently become a popular field of research. In this paper, we propose a mask-based speech enhancement framework by using concatenated identical deep neural networks (CI-DNNs). The idea is that a single DNN is trained under multiple input and output signal-to-noise power ratio (SNR) conditions, using targets that provide a moderate SNR gain with respect to the input and therefore achieve a balance between speech component quality and noise suppression. We concatenate this single DNN several times without any retraining to provide enough noise attenuation. Simulation results show that our proposed CI-DNN outperforms enhancement methods using classical spectral weighting rules w.r.t. total speech quality and speech intelligibility. Moreover, our approach shows similar or even a little bit better performance with much fewer trainable parameters compared with a noisy-target single DNN approach of the same size. A comparison to the conventional clean-target single DNN approach shows that our proposed CI-DNN is better in speech component quality and much better in residual noise component quality. Most importantly, our new CI-DNN generalized best to an unseen noise type, if compared to the other tested deep learning approaches.},
  keywords = {convolutional neural nets;learning (artificial intelligence);speech enhancement;speech intelligibility;noisy-target single DNN approach;clean-target single DNN approach;CI-DNN;speech component quality;residual noise component quality;deep learning approaches;noise-type dependence;DNN-based speech enhancement;mask-based speech enhancement framework;concatenated identical deep neural networks;output signal-to-noise power ratio;noise suppression;noise attenuation;total speech quality;time-frequency domain masks;Signal to noise ratio;Training;Noise measurement;Task analysis;Discrete Fourier transforms;Neural networks;noise reduction;DNN;noisy speech target},
  doi = {10.23919/EUSIPCO.2019.8903066},
  issn = {2076-1465},
  month = sep,
  url = {https://www.eurasip.org/proceedings/eusipco/eusipco2019/proceedings/papers/1570528366.pdf},
}
