CNN-based virtual microphone signal estimation for MPDR beamforming in underdetermined situations

CNN-based virtual microphone signal estimation for MPDR beamforming in underdetermined situations. Yamaoka, K., Li, L., Ono, N., Makino, S., & Yamada, T. In 2019 27th European Signal Processing Conference (EUSIPCO), pages 1-5, Sep., 2019.

In this paper, we propose a novel approach to virtually increasing the number of microphone elements between two real microphones to improve speech enhancement performance in underdetermined situations. The virtual microphone technique has recently been proposed and experimentally shown to improve speech enhancement performance. In this technique, virtual signals in the audio signal domain are estimated by interpolating the phase linearly and the amplitude nonlinearly, each independently, on the basis of β-divergence. It has also been reported that performance tends to improve as the nonlinearity of the amplitude interpolation increases. However, one drawback of this method is that the interpolation is applied to each time-frequency bin independently, so the spectral and temporal structures of speech signals are ignored. To address this problem and increase the nonlinearity, we propose an alternative method of amplitude interpolation, motivated by the high capability of neural networks to model nonlinear functions and speech spectrograms. In this method, we employ a convolutional neural network as an amplitude estimator that minimizes the mean squared error between the outputs of a minimum power distortionless response (MPDR) beamformer and the target speech signals. The experimental results revealed that the proposed method has high potential for improving speech enhancement performance: it outperformed not only the conventional virtual microphone technique but also the corresponding determined situation.
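For readers unfamiliar with the baseline, the conventional per-bin virtual microphone interpolation and the MPDR beamformer that consumes its output can be sketched roughly as follows. This is a minimal NumPy sketch under stated assumptions, not the authors' code. The closed-form amplitude rule is our assumption: it is the minimiser of (1 - α)·D_β(A | |x1|) + α·D_β(A | |x2|) over A, with the weighted geometric mean as the β = 1 limit. The function names `virtual_mic` and `mpdr_weights`, the diagonal-loading constant, and the premise that the steering vector `d` is known are all hypothetical. In the proposed method, the `amp` term would instead come from a CNN trained so that the MPDR output matches the target speech in the mean-squared-error sense. That network is not shown here.

```python
import numpy as np


def virtual_mic(x1, x2, alpha=0.5, beta=1.0, eps=1e-12):
    """Hypothetical sketch of conventional per-bin virtual microphone interpolation.

    x1, x2: complex STFT coefficients of the two real microphones (same shape).
    alpha:  virtual position between mic 1 (0.0) and mic 2 (1.0).
    beta:   beta-divergence parameter controlling the amplitude nonlinearity.
    Each time-frequency bin is processed independently (no spectral/temporal context).
    """
    a1 = np.abs(x1) + eps  # eps avoids 0**negative when beta < 1
    a2 = np.abs(x2) + eps
    # Phase: linear interpolation of the wrapped inter-channel phase difference.
    dphi = np.angle(x2 * np.conj(x1))  # lies in (-pi, pi]
    phase = np.angle(x1) + alpha * dphi
    # Amplitude: assumed closed-form minimiser of the weighted beta-divergences.
    if np.isclose(beta, 1.0):
        amp = np.exp((1 - alpha) * np.log(a1) + alpha * np.log(a2))
    else:
        p = beta - 1.0
        amp = ((1 - alpha) * a1**p + alpha * a2**p) ** (1.0 / p)
    return amp * np.exp(1j * phase)


def mpdr_weights(X, d, diag_load=1e-6):
    """MPDR weights w = R^{-1} d / (d^H R^{-1} d) for one frequency bin.

    X: (M, T) complex observations (real + virtual channels) over T frames.
    d: (M,) steering vector toward the target (assumed known here).
    Returns w satisfying w^H d = 1 while minimising output power w^H R w.
    """
    M, T = X.shape
    R = X @ X.conj().T / T  # sample spatial covariance of the observations
    # Hypothetical diagonal loading, scaled by average channel power, for stability.
    R = R + diag_load * np.trace(R).real / M * np.eye(M)
    Rinv_d = np.linalg.solve(R, d)
    return Rinv_d / (d.conj() @ Rinv_d)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T = 100  # frames of one frequency bin (toy random data)
    x1 = rng.standard_normal(T) + 1j * rng.standard_normal(T)
    x2 = rng.standard_normal(T) + 1j * rng.standard_normal(T)
    v = virtual_mic(x1, x2, alpha=0.5, beta=0.5)
    X = np.stack([x1, v, x2])  # 2 real channels + 1 virtual channel
    d = np.ones(3, dtype=complex)  # hypothetical steering vector
    w = mpdr_weights(X, d)
    y = w.conj() @ X  # beamformer output for this bin
    print(y.shape)
```

Because `virtual_mic` treats every time-frequency bin in isolation, it cannot exploit the spectral and temporal structure of speech. The abstract identifies this as the limitation of the conventional method that motivates the CNN-based amplitude estimator.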

@InProceedings{8903040,
  author = {K. Yamaoka and L. Li and N. Ono and S. Makino and T. Yamada},
  booktitle = {2019 27th European Signal Processing Conference (EUSIPCO)},
  title = {CNN-based virtual microphone signal estimation for MPDR beamforming in underdetermined situations},
  year = {2019},
  pages = {1-5},
  abstract = {In this paper, we propose a novel approach to virtually increasing the number of microphone elements between two real microphones to improve speech enhancement performance in underdetermined situations. The virtual microphone technique has recently been proposed and experimentally shown to improve speech enhancement performance. In this technique, virtual signals in the audio signal domain are estimated by interpolating the phase linearly and the amplitude nonlinearly, each independently, on the basis of β-divergence. It has also been reported that performance tends to improve as the nonlinearity of the amplitude interpolation increases. However, one drawback of this method is that the interpolation is applied to each time-frequency bin independently, so the spectral and temporal structures of speech signals are ignored. To address this problem and increase the nonlinearity, we propose an alternative method of amplitude interpolation, motivated by the high capability of neural networks to model nonlinear functions and speech spectrograms. In this method, we employ a convolutional neural network as an amplitude estimator that minimizes the mean squared error between the outputs of a minimum power distortionless response (MPDR) beamformer and the target speech signals. The experimental results revealed that the proposed method has high potential for improving speech enhancement performance: it outperformed not only the conventional virtual microphone technique but also the corresponding determined situation.},
  keywords = {convolutional neural nets;interpolation;microphones;speech enhancement;nonlinear functions;speech spectrograms;amplitude interpolation;amplitude estimator;minimum power distortionless response beamformer;MPDR;target speech signals;speech enhancement performance;virtual microphone technique;CNN-based virtual microphone signal estimation;microphone elements;microphones;virtual signals;audio signal domain;linearly interpolating;nonlinearity;Microphones;Interpolation;Speech enhancement;Time-frequency analysis;Spectrogram;Logic gates},
  doi = {10.23919/EUSIPCO.2019.8903040},
  issn = {2076-1465},
  month = {Sep.},
  url = {https://www.eurasip.org/proceedings/eusipco/eusipco2019/proceedings/papers/1570533075.pdf},
}
