Fontaine, M., Nugraha, A. A., Badeau, R., Yoshii, K., & Liutkus, A. Cauchy Multichannel Speech Enhancement with a Deep Speech Prior. In 2019 27th European Signal Processing Conference (EUSIPCO), pages 1--5, September 2019. doi: 10.23919/EUSIPCO.2019.8903091.
@InProceedings{8903091,
author = {M. Fontaine and A. A. Nugraha and R. Badeau and K. Yoshii and A. Liutkus},
booktitle = {2019 27th European Signal Processing Conference (EUSIPCO)},
title = {Cauchy Multichannel Speech Enhancement with a Deep Speech Prior},
year = {2019},
pages = {1--5},
abstract = {We propose a semi-supervised multichannel speech enhancement system based on a probabilistic model which assumes that both speech and noise follow the heavy-tailed multivariate complex Cauchy distribution. As we advocate, this allows handling strong and adverse noisy conditions. Consequently, the model is parameterized by the source magnitude spectrograms and the source spatial scatter matrices. To deal with the non-additivity of scatter matrices, our first contribution is to perform the enhancement on a projected space. Then, our second contribution is to combine a latent variable model for speech, which is trained by following the variational autoencoder framework, with a low-rank model for the noise source. At test time, an iterative inference algorithm is applied, which produces estimated parameters to use for separation. The speech latent variables are estimated first from the noisy speech and then updated by a gradient descent method, while a majorization-equalization strategy is used to update both the noise and the spatial parameters of both sources. Our experimental results show that the Cauchy model outperforms the state-of-the-art methods. The standard deviation scores also reveal that the proposed method is more robust against non-stationary noise.},
keywords = {gradient methods;inference mechanisms;learning (artificial intelligence);matrix algebra;speech enhancement;statistical distributions;latent variable model;variational autoencoder framework;low-rank model;noise source;iterative inference algorithm;speech latent variables;noisy speech;nonstationary noise;Cauchy multichannel speech enhancement;semisupervised multichannel speech enhancement;probabilistic model;source magnitude spectrograms;source spatial scatter matrices;heavy-tailed multivariate complex Cauchy distribution;gradient descent method;majorization-equalization strategy;standard deviation scores;Speech enhancement;Computational modeling;Spectrogram;Decoding;Probabilistic logic;Noise measurement;Training;Multichannel speech enhancement;multivariate complex Cauchy distribution;variational autoencoder;nonnegative matrix factorization},
doi = {10.23919/EUSIPCO.2019.8903091},
issn = {2076-1465},
month = sep,
url = {https://www.eurasip.org/proceedings/eusipco/eusipco2019/proceedings/papers/1570533890.pdf},
}