A curriculum learning method for improved noise robustness in automatic speech recognition

A curriculum learning method for improved noise robustness in automatic speech recognition. Braun, S., Neil, D., & Liu, S. In 2017 25th European Signal Processing Conference (EUSIPCO), pages 548-552, August 2017.


The performance of automatic speech recognition systems in noisy environments still leaves room for improvement. Speech enhancement or feature enhancement techniques for increasing the noise robustness of these systems usually add components to the recognition system that need careful optimization. In this work, we propose the use of a relatively simple curriculum training strategy called accordion annealing (ACCAN). It uses a multi-stage training schedule in which samples at signal-to-noise ratio (SNR) values as low as 0 dB are added first, and samples at increasingly higher SNR values are then gradually added, up to an SNR of 50 dB. We also use a method called per-epoch noise mixing (PEM) that generates noisy training samples online during training and thus enables the SNR of the training data to change dynamically. Both the ACCAN and the PEM methods are evaluated on an end-to-end speech recognition pipeline on the Wall Street Journal corpus. ACCAN decreases the average word error rate (WER) over the 20 dB to -10 dB SNR range by up to 31.4% compared to a conventional multi-condition training method.
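The two training ideas in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: only the 0 dB starting point and the 50 dB end point come from the abstract. The 5 dB step between stages, drawing each utterance's SNR uniformly from the current stage's levels, cropping a random noise segment at least as long as the utterance, and the function names `accan_schedule`, `mix_at_snr`, and `pem_epoch` are all hypothetical choices made for the example.

```python
import math
import random
from typing import List, Sequence, Tuple


def accan_schedule(low_db: float = 0.0, high_db: float = 50.0,
                   step_db: float = 5.0) -> List[List[float]]:
    """Return the allowed SNR levels (dB) for each training stage.

    Stage 0 holds only the lowest SNR; each later stage adds the next
    higher level, so the range widens until it reaches high_db.
    ASSUMPTION: the 5 dB step is illustrative, not taken from the paper.
    """
    n_levels = int(round((high_db - low_db) / step_db)) + 1
    levels = [low_db + i * step_db for i in range(n_levels)]
    return [levels[: k + 1] for k in range(n_levels)]


def power(x: Sequence[float]) -> float:
    """Mean power of a signal (assumes a non-empty, non-silent input)."""
    return sum(v * v for v in x) / len(x)


def mix_at_snr(clean: Sequence[float], noise: Sequence[float],
               snr_db: float) -> List[float]:
    """Scale the noise so that P_clean / P_noise = 10^(snr_db / 10), then add it."""
    gain = math.sqrt(power(clean) / (power(noise) * 10 ** (snr_db / 10)))
    return [c + gain * n for c, n in zip(clean, noise)]


def pem_epoch(clean_utts: Sequence[Sequence[float]], noise: Sequence[float],
              allowed_snrs: Sequence[float],
              rng: random.Random) -> List[Tuple[List[float], float]]:
    """Build one epoch of noisy data on the fly.

    Every call draws a fresh noise segment and SNR per utterance, so no
    noisy copy is stored and the SNR set can change between epochs.
    ASSUMPTION: len(noise) >= len(clean) for every utterance.
    """
    batch = []
    for clean in clean_utts:
        start = rng.randrange(len(noise) - len(clean) + 1)
        segment = noise[start:start + len(clean)]
        snr = rng.choice(list(allowed_snrs))  # uniform draw within the stage
        batch.append((mix_at_snr(clean, segment, snr), snr))
    return batch


if __name__ == "__main__":
    rng = random.Random(0)
    clean = [math.sin(0.05 * t) for t in range(1600)]       # toy "utterance"
    noise = [rng.gauss(0.0, 1.0) for _ in range(16000)]     # toy noise track
    for stage, snrs in enumerate(accan_schedule()):
        for epoch in range(2):                              # hypothetical epochs per stage
            noisy = pem_epoch([clean], noise, snrs, rng)
            # a real system would run one training epoch on `noisy` here
        print(f"stage {stage}: SNR levels {snrs}")
```

Because `pem_epoch` is called once per epoch, each utterance receives a new noise segment and SNR on every pass, which is what allows the SNR distribution to follow the ACCAN stage; a noisy dataset prepared offline would fix both at preparation time.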

@InProceedings{8081267,
  author = {S. Braun and D. Neil and S. Liu},
  booktitle = {2017 25th European Signal Processing Conference (EUSIPCO)},
  title = {A curriculum learning method for improved noise robustness in automatic speech recognition},
  year = {2017},
  pages = {548-552},
  abstract = {The performance of automatic speech recognition systems in noisy environments still leaves room for improvement. Speech enhancement or feature enhancement techniques for increasing the noise robustness of these systems usually add components to the recognition system that need careful optimization. In this work, we propose the use of a relatively simple curriculum training strategy called accordion annealing (ACCAN). It uses a multi-stage training schedule in which samples at signal-to-noise ratio (SNR) values as low as 0 dB are added first, and samples at increasingly higher SNR values are then gradually added, up to an SNR of 50 dB. We also use a method called per-epoch noise mixing (PEM) that generates noisy training samples online during training and thus enables the SNR of the training data to change dynamically. Both the ACCAN and the PEM methods are evaluated on an end-to-end speech recognition pipeline on the Wall Street Journal corpus. ACCAN decreases the average word error rate (WER) over the 20 dB to -10 dB SNR range by up to 31.4% compared to a conventional multi-condition training method.},
  keywords = {learning (artificial intelligence);optimisation;speech enhancement;speech recognition;noisy training samples;training data;ACCAN;PEM methods;end-to-end speech recognition pipeline;curriculum learning method;improved noise robustness;automatic speech recognition systems;feature enhancement techniques;accordion annealing;multistage training schedule;signal-to-noise ratio;SNR value;per-epoch noise mixing;speech enhancement;curriculum training strategy;word error rate;noise figure 0.0 dB;noise figure 50.0 dB;noise figure 20.0 dB to -10 dB;Training;Signal to noise ratio;Noise robustness;Training data;Noise measurement;Feature extraction;Neural networks},
  doi = {10.23919/EUSIPCO.2017.8081267},
  issn = {2076-1465},
  month = {Aug},
  url = {https://www.eurasip.org/proceedings/eusipco/eusipco2017/papers/1570341635.pdf},
}

