Robust speech recognition using warped DFT-based cepstral features in clean and multistyle training

Robust speech recognition using warped DFT-based cepstral features in clean and multistyle training. Alam, M. J., Kenny, P., Dumouchel, P., & O'Shaughnessy, D. In 2014 22nd European Signal Processing Conference (EUSIPCO), pages 1791–1795, Sep. 2014.


This paper investigates the robustness of the warped discrete Fourier transform (WDFT)-based cepstral features for continuous speech recognition under clean and multistyle training conditions. In the MFCC and PLP front-ends, in order to approximate the nonlinear characteristics of the human auditory system in frequency, the speech spectrum is warped using the Mel-scale filterbank, which typically consists of overlapping triangular filters. It is well known that such nonlinear frequency transformation-based features provide better speech recognition accuracy than linear frequency scale features. It has been found that warping the DFT spectrum directly, rather than using filterbank averaging, provides a more precise approximation to the perceptual scales. WDFT provides non-uniform resolution filter-banks whereas DFT provides uniform resolution filter-banks. Here, we provide a performance evaluation of the following variants of the warped cepstral features: WDFT, and WDFT-linear prediction-based MFCC features. Experiments were carried out on the AURORA-4 task. Experimental results demonstrate that the WDFT-based cepstral features outperform the conventional MFCC and PLP both in clean and multistyle training conditions in terms of recognition error rates.
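The contrast the abstract draws between uniform DFT resolution and non-uniform WDFT resolution can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the commonly used first-order all-pass (bilinear) frequency warping, and the function names, the `num_bins` parameter, and the warping factor `alpha = 0.42` are illustrative choices that do not come from the paper.

```python
import numpy as np

def hz_to_mel(f_hz):
    """Widely used mel-scale formula: 2595 * log10(1 + f / 700)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f_hz, dtype=float) / 700.0)

def allpass_warp(omega, alpha):
    """Phase of a first-order all-pass filter: maps [0, pi] onto [0, pi].

    alpha = 0 is the identity; warping with -alpha inverts warping with alpha.
    """
    omega = np.asarray(omega, dtype=float)
    return omega + 2.0 * np.arctan(
        alpha * np.sin(omega) / (1.0 - alpha * np.cos(omega))
    )

def warped_dft(frame, alpha, num_bins):
    """Evaluate the DTFT of `frame` at non-uniformly spaced frequencies.

    The sample points are uniform on the warped axis; mapping them back with
    -alpha gives the actual frequencies, which are denser at low frequencies
    when alpha > 0. Returns (frequencies in rad/sample, complex spectrum).
    """
    frame = np.asarray(frame, dtype=float)
    warped_grid = np.linspace(0.0, np.pi, num_bins)
    omega = allpass_warp(warped_grid, -alpha)
    n = np.arange(len(frame))
    kernel = np.exp(-1j * np.outer(omega, n))  # one row per analysis frequency
    return omega, kernel @ frame

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frame = rng.standard_normal(8)
    # alpha = 0 reduces to the ordinary (uniform) DFT on [0, pi].
    _, uniform = warped_dft(frame, 0.0, 5)
    print(np.allclose(uniform, np.fft.rfft(frame)))
    # alpha = 0.42 (assumed value) concentrates bins at low frequencies.
    omega, _ = warped_dft(frame, 0.42, 5)
    print(np.round(omega, 3))
```

With `alpha = 0` the sketch reproduces the uniform DFT bins, while a positive `alpha` packs the analysis frequencies toward the low end, which is the non-uniform resolution property the abstract refers to. Cepstral features would then be derived from the log magnitude of such a spectrum; that step is omitted here.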

@InProceedings{6952658,
  author = {M. J. Alam and P. Kenny and P. Dumouchel and D. O'Shaughnessy},
  booktitle = {2014 22nd European Signal Processing Conference (EUSIPCO)},
  title = {Robust speech recognition using warped {DFT}-based cepstral features in clean and multistyle training},
  year = {2014},
  pages = {1791--1795},
  abstract = {This paper investigates the robustness of the warped discrete Fourier transform (WDFT)-based cepstral features for continuous speech recognition under clean and multistyle training conditions. In the MFCC and PLP front-ends, in order to approximate the nonlinear characteristics of the human auditory system in frequency, the speech spectrum is warped using the Mel-scale filterbank, which typically consists of overlapping triangular filters. It is well known that such nonlinear frequency transformation-based features provide better speech recognition accuracy than linear frequency scale features. It has been found that warping the DFT spectrum directly, rather than using filterbank averaging, provides a more precise approximation to the perceptual scales. WDFT provides non-uniform resolution filter-banks whereas DFT provides uniform resolution filter-banks. Here, we provide a performance evaluation of the following variants of the warped cepstral features: WDFT, and WDFT-linear prediction-based MFCC features. Experiments were carried out on the AURORA-4 task. Experimental results demonstrate that the WDFT-based cepstral features outperform the conventional MFCC and PLP both in clean and multistyle training conditions in terms of recognition error rates.},
  keywords = {channel bank filters;discrete Fourier transforms;speech recognition;robust speech recognition;warped DFT based cepstral features;clean training;multistyle training;warped discrete Fourier transform;MFCC front end;PLP front end;human auditory system nonlinear characteristics;Mel-scale filter bank;perceptual scale;AURORA-4 task;Mel frequency cepstral coefficient;Speech;Speech recognition;Feature extraction;Discrete Fourier transforms;Training;Warped DFT;speech recognition;multi-style training;spectrum enhancement;linear prediction},
  issn = {2076-1465},
  month = sep,
  url = {https://www.eurasip.org/proceedings/eusipco/eusipco2014/html/papers/1569926727.pdf},
}

