Robust speech recognition using warped DFT-based cepstral features in clean and multistyle training. Alam, M. J., Kenny, P., Dumouchel, P., & O'Shaughnessy, D. In 2014 22nd European Signal Processing Conference (EUSIPCO), pages 1791-1795, Sep., 2014.
This paper investigates the robustness of warped discrete Fourier transform (WDFT)-based cepstral features for continuous speech recognition under clean and multistyle training conditions. In the MFCC and PLP front-ends, the speech spectrum is warped with a Mel-scale filterbank, typically consisting of overlapping triangular filters, to approximate the nonlinear frequency resolution of the human auditory system. It is well known that features based on such nonlinear frequency transformations yield better speech recognition accuracy than features on a linear frequency scale. It has been found that warping the DFT spectrum directly, rather than averaging it with a filterbank, approximates the perceptual scales more precisely. The WDFT provides non-uniform-resolution filter banks, whereas the DFT provides uniform-resolution filter banks. Here, we evaluate two variants of warped cepstral features: WDFT-based and WDFT-linear prediction-based MFCC features. Experiments were carried out on the AURORA-4 task. Experimental results demonstrate that the WDFT-based cepstral features achieve lower recognition error rates than conventional MFCC and PLP features in both clean and multistyle training conditions.
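The contrast between the uniform DFT and the non-uniform WDFT can be illustrated with a minimal sketch. It assumes the common first-order all-pass formulation of frequency warping, with a warping factor `alpha` (0.42 is an assumed value, in the range often used to approximate the Mel scale at 16 kHz), and a simplified cepstral pipeline of log power spectrum followed by a DCT-II. The function names (`warped_frequencies`, `wdft`, `wdft_cepstrum`) are hypothetical. This is not the authors' front-end: pre-emphasis, windowing, normalization and the WDFT-linear prediction variant are all omitted.

```python
import cmath
import math


def warped_frequencies(n_bins: int, alpha: float) -> list[float]:
    """Frequencies (rad/sample) at which this WDFT sketch samples the spectrum.

    Uniform points 2*pi*k/N in the warped domain are mapped back through the
    inverse of a first-order all-pass warping (the same map with -alpha).
    For 0 < alpha < 1 the points crowd toward low frequencies.
    """
    freqs = []
    for k in range(n_bins):
        beta = 2.0 * math.pi * k / n_bins
        omega = beta - 2.0 * math.atan2(alpha * math.sin(beta),
                                        1.0 + alpha * math.cos(beta))
        freqs.append(omega)
    return freqs


def wdft(frame: list[float], n_bins: int, alpha: float) -> list[complex]:
    """Direct evaluation of the frame's DTFT at the warped frequencies.

    With alpha = 0 this reduces to an ordinary N-point DFT.
    """
    return [
        sum(x * cmath.exp(-1j * omega * n) for n, x in enumerate(frame))
        for omega in warped_frequencies(n_bins, alpha)
    ]


def wdft_cepstrum(frame: list[float], n_bins: int = 64, alpha: float = 0.42,
                  n_ceps: int = 13, floor: float = 1e-10) -> list[float]:
    """Hypothetical WDFT cepstrum: log power of the WDFT, then a DCT-II."""
    half = n_bins // 2 + 1  # real input: warped bins are conjugate-symmetric
    if n_ceps > half:
        raise ValueError("n_ceps must not exceed n_bins // 2 + 1")
    spectrum = wdft(frame, n_bins, alpha)[:half]
    log_power = [math.log(max(abs(X) ** 2, floor)) for X in spectrum]
    m_total = len(log_power)
    # DCT-II of the log power spectrum gives the cepstral coefficients.
    return [
        sum(lp * math.cos(math.pi * q * (m + 0.5) / m_total)
            for m, lp in enumerate(log_power))
        for q in range(n_ceps)
    ]


if __name__ == "__main__":
    fs = 16000  # assumed sampling rate
    frame = [math.sin(2 * math.pi * 440 * n / fs) for n in range(400)]
    print(len(wdft_cepstrum(frame)))  # 13
```

Because the warped bins are spaced more densely at low frequencies, each bin acts like a narrower filter there and a wider one at high frequencies. This is the non-uniform resolution that a Mel filterbank approximates by averaging.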
