Variants of mel-frequency cepstral coefficients for improved whispered speech speaker verification in mismatched conditions

Variants of mel-frequency cepstral coefficients for improved whispered speech speaker verification in mismatched conditions. Sarria-Paja, M. & Falk, T. H. In 2017 25th European Signal Processing Conference (EUSIPCO), pages 91-95, Aug, 2017.


In this paper, automatic speaker verification using normal and whispered speech is explored. Typically, for speaker verification systems, varying vocal effort inputs during the testing stage significantly degrade system performance. Solutions such as feature mapping or addition of multi-style data during training and enrollment stages have been proposed but do not show similar advantages for the involved speaking styles. Herein, we focus attention on the extraction of invariant speaker-dependent information from normal and whispered speech, thus allowing for improved multi-vocal-effort speaker verification. We base our search on previously reported perceptual and acoustic insights and propose variants of the mel-frequency cepstral coefficients (MFCC). We show the complementarity of the proposed features via three fusion schemes. Gains as high as 39% and 43% can be achieved for normal and whispered speech, respectively, relative to existing systems based on conventional MFCC features.

@InProceedings{8081175,
  author = {M. Sarria-Paja and T. H. Falk},
  booktitle = {2017 25th European Signal Processing Conference (EUSIPCO)},
  title = {Variants of mel-frequency cepstral coefficients for improved whispered speech speaker verification in mismatched conditions},
  year = {2017},
  pages = {91--95},
  abstract = {In this paper, automatic speaker verification using normal and whispered speech is explored. Typically, for speaker verification systems, varying vocal effort inputs during the testing stage significantly degrade system performance. Solutions such as feature mapping or addition of multi-style data during training and enrollment stages have been proposed but do not show similar advantages for the involved speaking styles. Herein, we focus attention on the extraction of invariant speaker-dependent information from normal and whispered speech, thus allowing for improved multi-vocal-effort speaker verification. We base our search on previously reported perceptual and acoustic insights and propose variants of the mel-frequency cepstral coefficients (MFCC). We show the complementarity of the proposed features via three fusion schemes. Gains as high as 39% and 43% can be achieved for normal and whispered speech, respectively, relative to existing systems based on conventional MFCC features.},
  keywords = {cepstral analysis;feature extraction;speaker recognition;mel-frequency cepstral coefficients;automatic speaker verification;feature mapping;invariant speaker-dependent information;whispered speech speaker verification;invariant speaker-dependent information extraction;Speech;Mel frequency cepstral coefficient;Feature extraction;Databases;Data mining;Speech processing;Whispered speech;speaker verification;fusion;i-vector extraction;MFCC},
  doi = {10.23919/EUSIPCO.2017.8081175},
  issn = {2076-1465},
  month = {Aug},
  url = {https://www.eurasip.org/proceedings/eusipco/eusipco2017/papers/1570345615.pdf},
}
