Class specific GMM based sparse feature for speech units classification

Class specific GMM based sparse feature for speech units classification. Sharma, P., Abrol, V., Dileep, A. D., & Sao, A. K. In 2017 25th European Signal Processing Conference (EUSIPCO), pages 528-532, Aug, 2017.

In this paper, features based on sparse representation (SR) are proposed for the classification of speech units. The proposed method employs multiple dictionaries to effectively model the variations present in the speech signal. Here, a Gaussian mixture model (GMM) is built using spectral features corresponding to the frames of all the examples of a speech class. Multiple dictionaries corresponding to the different mixture components are learned using the respective speech frames. Given a train/test speech frame, a minimum spectral distance measure from the GMM means is employed to select an appropriate dictionary. The selected dictionary is used to obtain the sparse feature representation, which is then used for the classification of speech units. The effectiveness of the proposed feature is demonstrated using continuous density hidden Markov model (CDHMM) based classifiers for (i) classification of isolated utterances of the E-set of the English alphabet, (ii) classification of consonant-vowel (CV) segments in the Hindi language, and (iii) classification of phonemes from the TIMIT phonetic corpus. Experimental results reveal that the proposed features outperform existing feature representations on various speech unit classification tasks.
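The per-frame step described in the abstract (pick the dictionary whose GMM mean is closest to the frame, then sparse-code the frame over it) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not name the distance measure or the sparse solver, so Euclidean distance and plain matching pursuit are used as stand-ins, and the GMM means and dictionaries are hard-coded toy values rather than learned from speech frames. All function names are hypothetical.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def select_dictionary(frame, means):
    """Return the index of the mixture mean closest to the frame.

    Assumption: Euclidean distance stands in for the paper's
    'minimum spectral distance measure'.
    """
    dists = [math.dist(frame, m) for m in means]
    return dists.index(min(dists))


def matching_pursuit(frame, atoms, n_nonzero):
    """Greedy sparse code of `frame` over unit-norm `atoms`.

    Assumption: plain matching pursuit is a stand-in solver; the
    abstract does not say which sparse coding algorithm is used.
    """
    residual = list(frame)
    code = [0.0] * len(atoms)
    for _ in range(n_nonzero):
        corr = [dot(residual, a) for a in atoms]
        k = max(range(len(atoms)), key=lambda i: abs(corr[i]))
        if abs(corr[k]) < 1e-12:  # residual already explained
            break
        code[k] += corr[k]
        residual = [r - corr[k] * a for r, a in zip(residual, atoms[k])]
    return code


def sparse_feature(frame, means, dictionaries, n_nonzero=2):
    """Select a dictionary by nearest mean, then sparse-code the frame."""
    m = select_dictionary(frame, means)
    return m, matching_pursuit(frame, dictionaries[m], n_nonzero)


# Toy stand-ins for one class: two mixture means, one dictionary per mixture.
s = math.sqrt(0.5)
means = [[0.0, 0.0], [10.0, 10.0]]
dictionaries = [
    [[1.0, 0.0], [0.0, 1.0], [s, -s]],  # dictionary for mixture 0
    [[1.0, 0.0], [0.0, 1.0], [s, s]],   # dictionary for mixture 1
]

idx, code = sparse_feature([9.0, 9.0], means, dictionaries, n_nonzero=1)
print(idx, code)
```

In this toy case the frame lies nearest the second mean, so the second dictionary is selected and a single diagonal atom explains the frame. In the setting the abstract describes, the resulting sparse codes would then be passed, frame by frame, to the CDHMM-based classifier.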

@InProceedings{8081263,
  author = {P. Sharma and V. Abrol and A. D. Dileep and A. K. Sao},
  booktitle = {2017 25th European Signal Processing Conference (EUSIPCO)},
  title = {Class specific GMM based sparse feature for speech units classification},
  year = {2017},
  pages = {528-532},
  keywords = {feature extraction;Gaussian processes;hidden Markov models;mixture models;natural language processing;signal representation;speaker recognition;speech processing;speech recognition;speech units classification tasks;sparse representation;multiple dictionaries;speech signal;Gaussian mixture model;spectral features;train/test speech frame;minimum spectral distance measure;sparse feature representation;Markov model based classifiers;feature representations;speech frames;class specific GMM based sparse feature;continuous density hidden Markov model;CDHMM;consonant-vowel segments;CV segments;TIMIT phonetic corpus;Speech;Dictionaries;Mel frequency cepstral coefficient;Speech recognition;Hidden Markov models;Machine learning;Sparse representation;speech recognition;dictionary learning},
  doi = {10.23919/EUSIPCO.2017.8081263},
  issn = {2076-1465},
  month = {Aug},
  url = {https://www.eurasip.org/proceedings/eusipco/eusipco2017/papers/1570347441.pdf},
}
