Combining temporal and spectral information for Query-by-Example Spoken Term Detection

Combining temporal and spectral information for Query-by-Example Spoken Term Detection. Gracia, C., Anguera, X., & Binefa, X. In 2014 22nd European Signal Processing Conference (EUSIPCO), pages 1487-1491, Sep., 2014.


We present a system for Query-by-Example Spoken Term Detection on zero-resource languages. The system compares speech patterns by representing the signal with two different acoustic models: a Spectral Acoustic (SA) model covering the spectral characteristics of the signal, and a Temporal Acoustic (TA) model covering the temporal evolution of the speech signal. Given a query and an utterance to be compared, we first compute their posterior probabilities according to each of the two models, then compute a similarity matrix for each model and combine these into a single enhanced matrix. The Subsequence Dynamic Time Warping (S-DTW) algorithm is used to find optimal subsequence alignment paths on this final matrix. Our experiments on data from the 2013 Spoken Web Search (SWS) task of the MediaEval benchmark evaluation show that this approach provides state-of-the-art results and significantly improves on both the single-model strategies and the standard metric baselines.
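The pipeline outlined in the abstract (per-model posteriors, one matrix per model, fusion, then S-DTW) might be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the abstract does not say how the matrices are built or combined, so the `-log` inner-product distance, the weighted-sum fusion, and the names `distance_matrix`, `combine`, `sdtw`, and `alpha` are all hypothetical choices. The sketch also works with distances (lower is better) rather than similarities.

```python
import math


def distance_matrix(query, utterance, eps=1e-10):
    """Frame-pair distances between two posteriorgrams.

    Each posteriorgram is a list of probability vectors (one per frame).
    Assumption: distance is -log of the inner product of the two vectors.
    """
    return [
        [-math.log(max(sum(a * b for a, b in zip(q, u)), eps)) for u in utterance]
        for q in query
    ]


def combine(d_sa, d_ta, alpha=0.5):
    """Hypothetical fusion: per-cell weighted sum of the SA and TA matrices."""
    return [
        [alpha * s + (1.0 - alpha) * t for s, t in zip(row_sa, row_ta)]
        for row_sa, row_ta in zip(d_sa, d_ta)
    ]


def sdtw(dist):
    """Subsequence DTW over a (query frames x utterance frames) distance matrix.

    The query must be matched in full, but the match may start and end at any
    utterance frame. Returns (accumulated_cost, start_frame, end_frame).
    """
    n, m = len(dist), len(dist[0])
    acc = [[0.0] * m for _ in range(n)]
    start = [[0] * m for _ in range(n)]

    # Free start: the first query frame may align to any utterance frame.
    for j in range(m):
        acc[0][j] = dist[0][j]
        start[0][j] = j

    for i in range(1, n):
        # First column can only be reached from the cell above it.
        acc[i][0] = acc[i - 1][0] + dist[i][0]
        start[i][0] = start[i - 1][0]
        for j in range(1, m):
            # Pick the cheapest predecessor and inherit its start frame.
            best_cost, best_start = min(
                (acc[i - 1][j], start[i - 1][j]),
                (acc[i][j - 1], start[i][j - 1]),
                (acc[i - 1][j - 1], start[i - 1][j - 1]),
            )
            acc[i][j] = dist[i][j] + best_cost
            start[i][j] = best_start

    # Free end: take the cheapest cell in the last query row.
    end = min(range(m), key=lambda j: acc[n - 1][j])
    return acc[n - 1][end], start[n - 1][end], end
```

In use, one would compute `distance_matrix` once with SA posteriors and once with TA posteriors for the same query and utterance, pass both results to `combine`, and run `sdtw` on the fused matrix. A practical system would probably also normalize the accumulated cost by path length before ranking detections; that step is left out here for brevity.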

@InProceedings{6952537,
  author = {C. Gracia and X. Anguera and X. Binefa},
  booktitle = {2014 22nd European Signal Processing Conference (EUSIPCO)},
  title = {Combining temporal and spectral information for Query-by-Example Spoken Term Detection},
  year = {2014},
  pages = {1487-1491},
  abstract = {We present a system for Query-by-Example Spoken Term Detection on zero-resource languages. The system compares speech patterns by representing the signal with two different acoustic models: a Spectral Acoustic (SA) model covering the spectral characteristics of the signal, and a Temporal Acoustic (TA) model covering the temporal evolution of the speech signal. Given a query and an utterance to be compared, we first compute their posterior probabilities according to each of the two models, then compute a similarity matrix for each model and combine these into a single enhanced matrix. The Subsequence Dynamic Time Warping (S-DTW) algorithm is used to find optimal subsequence alignment paths on this final matrix. Our experiments on data from the 2013 Spoken Web Search (SWS) task of the MediaEval benchmark evaluation show that this approach provides state-of-the-art results and significantly improves on both the single-model strategies and the standard metric baselines.},
  keywords = {audio databases;learning (artificial intelligence);pattern matching;query processing;speech processing;query-by-example spoken term detection;optimal subsequence alignment paths;subsequence dynamic time warping algorithm;speech signal;temporal acoustic model;spectral acoustic model;speech patterns;zero resource languages;spectral information;temporal information;Acoustics;Speech;Vectors;Data models;Computational modeling;Hidden Markov models;Adaptation models;Query by example;zero resources languages;unsupervised learning;long temporal context},
  issn = {2076-1465},
  month = {Sep.},
  url = {https://www.eurasip.org/proceedings/eusipco/eusipco2014/html/papers/1569925447.pdf},
}
