In 2014 22nd European Signal Processing Conference (EUSIPCO), pages 1487-1491, Sep. 2014.
We present a system for Query-by-Example Spoken Term Detection on zero-resource languages. The system compares speech patterns by representing the signal with two different acoustic models: a Spectral Acoustic (SA) model covering the spectral characteristics of the signal, and a Temporal Acoustic (TA) model covering the temporal evolution of the speech signal. Given a query and an utterance to be compared, we first compute their posterior probabilities under each of the two models, then compute a similarity matrix for each model, and finally combine these into a single enhanced matrix. A Subsequence Dynamic Time Warping (S-DTW) algorithm is used to find optimal subsequence alignment paths on this final matrix. Our experiments on data from the 2013 Spoken Web Search (SWS) task of the MediaEval benchmark evaluation show that this approach provides state-of-the-art results and significantly improves on both the single-model strategies and the standard metric baselines.
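The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the frame-level measure or the combination rule, so this sketch assumes a negative-log inner product between posterior vectors and an element-wise weighted average of the two matrices. The function names (`distance_matrix`, `combine`, `sdtw`) and the `weight` parameter are hypothetical, and path-length normalization, which practical systems usually apply, is left out for brevity.

```python
import math


def distance_matrix(query, utterance, eps=1e-10):
    """Frame-level distances: -log of the inner product of posterior vectors.

    Assumption: the abstract does not name the measure; this is a common
    choice for posteriorgrams. Rows index query frames, columns utterance frames.
    """
    return [[-math.log(max(sum(a * b for a, b in zip(q, u)), eps))
             for u in utterance]
            for q in query]


def combine(d_sa, d_ta, weight=0.5):
    """Element-wise weighted average of the SA and TA matrices (assumed rule)."""
    return [[weight * a + (1.0 - weight) * b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(d_sa, d_ta)]


def sdtw(dist):
    """Subsequence DTW: the query may start and end anywhere in the utterance.

    Returns (start_frame, end_frame, accumulated_cost) of the best match.
    """
    n, m = len(dist), len(dist[0])
    acc = [[0.0] * m for _ in range(n)]
    acc[0] = list(dist[0])  # free start: no cost for skipping utterance frames
    for i in range(1, n):
        acc[i][0] = acc[i - 1][0] + dist[i][0]
        for j in range(1, m):
            acc[i][j] = dist[i][j] + min(acc[i - 1][j],
                                         acc[i][j - 1],
                                         acc[i - 1][j - 1])
    # Free end: pick the cheapest cell in the last query row.
    end = min(range(m), key=lambda j: acc[n - 1][j])
    # Backtrack to the first query row to recover the start frame.
    i, j = n - 1, end
    while i > 0:
        if j == 0:
            i -= 1
            continue
        _, i, j = min((acc[i - 1][j - 1], i - 1, j - 1),
                      (acc[i - 1][j], i - 1, j),
                      (acc[i][j - 1], i, j - 1))
    return j, end, acc[n - 1][end]


# Toy posteriors over three classes; both "models" share them for simplicity.
query = [[1, 0, 0], [0, 1, 0]]
utterance = [[0, 0, 1], [0, 0, 1], [1, 0, 0], [0, 1, 0], [0, 0, 1]]
combined = combine(distance_matrix(query, utterance),
                   distance_matrix(query, utterance))
print(sdtw(combined))
```

In this toy input the query pattern occurs at utterance frames 2 and 3, so the recovered match spans those frames with zero accumulated cost. In a real system the SA and TA posteriors would come from two separately trained models, and the combination weight would need tuning.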