A Critical Assessment of Spoken Utterance Retrieval through Approximate Lattice Representations. Kazemian, S. Master's thesis, Department of Computer Science, University of Toronto, January 2009.
This paper compares the performance of Position-specific Posterior Lattices (PSPL) and Confusion Networks (CN) applied to Spoken Utterance Retrieval, and tests these recent proposals against several baselines, namely 1-best transcription, using the whole lattice, and the set-of-words baseline. The set-of-words baseline is used for the first time in the context of Spoken Utterance Retrieval. PSPL and CN provide compact representations that generalize the original segment lattices and provide greater recall robustness, but have yet to be evaluated against each other in multiple WER conditions for Spoken Utterance Retrieval. Our comparisons suggest that while PSPL and Confusion Networks have comparable recall, the former is slightly more precise, although its merit appears to be coupled to the assumptions of low-frequency search queries and low-WER environments. While in the low-WER environments all methods tested have comparable performance, both PSPL and CN significantly outperform the 1-best transcription in high-WER environments but perform similarly to the whole lattice and set-of-words baselines.
@MastersThesis{	  kazemian:2009:thesis,
  author	= {Siavash Kazemian},
  title		= {A Critical Assessment of Spoken Utterance Retrieval
		  through Approximate Lattice Representations},
  year		= {2009},
  school	= {Department of Computer Science, University of Toronto},
  month		= {January},
  abstract	= {This paper compares the performance of Position-specific
		  Posterior Lattices (PSPL) and Confusion Networks (CN)
		  applied to Spoken Utterance Retrieval, and tests these
		  recent proposals against several baselines, namely 1-best
		  transcription, using the whole lattice, and the
		  set-of-words baseline. The set-of-words baseline is used
		  for the first time in the context of Spoken Utterance
		  Retrieval. PSPL and CN provide compact representations that
		  generalize the original segment lattices and provide
		  greater recall robustness, but have yet to be evaluated
		  against each other in multiple WER conditions for Spoken
		  Utterance Retrieval. Our comparisons suggest that while
		  PSPL and Confusion Networks have comparable recall, the
		  former is slightly more precise, although its merit appears
		  to be coupled to the assumptions of low-frequency search
		  queries and low-WER environments. While in the low-WER
		  environments all methods tested have comparable
		  performance, both PSPL and CN significantly outperform the
		  1-best transcription in high-WER environments but perform
		  similarly to the whole lattice and set-of-words baselines.},
  download	= {http://ftp.cs.toronto.edu/pub/gh/Kazemian-MSc-paper.pdf}
}