Feature extraction and temporal segmentation of acoustic signals. Rossignol, S.; Rodet, X.; Soumagne, J.; Collette, J.; and Depalle, P. In Proc. ICMC, volume 98, pages 199--202, 1998. Citeseer.
@InProceedings{Rossignol1998,
  Title                    = {Feature extraction and temporal segmentation of acoustic signals},
  Author                   = {Rossignol, S. and Rodet, X. and Soumagne, J. and Collette, J. and Depalle, P.},
  Booktitle                = {Proc. ICMC},
  Year                     = {1998},
  Organization             = {Citeseer},
  Pages                    = {199--202},
  Volume                   = {98},

  Review                   = {- want to characterize sound
- proposes several features to segment on

Rossignol \etal \cite{Rossignol1998} want to characterize sounds, and do so with three layers of segmentation. The first layer separates speech, singing voice and instrumental components. The second layer separates the vibrato from the source signal. The last layer segments the signal into notes or phones. The first layer looks at the mean and variance of the STFT, the spectral centroid and the zero-crossing rate (ZCR) between two successive sound frames. Classification is done with pre-trained GMM, \emph{k}-NN and NN classifiers, with the GMM showing the best results if only STFT features are used, and \emph{k}-NN showing the best results if all 6 features are used. The vibrato is segmented out by a series of frequency-based thresholding steps. Note and phone segmentation is conducted by looking at frequency, energy and STFT features.

Rossignol \etal \cite{Rossignol1998} classify sounds into speech, singing voice and instrumental components. This is done by looking at the mean and variance of the STFT, the spectral centroid and the ZCR between two successive sound frames. Classification is done with pre-trained GMM, \emph{k}-NN and NN classifiers, with \emph{k}-NN showing the best result. $Ver_{AllPoints}$ verification was performed on 10 minutes of labelled speech data and 10 minutes of labelled music data. $Ver_{Class}$ values for the GMM, \emph{k}-NN and NN classifiers were 23\%, 6\% and 9\%, respectively.},
  Timestamp                = {2013.08.29}
}
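Two of the first-layer features named in the review, the zero-crossing rate and the spectral centroid, can be sketched as below. This is a minimal illustration, not the paper's implementation: the frame length, sample rate and pure-tone test signal are my own assumptions, and the O(N²) DFT stands in for whatever STFT the authors used.

```python
import math

def zero_crossing_rate(frame):
    """Fraction of successive sample pairs whose signs differ."""
    crossings = sum((a < 0) != (b < 0) for a, b in zip(frame, frame[1:]))
    return crossings / (len(frame) - 1)

def spectral_centroid(frame, sample_rate):
    """Magnitude-weighted mean frequency of the frame's spectrum (naive DFT)."""
    n = len(frame)
    mags = []
    for k in range(n // 2):
        re = sum(x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(frame))
        im = sum(x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(frame))
        mags.append(math.hypot(re, im))
    total = sum(mags)
    if total == 0.0:
        return 0.0
    bin_hz = sample_rate / n  # frequency spacing between DFT bins
    return sum(k * bin_hz * m for k, m in enumerate(mags)) / total

# Hypothetical test signal: one 512-sample frame of a 500 Hz tone at 8 kHz.
# The centroid should land near 500 Hz, and the ZCR near 2 * 500 / 8000.
sr = 8000
frame = [math.sin(2 * math.pi * 500 * i / sr) for i in range(512)]
```

In the paper these values are compared between two successive frames before being fed to the classifiers; only the per-frame computation is shown here.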