Statistical modeling and retrieval of polyphonic music. Unal, E., Georgiou, P. G., Narayanan, S. S., & Chew, E. In 2007 IEEE 9th International Workshop on Multimedia Signal Processing, MMSP 2007 - Proceedings, pages 405–409, Crete, 2007.
Abstract: In this article, we propose a solution to the problem of query by example for polyphonic music audio. We first present a generic mid-level representation for audio queries. Unlike previous efforts in the literature, the proposed representation does not depend on the differing spectral characteristics of musical instruments or on the accurate location of note onsets and offsets. This is achieved by first mapping the short-term frequency spectrum of consecutive audio frames to the musical space (the Spiral Array) and defining a tonal identity with respect to the center of effect generated by the spectral weights of the musical notes. We then use the resulting one-dimensional text representations of the audio to create n-gram statistical sequence models that track the tonal characteristics and behavior of the pieces. After performing appropriate smoothing, we build a collection of melodic n-gram models for testing. Using perplexity-based scoring, we test the likelihood of a sequence of lexical chords (an audio query) given each model in the database collection. Initial results show that some variation of the input piece appears in the top 5 results 81% of the time for whole-melody inputs within a database of 500 polyphonic melodies. We also tested the retrieval engine on short audio clips. Using 25 s segments, variations of the input piece are among the top 5 results 75% of the time.
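The retrieval step described in the abstract (one smoothed n-gram model per piece, queries ranked by perplexity) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the bigram order, add-one smoothing, the single-letter symbols standing in for the Spiral Array tonal labels, and the names `train_bigram`, `perplexity`, and `rank`.

```python
import math
from collections import Counter


def train_bigram(sequence, vocab):
    """Return an add-one-smoothed bigram probability function for one piece."""
    pair_counts = Counter(zip(sequence, sequence[1:]))
    context_counts = Counter(sequence[:-1])
    vocab_size = len(vocab)

    def prob(prev, cur):
        # Add-one smoothing (an assumption) keeps unseen pairs above zero.
        return (pair_counts[(prev, cur)] + 1) / (context_counts[prev] + vocab_size)

    return prob


def perplexity(prob, query):
    """Perplexity of a query sequence under one bigram model (lower is better)."""
    pairs = list(zip(query, query[1:]))
    if not pairs:
        raise ValueError("query must contain at least two symbols")
    log_sum = sum(math.log(prob(prev, cur)) for prev, cur in pairs)
    return math.exp(-log_sum / len(pairs))


def rank(database, query, top_k=5):
    """Return the names of the top_k pieces, ordered by increasing perplexity."""
    vocab = {symbol for seq in database.values() for symbol in seq} | set(query)
    scores = {
        name: perplexity(train_bigram(seq, vocab), query)
        for name, seq in database.items()
    }
    return sorted(scores, key=scores.get)[:top_k]


# Toy database: each piece is a sequence of hypothetical tonal symbols.
database = {
    "piece_a": list("CCGGAAGFFEEDDC"),
    "piece_b": list("EDCDEEEDDDEGG"),
    "piece_c": list("GEEFDDCDEFGGG"),
}
query = list("GGAAGFFEE")  # an excerpt of piece_a
print(rank(database, query))
```

In this toy setup the query is a fragment of `piece_a`, so that piece should receive the lowest perplexity. A real system would replace the letter sequences with the tonal labels derived from audio and would likely use a higher n-gram order and a more refined smoothing scheme.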
@InProceedings{    unal.ea2007-statistical,
    author       = {Unal, Erdem and Georgiou, Panayiotis G. and Narayanan,
                   Shrikanth S. and Chew, Elaine},
    year         = {2007},
    title        = {Statistical modeling and retrieval of polyphonic music},
    abstract     = {In this article, we propose a solution to the problem of
                   query by example for polyphonic music audio. We first
                   present a generic mid-level representation for audio
                   queries. Unlike previous efforts in the literature, the
                   proposed representation does not depend on the differing
                   spectral characteristics of musical instruments or on the
                   accurate location of note onsets and offsets. This is
                   achieved by first mapping the short-term frequency
                   spectrum of consecutive audio frames to the musical space
                   (the Spiral Array) and defining a tonal identity with
                   respect to the center of effect generated by the spectral
                   weights of the musical notes. We then use the resulting
                   one-dimensional text representations of the audio to
                   create n-gram statistical sequence models that track the
                   tonal characteristics and behavior of the pieces. After
                   performing appropriate smoothing, we build a collection of
                   melodic n-gram models for testing. Using perplexity-based
                   scoring, we test the likelihood of a sequence of lexical
                   chords (an audio query) given each model in the database
                   collection. Initial results show that some variation of
                   the input piece appears in the top 5 results 81\% of the
                   time for whole-melody inputs within a database of 500
                   polyphonic melodies. We also tested the retrieval engine
                   on short audio clips. Using 25 s segments, variations of
                   the input piece are among the top 5 results 75\% of the
                   time.},
    address      = {Crete},
    booktitle    = {2007 IEEE 9th International Workshop on Multimedia Signal
                   Processing, MMSP 2007 - Proceedings},
    doi          = {10.1109/MMSP.2007.4412902},
    isbn         = {1424412749},
    keywords     = {computer and music},
    mendeley-tags= {computer and music},
    month        = nov,
    pages        = {405--409}
}
