Deep convolutional networks on the pitch spiral for musical instrument recognition

Deep convolutional networks on the pitch spiral for musical instrument recognition. Lostanlen, V. & Cella, C. January, 2017. arXiv:1605.06644 [cs]

Musical performance combines a wide range of pitches, nuances, and expressive techniques. Audio-based classification of musical instruments thus requires building signal representations that are invariant to such transformations. This article investigates the construction of learned convolutional architectures for instrument recognition, given a limited amount of annotated training data. In this context, we benchmark three different weight sharing strategies for deep convolutional networks in the time-frequency domain: temporal kernels; time-frequency kernels; and a linear combination of time-frequency kernels that are one octave apart, akin to a Shepard pitch spiral. We provide an acoustical interpretation of these strategies within the source-filter framework of quasi-harmonic sounds with a fixed spectral envelope, which are archetypal of musical notes. The best classification accuracy is obtained by hybridizing all three convolutional layers into a single deep learning architecture.

@misc{lostanlen_deep_2017,
	title = {Deep convolutional networks on the pitch spiral for musical instrument recognition},
	url = {http://arxiv.org/abs/1605.06644},
	doi = {10.48550/arXiv.1605.06644},
	abstract = {Musical performance combines a wide range of pitches, nuances, and expressive techniques. Audio-based classification of musical instruments thus requires to build signal representations that are invariant to such transformations. This article investigates the construction of learned convolutional architectures for instrument recognition, given a limited amount of annotated training data. In this context, we benchmark three different weight sharing strategies for deep convolutional networks in the time-frequency domain: temporal kernels; time-frequency kernels; and a linear combination of time-frequency kernels which are one octave apart, akin to a Shepard pitch spiral. We provide an acoustical interpretation of these strategies within the source-filter framework of quasi-harmonic sounds with a fixed spectral envelope, which are archetypal of musical notes. The best classification accuracy is obtained by hybridizing all three convolutional layers into a single deep learning architecture.},
	urldate = {2022-10-12},
	publisher = {arXiv},
	author = {Lostanlen, Vincent and Cella, Carmine-Emanuele},
	month = jan,
	year = {2017},
	note = {arXiv:1605.06644 [cs]},
	keywords = {Cited, Computer Science - Sound},
}
