Deep convolutional networks on the pitch spiral for musical instrument recognition. Lostanlen, V. & Cella, C. January, 2017. arXiv:1605.06644 [cs]
Musical performance combines a wide range of pitches, nuances, and expressive techniques. Audio-based classification of musical instruments thus requires building signal representations that are invariant to such transformations. This article investigates the construction of learned convolutional architectures for instrument recognition, given a limited amount of annotated training data. In this context, we benchmark three different weight sharing strategies for deep convolutional networks in the time-frequency domain: temporal kernels; time-frequency kernels; and a linear combination of time-frequency kernels which are one octave apart, akin to a Shepard pitch spiral. We provide an acoustical interpretation of these strategies within the source-filter framework of quasi-harmonic sounds with a fixed spectral envelope, which are archetypal of musical notes. The best classification accuracy is obtained by hybridizing all three convolutional layers into a single deep learning architecture.
@misc{lostanlen_deep_2017,
	title = {Deep convolutional networks on the pitch spiral for musical instrument recognition},
	url = {http://arxiv.org/abs/1605.06644},
	doi = {10.48550/arXiv.1605.06644},
	abstract = {Musical performance combines a wide range of pitches, nuances, and expressive techniques. Audio-based classification of musical instruments thus requires to build signal representations that are invariant to such transformations. This article investigates the construction of learned convolutional architectures for instrument recognition, given a limited amount of annotated training data. In this context, we benchmark three different weight sharing strategies for deep convolutional networks in the time-frequency domain: temporal kernels; time-frequency kernels; and a linear combination of time-frequency kernels which are one octave apart, akin to a Shepard pitch spiral. We provide an acoustical interpretation of these strategies within the source-filter framework of quasi-harmonic sounds with a fixed spectral envelope, which are archetypal of musical notes. The best classification accuracy is obtained by hybridizing all three convolutional layers into a single deep learning architecture.},
	urldate = {2022-10-12},
	publisher = {arXiv},
	author = {Lostanlen, Vincent and Cella, Carmine-Emanuele},
	month = jan,
	year = {2017},
	note = {arXiv:1605.06644 [cs]},
	keywords = {Cited, Computer Science - Sound},
}
