HarmoF0: Logarithmic Scale Dilated Convolution For Pitch Estimation. Wei, W., Li, P., Yu, Y., & Li, W. arXiv preprint arXiv:2205.01019, 2022.
Sounds, especially music, contain various harmonic components scattered in the frequency dimension. It is difficult for normal convolutional neural networks to observe these overtones. This paper introduces a multiple rates dilated causal convolution (MRDC-Conv) method to capture the harmonic structure in logarithmic scale spectrograms efficiently. The harmonic is helpful for pitch estimation, which is important for many sound processing applications. We propose HarmoF0, a fully convolutional network, to evaluate the MRDC-Conv and other dilated convolutions in pitch estimation. The results show that this model outperforms the DeepF0, yields state-of-the-art performance in three datasets, and simultaneously reduces more than 90% parameters. We also find that it has stronger noise resistance and fewer octave errors.
@article{wei_harmof0_2022,
	title = {{HarmoF0}: {Logarithmic} {Scale} {Dilated} {Convolution} {For} {Pitch} {Estimation}},
	shorttitle = {{HarmoF0}},
	url = {https://www.semanticscholar.org/paper/HarmoF0%3A-Logarithmic-Scale-Dilated-Convolution-For-Wei-Li/915b1a0554d02fe9a79ae8956f1382395e315336},
	doi = {10.48550/arXiv.2205.01019},
	abstract = {Sounds, especially music, contain various harmonic components scattered in the frequency dimension. It is difficult for normal convolutional neural networks to observe these overtones. This paper introduces a multiple rates dilated causal convolution (MRDC-Conv) method to capture the harmonic structure in logarithmic scale spectrograms efficiently. The harmonic is helpful for pitch estimation, which is important for many sound processing applications. We propose HarmoF0, a fully convolutional network, to evaluate the MRDC-Conv and other dilated convolutions in pitch estimation. The results show that this model outperforms the DeepF0, yields state-of-the-art performance in three datasets, and simultaneously reduces more than 90\% parameters. We also find that it has stronger noise resistance and fewer octave errors.},
	language = {en},
	urldate = {2022-05-09},
	journal = {arXiv preprint arXiv:2205.01019},
	author = {Wei, Weixing and Li, P. and Yu, Yi and Li, Wei},
	year = {2022},
}
