GACELA: A Generative Adversarial Context Encoder for Long Audio Inpainting of Music. Marafioti, A., Majdak, P., Holighaus, N., & Perraudin, N. IEEE Journal of Selected Topics in Signal Processing, 15(1):120–131, January, 2021.
In this article, we introduce GACELA, a conditional generative adversarial network (cGAN) designed to restore missing audio data with durations ranging between hundreds of milliseconds and a few seconds, i.e., to perform long-gap audio inpainting. While previous work either addressed shorter gaps or relied on exemplars by copying available information from other signal parts, GACELA addresses the inpainting of long gaps in two aspects. First, it considers various time scales of audio information by relying on five parallel discriminators with increasing resolution of receptive fields. Second, it is conditioned not only on the available information surrounding the gap, i.e., the context, but also on the latent variable of the cGAN. This addresses the inherent multi-modality of audio inpainting for such long gaps while providing the user with different inpainting options. GACELA was evaluated in listening tests on music signals of varying complexity and varying gap durations from 375 to 1500 ms. Under laboratory conditions, our subjects were often able to detect the inpainting. However, the severity of the inpainted artifacts was rated between not disturbing and mildly disturbing. GACELA represents a framework capable of integrating future improvements such as processing of more auditory-related features or explicit musical features. Our software and trained models, complemented by instructive examples, are available online.
@article{marafioti_gacela_2021,
	title = {{GACELA}: {A} {Generative} {Adversarial} {Context} {Encoder} for {Long} {Audio} {Inpainting} of {Music}},
	volume = {15},
	issn = {1941-0484},
	url = {https://ieeexplore.ieee.org/abstract/document/9257074},
	doi = {10.1109/JSTSP.2020.3037506},
	abstract = {In this article, we introduce GACELA, a conditional generative adversarial network (cGAN) designed to restore missing audio data with durations ranging between hundreds of milliseconds and a few seconds, i.e., to perform long-gap audio inpainting. While previous work either addressed shorter gaps or relied on exemplars by copying available information from other signal parts, GACELA addresses the inpainting of long gaps in two aspects. First, it considers various time scales of audio information by relying on five parallel discriminators with increasing resolution of receptive fields. Second, it is conditioned not only on the available information surrounding the gap, i.e., the context, but also on the latent variable of the cGAN. This addresses the inherent multi-modality of audio inpainting for such long gaps while providing the user with different inpainting options. GACELA was evaluated in listening tests on music signals of varying complexity and varying gap durations from 375 to 1500 ms. Under laboratory conditions, our subjects were often able to detect the inpainting. However, the severity of the inpainted artifacts was rated between not disturbing and mildly disturbing. GACELA represents a framework capable of integrating future improvements such as processing of more auditory-related features or explicit musical features. Our software and trained models, complemented by instructive examples, are available online.},
	number = {1},
	journal = {IEEE Journal of Selected Topics in Signal Processing},
	author = {Marafioti, Andrés and Majdak, Piotr and Holighaus, Nicki and Perraudin, Nathanaël},
	month = jan,
	year = {2021},
	keywords = {\#nosource},
	pages = {120--131},
}
