Foreground-Background Ambient Sound Scene Separation. Olvera, M., Vincent, E., Serizel, R., & Gasso, G. In 2020 28th European Signal Processing Conference (EUSIPCO), pages 281–285, August 2020.
@InProceedings{9287436,
  author = {M. Olvera and E. Vincent and R. Serizel and G. Gasso},
  booktitle = {2020 28th European Signal Processing Conference (EUSIPCO)},
  title = {Foreground-Background Ambient Sound Scene Separation},
  year = {2020},
  pages = {281--285},
  abstract = {Ambient sound scenes typically comprise multiple short events occurring on top of a somewhat stationary background. We consider the task of separating these events from the background, which we call foreground-background ambient sound scene separation. We propose a deep learning-based separation framework with a suitable feature normalization scheme and an optional auxiliary network capturing the background statistics, and we investigate its ability to handle the great variety of sound classes encountered in ambient sound scenes, which have often not been seen in training. To do so, we create single-channel foreground-background mixtures using isolated sounds from the DESED and Audioset datasets, and we conduct extensive experiments with mixtures of seen or unseen sound classes at various signal-to-noise ratios. Our experimental findings demonstrate the generalization ability of the proposed approach.},
  keywords = {Training;Adaptation models;Protocols;Signal processing;Separation processes;Task analysis;Signal to noise ratio;Audio source separation;ambient sound scenes;generalization ability;deep learning},
  doi = {10.23919/Eusipco47968.2020.9287436},
  issn = {2076-1465},
  month = {Aug},
  url = {https://www.eurasip.org/proceedings/eusipco/eusipco2020/pdfs/0000281.pdf},
}