Audio/video supervised independent vector analysis through multimodal pilot dependent components. Nesta, F., Mosayyebpour, S., Koldovský, Z., & Paleček, K. In 2017 25th European Signal Processing Conference (EUSIPCO), pages 1150-1164, Aug, 2017.
Independent Vector Analysis is a powerful tool for estimating the broadband acoustic transfer function between multiple sources and the microphones in the frequency domain. In this work, we consider an extended IVA model which adopts the concept of pilot dependent signals. Without imposing any constraint on the de-mixing system, pilot signals depending on the target source are injected into the model enforcing the permutation of outputs to be consistent over time. A neural network trained on acoustic data and a lip motion detection are jointly used to produce a multimodal pilot signal dependent on the target source. It is shown through experimental results that this structure allows the enhancement of a predefined target source in very difficult and ambiguous scenarios.
@InProceedings{8081388,
  author = {F. Nesta and S. Mosayyebpour and Z. Koldovský and K. Paleček},
  booktitle = {2017 25th European Signal Processing Conference (EUSIPCO)},
  title = {Audio/video supervised independent vector analysis through multimodal pilot dependent components},
  year = {2017},
  pages = {1150--1164},
  abstract = {Independent Vector Analysis is a powerful tool for estimating the broadband acoustic transfer function between multiple sources and the microphones in the frequency domain. In this work, we consider an extended IVA model which adopts the concept of pilot dependent signals. Without imposing any constraint on the de-mixing system, pilot signals depending on the target source are injected into the model enforcing the permutation of outputs to be consistent over time. A neural network trained on acoustic data and a lip motion detection are jointly used to produce a multimodal pilot signal dependent on the target source. It is shown through experimental results that this structure allows the enhancement of a predefined target source in very difficult and ambiguous scenarios.},
  keywords = {audio signal processing;frequency-domain analysis;image motion analysis;independent component analysis;learning (artificial intelligence);neural nets;object detection;transfer functions;vectors;video signal processing;audio-video supervised independent vector analysis;multimodal pilot signal dependent component;acoustic transfer function;neural network;lip motion detection;pilot dependent signals;extended IVA model;frequency domain;multiple sources;broadband acoustic transfer function;multimodal pilot dependent components;independent vector analysis;acoustic data;Speech;Acoustics;Artificial neural networks;Time-frequency analysis;Microphones;Lips;Training;independent vector analysis;source separation;independent component analysis;speech enhancement;multimodal processing},
  doi = {10.23919/EUSIPCO.2017.8081388},
  issn = {2076-1465},
  month = {Aug},
  url = {https://www.eurasip.org/proceedings/eusipco/eusipco2017/papers/1570341532.pdf},
}