Blind spatial sound source clustering and activity detection using uncalibrated microphone array. Nakamura, K. & Mizumoto, T. In 2017 25th European Signal Processing Conference (EUSIPCO), pages 2438-2442, Aug, 2017.
This paper presents a method for estimating the number, as well as the activity periods of spatially distributed sound sources using an uncalibrated microphone array. This methodology is applied for the purposes of speaker diarization. In general, speaker diarization has difficulty with: 1) estimating the number of sound sources (speakers), and 2) activity detection of multiple sound sources including overlap of utterances. Several microphone array based techniques have already tackled these challenges. However, existing methods mainly assume that the steering vectors for the microphone array are calibrated in advance to identify sound sources, which is difficult to satisfy when ad-hoc or flexible microphone arrays are used. Thus our approach estimates the number of sound sources blindly in two steps. First, Time Delay of Arrival (TDOA) of the observed signal is clustered. Second, the sound source activity is detected by clustering the long-term spatial spectrum using the TDOA based steering vector for each cluster. The validity of the algorithm is confirmed by both synthesized signals and a real-world flexible microphone array application.
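The first step described in the abstract, clustering TDOA values to count sources without calibrated steering vectors, can be illustrated with a minimal sketch. The abstract does not say how the TDOAs are estimated or clustered, so everything below is an assumption: GCC-PHAT is swapped in as a common pairwise TDOA estimator, a simple gap-based grouping stands in for the clustering, and the names `gcc_phat_tdoa` and `cluster_tdoas` are hypothetical. It is not the authors' implementation.

```python
import numpy as np


def gcc_phat_tdoa(x, y, fs, max_tau=None):
    """Estimate the delay of y relative to x (in seconds) with GCC-PHAT.

    Assumption: GCC-PHAT is used here as a generic estimator; the paper's
    abstract does not name the estimator it uses.
    """
    n = len(x) + len(y)                      # zero-pad to avoid circular wrap-around
    X = np.fft.rfft(x, n=n)
    Y = np.fft.rfft(y, n=n)
    cross = Y * np.conj(X)                   # cross-power spectrum
    cross /= np.abs(cross) + 1e-12           # PHAT weighting: keep phase only
    cc = np.fft.irfft(cross, n=n)            # generalized cross-correlation
    max_shift = n // 2
    if max_tau is not None:
        max_shift = max(1, min(int(fs * max_tau), max_shift))
    # Reorder so that index 0 corresponds to lag -max_shift.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = int(np.argmax(cc)) - max_shift   # lag (in samples) of the peak
    return shift / fs


def cluster_tdoas(tdoas, gap, min_size=3):
    """Group 1-D TDOA values; a new group starts where the sorted values
    jump by more than `gap`. Groups with fewer than `min_size` members are
    treated as outliers. Returns the mean TDOA of each retained group.

    Assumption: this gap-based grouping is an illustrative stand-in for
    the clustering step, not the method from the paper.
    """
    values = np.sort(np.asarray(tdoas, dtype=float))
    if values.size == 0:
        return []
    breaks = np.where(np.diff(values) > gap)[0] + 1
    groups = np.split(values, breaks)
    return [float(g.mean()) for g in groups if g.size >= min_size]
```

Under these assumptions, the number of retained groups serves as a rough estimate of the number of sources seen by one microphone pair, and each group mean is the delay that a TDOA-based steering vector for that cluster would be built from.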
@InProceedings{8081648,
  author = {K. Nakamura and T. Mizumoto},
  booktitle = {2017 25th European Signal Processing Conference (EUSIPCO)},
  title = {Blind spatial sound source clustering and activity detection using uncalibrated microphone array},
  year = {2017},
  pages = {2438--2442},
  abstract = {This paper presents a method for estimating the number, as well as the activity periods of spatially distributed sound sources using an uncalibrated microphone array. This methodology is applied for the purposes of speaker diarization. In general, speaker diarization has difficulty with: 1) estimating the number of sound sources (speakers), and 2) activity detection of multiple sound sources including overlap of utterances. Several microphone array based techniques have already tackled these challenges. However, existing methods mainly assume that the steering vectors for the microphone array are calibrated in advance to identify sound sources, which is difficult to satisfy when ad-hoc or flexible microphone arrays are used. Thus our approach estimates the number of sound sources blindly in two steps. First, Time Delay of Arrival (TDOA) of the observed signal is clustered. Second, the sound source activity is detected by clustering the long-term spatial spectrum using the TDOA based steering vector for each cluster. The validity of the algorithm is confirmed by both synthesized signals and a real-world flexible microphone array application.},
  keywords = {array signal processing;blind source separation;microphone arrays;pattern clustering;speaker recognition;activity detection;uncalibrated microphone array;spatially distributed sound sources;speaker diarization;microphone array based techniques;flexible microphone arrays;sound source activity;TDOA based steering vector;real-world flexible microphone array application;blind spatial sound source clustering;Microphone arrays;Estimation;Robots;Histograms;Reverberation;Robustness},
  doi = {10.23919/EUSIPCO.2017.8081648},
  issn = {2076-1465},
  month = aug,
  url = {https://www.eurasip.org/proceedings/eusipco/eusipco2017/papers/1570345746.pdf},
}