Room identification using frequency dependence of spectral decay statistics.
Moore, A. H.; Naylor, P. A.; and Brookes, M.
In ICASSP, Calgary, Canada, April 2018.
@inproceedings{Moore2018,
address = {Calgary, Canada},
title = {Room identification using frequency dependence of spectral decay statistics},
abstract = {A method for room identification is proposed based on the reverberation properties of multichannel speech recordings. The approach exploits the dependence of spectral decay statistics on the reverberation time of a room. The average negative-side variance within 1/3-octave bands is proposed as the identifying feature and shown to be effective in a classification experiment. However, negative-side variance is also dependent on the direct-to-reverberant energy ratio. The resulting sensitivity to different spatial configurations of source and microphones within a room is mitigated using a novel reverberation enhancement algorithm. A classification experiment using speech convolved with measured impulse responses and contaminated with environmental noise demonstrates the effectiveness of the proposed method, achieving 79\% correct identification in the most demanding condition compared to 40\% using unenhanced signals.},
booktitle = {{ICASSP}},
author = {Moore, Alastair H. and Naylor, Patrick A. and Brookes, Mike},
month = apr,
year = {2018},
}
Modulation-Domain Multichannel Kalman Filtering for Speech Enhancement.
Xue, W.; Moore, A. H.; Brookes, M.; and Naylor, P. A.
IEEE/ACM Transactions on Audio, Speech, and Language Processing, 26(10): 1833–1847. October 2018.
@article{Xue2018a,
title = {Modulation-{Domain} {Multichannel} {Kalman} {Filtering} for {Speech} {Enhancement}},
volume = {26},
issn = {2329-9290},
doi = {10.1109/TASLP.2018.2845665},
abstract = {Compared with single-channel speech enhancement methods, multichannel methods can utilize spatial information to design optimal filters. Although some filters adaptively consider second-order signal statistics, the temporal evolution of the speech spectrum is usually neglected. By using linear prediction (LP) to model the interframe temporal evolution of speech, single-channel Kalman filtering (KF) based methods have been developed for speech enhancement. In this paper, we derive a multichannel KF (MKF) that jointly uses both interchannel spatial correlation and interframe temporal correlation for speech enhancement. We perform LP in the modulation domain, and by incorporating the spatial information, derive an optimal MKF gain in the short-time Fourier transform domain. We show that the proposed MKF reduces to the conventional multichannel Wiener filter if the LP information is discarded. Furthermore, we show that, under an appropriate assumption, the MKF is equivalent to a concatenation of the minimum variance distortionless response beamformer and a single-channel modulation-domain KF and therefore present an alternative implementation of the MKF. Experiments conducted on a public head-related impulse response database demonstrate the effectiveness of the proposed method.},
number = {10},
journal = {{IEEE}/{ACM} Transactions on Audio, Speech, and Language Processing},
author = {Xue, W. and Moore, A. H. and Brookes, M. and Naylor, P. A.},
month = oct,
year = {2018},
pages = {1833--1847},
}
Multichannel Kalman filtering for speech enhancement.
Xue, W.; Moore, A. H.; Brookes, M.; and Naylor, P. A.
In ICASSP, pages 41–45, April 2018.
@inproceedings{Xue2018,
title = {Multichannel {Kalman} filtering for speech enhancement},
doi = {10.1109/ICASSP.2018.8461903},
abstract = {The use of spatial information in multichannel speech enhancement methods is well established but information associated with the temporal evolution of speech is less commonly exploited. Speech signals can be modelled using an autoregressive process in the time-frequency modulation domain, and Kalman filtering based speech enhancement algorithms have been developed for single-channel processing. In this paper, a multichannel Kalman filter (MKF) for speech enhancement is derived that jointly considers the multichannel spatial information and the temporal correlations of speech. We model the temporal evolution of speech in the modulation domain and, by incorporating the spatial information, an optimal MKF gain is derived in the short-time Fourier transform domain. We also show that the proposed MKF becomes a conventional multichannel Wiener filter if the temporal information is discarded. Experiments using the signals generated from a public head-related impulse response database demonstrate the effectiveness of the proposed method in comparison to other techniques.},
booktitle = {{ICASSP}},
author = {Xue, W. and Moore, A. H. and Brookes, M. and Naylor, P. A.},
month = apr,
year = {2018},
pages = {41--45},
}