Source and Filter Estimation for Throat-Microphone Speech Enhancement

Source and Filter Estimation for Throat-Microphone Speech Enhancement. Turan, M. A. T. & Erzin, E. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 24(2):265-275, February 2016.

In this paper, we propose a new statistical enhancement system for throat microphone recordings through source and filter separation. Throat microphones (TM) are skin-attached piezoelectric sensors that can capture speech sound signals in the form of tissue vibrations. Due to their limited bandwidth, TM-recorded speech suffers from reduced intelligibility and naturalness. We investigate learning phone-dependent Gaussian mixture model (GMM)-based statistical mappings using parallel recordings of acoustic microphone (AM) and TM for enhancement of the spectral envelope and excitation signals of the TM speech. The proposed mappings address the phone-dependent variability of tissue conduction with TM recordings. While the spectral envelope mapping estimates the line spectral frequency (LSF) representation of AM from TM recordings, the excitation mapping is constructed based on the spectral energy difference (SED) of AM and TM excitation signals. The excitation enhancement is modeled as an estimation of the SED features from the TM signal. The proposed enhancement system is evaluated using both objective and subjective tests. Objective evaluations are performed with the log-spectral distortion (LSD), the wideband perceptual evaluation of speech quality (PESQ), and mean-squared error (MSE) metrics. Subjective evaluations are performed with an A/B comparison test. Experimental results indicate that the proposed phone-dependent mappings exhibit enhancements over phone-independent mappings. Furthermore, enhancement of the TM excitation through statistical mappings of the SED features introduces significant objective and subjective performance improvements to the enhancement of TM recordings.
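The abstract names log-spectral distortion (LSD) as one of the objective metrics but does not give its exact form. As a minimal sketch, assuming the commonly used definition (root-mean-square difference of the dB power spectra per frame, averaged over frames), it might be computed as below. The function name, the `floor` parameter, and the choice to take precomputed power spectra as input are illustrative assumptions, not details from the paper.

```python
import math


def log_spectral_distortion(ref_power, est_power, floor=1e-12):
    """Mean log-spectral distortion in dB between two sets of power spectra.

    ref_power, est_power: lists of frames, each frame a list of non-negative
    power values per frequency bin (hypothetical input layout).
    floor: small constant (an assumption) that avoids log10(0).
    """
    if not ref_power or len(ref_power) != len(est_power):
        raise ValueError("inputs must be non-empty and have equal frame counts")

    per_frame = []
    for ref_frame, est_frame in zip(ref_power, est_power):
        if not ref_frame or len(ref_frame) != len(est_frame):
            raise ValueError("frames must be non-empty and have equal bin counts")
        # Squared difference of the two spectra in dB, per frequency bin.
        squared_diffs = [
            (10.0 * math.log10(max(r, floor)) - 10.0 * math.log10(max(e, floor))) ** 2
            for r, e in zip(ref_frame, est_frame)
        ]
        # Root-mean-square over frequency bins gives the frame-level LSD.
        per_frame.append(math.sqrt(sum(squared_diffs) / len(squared_diffs)))

    # Average the frame-level values over all frames.
    return sum(per_frame) / len(per_frame)
```

Under this definition, identical spectra give a distortion of 0 dB, and a uniform tenfold power difference across all bins gives 10 dB. In the setting described above, `ref_power` would hold the AM spectra and `est_power` the enhanced TM spectra.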

@article{ ISI:000367950900001,
Author = {Turan, M. A. Tugtekin and Erzin, Engin},
Title = {{Source and Filter Estimation for Throat-Microphone Speech Enhancement}},
Journal = {{IEEE/ACM Transactions on Audio, Speech, and Language Processing}},
Year = {{2016}},
Volume = {{24}},
Number = {{2}},
Pages = {{265--275}},
Month = {{FEB}},
Abstract = {{In this paper, we propose a new statistical enhancement system for
   throat microphone recordings through source and filter separation.
   Throat microphones (TM) are skin-attached piezoelectric sensors that can
   capture speech sound signals in the form of tissue vibrations. Due to
   their limited bandwidth, TM-recorded speech suffers from reduced
   intelligibility and naturalness. In this paper, we investigate learning
   phone-dependent
   Gaussian mixture model (GMM)-based statistical mappings using parallel
   recordings of acoustic microphone (AM) and TM for enhancement of the
   spectral envelope and excitation signals of the TM speech. The proposed
   mappings address the phone-dependent variability of tissue conduction
   with TM recordings. While the spectral envelope mapping estimates the
   line spectral frequency (LSF) representation of AM from TM recordings,
   the excitation mapping is constructed based on the spectral energy
   difference (SED) of AM and TM excitation signals. The excitation
   enhancement is modeled as an estimation of the SED features from the TM
   signal. The proposed enhancement system is evaluated using both
   objective and subjective tests. Objective evaluations are performed with
   the log-spectral distortion (LSD), the wideband perceptual evaluation of
   speech quality (PESQ) and mean-squared error (MSE) metrics. Subjective
   evaluations are performed with an A/B comparison test. Experimental
   results indicate that the proposed phone-dependent mappings exhibit
   enhancements over phone-independent mappings. Furthermore, enhancement of
   the TM excitation through statistical mappings of the SED features
   introduces significant objective and subjective performance improvements
   to the enhancement of TM recordings.}},
DOI = {{10.1109/TASLP.2015.2499040}},
ISSN = {{2329-9290}},
ResearcherID-Numbers = {{Erzin, Engin/H-1716-2011}},
ORCID-Numbers = {{Erzin, Engin/0000-0002-2715-2368}},
Unique-ID = {{ISI:000367950900001}},
}
