Co-registration of articulographic and real-time magnetic resonance imaging data for multimodal analysis of running speech. Kim, J., Lammert, A., Proctor, M. I., & Narayanan, S. S. In The Journal of the Acoustical Society of America, October 2012.
We propose a method for co-registering speech articulatory/acoustic data from two modalities that provide complementary advantages. Electromagnetic Articulography (EMA) provides high temporal resolution (100 samples/second in the WAVE system) and flesh-point tracking, while real-time Magnetic Resonance Imaging (rtMRI; 23 frames/second) offers a complete midsagittal view of the vocal tract, including articulated structures and the articulatory environment. Co-registration was achieved through iterative alignment in the acoustic and articulatory domains. Acoustic signals were aligned temporally using Dynamic Time Warping, while articulatory signals were aligned variously by minimizing the mean total error between articulometry data and estimated corresponding flesh points, and by using mutual information derived from articulatory parameters for each sentence. We demonstrate our method on a subset of the TIMIT corpus elicited from a male and a female speaker of American English, and illustrate the benefits of co-registered multi-modal data in the study of liquid and fricative consonant production in rapid speech. [Supported by NIH and NSF grants.]
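As a concrete illustration of the temporal-alignment step mentioned in the abstract, the sketch below aligns two acoustic feature sequences for the same sentence with plain Dynamic Time Warping. It is only a sketch under stated assumptions: the feature representation, the frame rates, and the NumPy implementation are illustrative choices, not details taken from the paper.

# A minimal sketch (not the authors' implementation) of temporal alignment via
# Dynamic Time Warping: acoustic features from the EMA and rtMRI recordings of
# the same sentence are warped onto a common time axis. The feature choice and
# the 100 Hz / 23 fps frame rates below are assumptions for illustration only.
import numpy as np

def dtw_path(x, y):
    """Return the DTW alignment path between feature sequences x (n, d) and y (m, d)."""
    n, m = len(x), len(y)
    # Pairwise Euclidean distances between frames.
    dist = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=-1)
    # Accumulated cost with the standard (match / insertion / deletion) recursion.
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = dist[i - 1, j - 1] + min(acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1])
    # Backtrack from (n, m) to (1, 1) to recover the warping path.
    path = [(n - 1, m - 1)]
    i, j = n, m
    while (i, j) != (1, 1):
        steps = {(i - 1, j - 1): acc[i - 1, j - 1],
                 (i - 1, j): acc[i - 1, j],
                 (i, j - 1): acc[i, j - 1]}
        i, j = min(steps, key=steps.get)
        path.append((i - 1, j - 1))
    return path[::-1]

# Toy usage: feature tracks for the "same" sentence at the two hypothetical frame rates.
ema_feats = np.random.rand(100, 13)   # e.g. 1 s of acoustics recorded with EMA (100 frames/s)
mri_feats = np.random.rand(23, 13)    # e.g. 1 s of acoustics recorded with rtMRI (23 frames/s)
print(dtw_path(ema_feats, mri_feats)[:5])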
@inproceedings{Kim2012Co-registrationofarticulographicand,
 abstract = {We propose a method for co-registering speech articulatory/acoustic data from two modalities that provide complementary advantages. Electromagnetic Articulography (EMA) provides high temporal resolution (100 samples/second in the WAVE system) and flesh-point tracking, while real-time Magnetic Resonance Imaging (rtMRI; 23 frames/second) offers a complete midsagittal view of the vocal tract, including articulated structures and the articulatory environment. Co-registration was achieved through iterative alignment in the acoustic and articulatory domains. Acoustic signals were aligned temporally using Dynamic Time Warping, while articulatory signals were aligned variously by minimizing the mean total error between articulometry data and estimated corresponding flesh points, and by using mutual information derived from articulatory parameters for each sentence. We demonstrate our method on a subset of the TIMIT corpus elicited from a male and a female speaker of American English, and illustrate the benefits of co-registered multi-modal data in the study of liquid and fricative consonant production in rapid speech. [Supported by NIH and NSF grants.]},
 author = {Kim, Jangwon and Lammert, Adam and Proctor, Michael I. and Narayanan, Shrikanth S.},
 bib2html_rescat = {span},
 booktitle = {The Journal of the Acoustical Society of America},
 doi = {10.1121/1.4755722},
 link = {http://sail.usc.edu/publications/files/jangwon_ASA_fall_2012.pdf},
 location = {Kansas City},
 month = {Oct},
 title = {Co-registration of articulographic and real-time magnetic resonance imaging data for multimodal analysis of running speech},
 year = {2012}
}
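For the articulatory-domain step, the abstract describes minimizing the mean total error between EMA sensor data and corresponding flesh points estimated from the rtMRI images. One standard way to carry out such a point-set registration is a similarity (Procrustes/Kabsch) fit in the midsagittal plane, sketched below; this is an illustrative assumption, not the authors' published procedure, and the point correspondences and coordinates are hypothetical.

# A minimal sketch (an assumption, not the paper's method) of registering EMA
# sensor coordinates to flesh points estimated from rtMRI contours by minimizing
# the mean squared error between corresponding 2-D points, via a similarity fit
# (scale + rotation + translation) in the midsagittal plane.
import numpy as np

def register_points(ema_pts, mri_pts):
    """Fit scale s, rotation R, translation t minimizing mean ||s R p + t - q||^2
    over corresponding 2-D points p (EMA) and q (estimated from rtMRI)."""
    mu_p, mu_q = ema_pts.mean(axis=0), mri_pts.mean(axis=0)
    P, Q = ema_pts - mu_p, mri_pts - mu_q
    # SVD of the cross-covariance gives the optimal rotation (Kabsch solution).
    U, S, Vt = np.linalg.svd(P.T @ Q)
    d = np.sign(np.linalg.det(Vt.T @ U.T))          # guard against reflections
    R = Vt.T @ np.diag([1.0, d]) @ U.T
    s = (S * [1.0, d]).sum() / (P ** 2).sum()       # least-squares optimal scale
    t = mu_q - s * R @ mu_p
    return s, R, t

# Toy usage: three corresponding midsagittal points (e.g. tongue tip, tongue body, lower lip),
# with a known synthetic transform that the fit should recover.
ema = np.array([[0.0, 0.0], [1.0, 0.5], [2.0, 0.2]])
true_R = np.array([[np.cos(0.3), -np.sin(0.3)], [np.sin(0.3), np.cos(0.3)]])
mri = 1.2 * ema @ true_R.T + np.array([5.0, -3.0])
s, R, t = register_points(ema, mri)
print(np.allclose(s * ema @ R.T + t, mri))  # True: mean error driven to ~0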
