Assessing the prosody of non-native speakers of English: Measures and feature sets. Coutinho, E., Hönig, F., Zhang, Y., Hantke, S., Batliner, A., Nöth, E., & Schuller, B. In Calzolari, N, Choukri, K, Declerck, T, Goggi, S, Grobelnik, M, Maegaard, B, Mariani, J, Mazo, H, Moreno, A, Odijk, J, & Piperidis, S, editors, Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC), volume 645378, pages 0–4, jan, 2015. Paris, France, European Language Resources Association (ELRA).
In this paper, we describe a new database with audio recordings of non-native (L2) speakers of English, and the perceptual evaluation experiment conducted with native English speakers for assessing the prosody of each recording. These annotations are then used to compute the gold standard using different methods, and a series of regression experiments is conducted to evaluate their impact on the performance of a regression model predicting the degree of naturalness of L2 speech. Further, we compare the relevance of different feature groups modelling prosody in general (without speech tempo), speech rate and pauses modelling speech tempo (fluency), voice quality, and a variety of spectral features. We also discuss the impact of various fusion strategies on performance. Overall, our results demonstrate that the prosody of non-native speakers of English as L2 can be reliably assessed using supra-segmental audio features; prosodic features seem to be the most important ones.
@inproceedings{coutinho2016assessingsets,
abstract = {In this paper, we describe a new database with audio recordings of non-native (L2) speakers of English, and the perceptual evaluation experiment conducted with native English speakers for assessing the prosody of each recording. These annotations are then used to compute the gold standard using different methods, and a series of regression experiments is conducted to evaluate their impact on the performance of a regression model predicting the degree of naturalness of L2 speech. Further, we compare the relevance of different feature groups modelling prosody in general (without speech tempo), speech rate and pauses modelling speech tempo (fluency), voice quality, and a variety of spectral features. We also discuss the impact of various fusion strategies on performance. Overall, our results demonstrate that the prosody of non-native speakers of English as L2 can be reliably assessed using supra-segmental audio features; prosodic features seem to be the most important ones.},
author = {Coutinho, Eduardo and H{\"{o}}nig, Florian and Zhang, Yue and Hantke, Simone and Batliner, Anton and N{\"{o}}th, Elmar and Schuller, B},
booktitle = {Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC)},
editor = {Calzolari, N and Choukri, K and Declerck, T and Goggi, S and Grobelnik, M and Maegaard, B and Mariani, J and Mazo, H and Moreno, A and Odijk, J and Piperidis, S},
isbn = {978-2-9517408-9-1},
keywords = {article,conference},
month = {jan},
number = {645378},
address = {Paris, France},
pages = {0--4},
publisher = {European Language Resources Association (ELRA)},
title = {{Assessing the prosody of non-native speakers of English: Measures and feature sets}},
volume = {645378},
year = {2015}
}
