Analysis of the sensitivity of the End-Of-Turn Detection task to errors generated by the Automatic Speech Recognition process. Montenegro, C., Santana, R., & Lozano, J., A. Engineering Applications of Artificial Intelligence, 100:104189, 2021.
abstract   bibtex   1 download  
An End-Of-Turn Detection Module (EOTD-M) is an essential component of automatic Spoken Dialogue Systems. The capability of correctly detecting whether a user's utterance has ended or not improves the accuracy in interpreting the meaning of the message and decreases the latency in the answer. Usually, in dialogue systems, an EOTD-M is coupled with an Automatic Speech Recognition Module (ASR-M) to transmit complete utterances to the Natural Language Understanding unit. Mistakes in the ASR-M transcription can have a strong effect on the performance of the EOTD-M. The actual extent of this effect depends on the particular combination of ASR-M transcription errors and the sentence featurization techniques implemented as part of the EOTD-M. In this paper we investigate this important relationship for an EOTD-M based on semantic information and particular characteristics of the speakers (speech profiles). We introduce an Automatic Speech Recognition Simulator (ASR-SIM) that models different types of semantic mistakes in the ASR-M transcription as well as different speech profiles. We use the simulator to evaluate the sensitivity to ASR-M mistakes of a Long Short-Term Memory network classifier trained in EOTD with different featurization techniques. Our experiments reveal the different ways in which the performance of the model is influenced by the ASR-M errors. We corroborate that not only is the ASR-SIM useful to estimate the performance of an EOTD-M in customized noisy scenarios, but it can also be used to generate training datasets with the expected error rates of real working conditions, which leads to better performance.
@article{
 title = {Analysis of the sensitivity of the End-Of-Turn Detection task to errors generated by the Automatic Speech Recognition process},
 type = {article},
 year = {2021},
 keywords = {Automatic speech recognition,End of turn detection,Natural language processing,Neural networks,Spoken dialogue systems},
 pages = {104189},
 volume = {100},
 id = {a23fa6e1-a3a5-31aa-bea3-f509c214dd02},
 created = {2021-11-12T08:30:31.200Z},
 file_attached = {false},
 profile_id = {789246de-927b-32cc-ae4f-1b7e2b31674c},
 group_id = {e3c82d43-35db-3bbb-b28a-0fd521d70498},
 last_modified = {2021-11-12T08:30:31.200Z},
 read = {false},
 starred = {false},
 authored = {false},
 confirmed = {false},
 hidden = {false},
 source_type = {article},
 private_publication = {false},
 abstract = {An End-Of-Turn Detection Module (EOTD-M) is an essential component of automatic Spoken Dialogue Systems. The capability of correctly detecting whether a user's utterance has ended or not improves the accuracy in interpreting the meaning of the message and decreases the latency in the answer. Usually, in dialogue systems, an EOTD-M is coupled with an Automatic Speech Recognition Module (ASR-M) to transmit complete utterances to the Natural Language Understanding unit. Mistakes in the ASR-M transcription can have a strong effect on the performance of the EOTD-M. The actual extent of this effect depends on the particular combination of ASR-M transcription errors and the sentence featurization techniques implemented as part of the EOTD-M. In this paper we investigate this important relationship for an EOTD-M based on semantic information and particular characteristics of the speakers (speech profiles). We introduce an Automatic Speech Recognition Simulator (ASR-SIM) that models different types of semantic mistakes in the ASR-M transcription as well as different speech profiles. We use the simulator to evaluate the sensitivity to ASR-M mistakes of a Long Short-Term Memory network classifier trained in EOTD with different featurization techniques. Our experiments reveal the different ways in which the performance of the model is influenced by the ASR-M errors. We corroborate that not only is the ASR-SIM useful to estimate the performance of an EOTD-M in customized noisy scenarios, but it can also be used to generate training datasets with the expected error rates of real working conditions, which leads to better performance.},
 bibtype = {article},
 author = {Montenegro, César and Santana, Roberto and Lozano, Jose A},
 journal = {Engineering Applications of Artificial Intelligence}
}

Downloads: 1