Analysis of the sensitivity of the End-Of-Turn Detection task to errors generated by the Automatic Speech Recognition process

Analysis of the sensitivity of the End-Of-Turn Detection task to errors generated by the Automatic Speech Recognition process. Montenegro, C., Santana, R., & Lozano, J. A. Engineering Applications of Artificial Intelligence, 100:104189, 2021.

An End-Of-Turn Detection Module (EOTD-M) is an essential component of automatic Spoken Dialogue Systems. The capability of correctly detecting whether a user's utterance has ended or not improves the accuracy in interpreting the meaning of the message and decreases the latency in the answer. Usually, in dialogue systems, an EOTD-M is coupled with an Automatic Speech Recognition Module (ASR-M) to transmit complete utterances to the Natural Language Understanding unit. Mistakes in the ASR-M transcription can have a strong effect on the performance of the EOTD-M. The actual extent of this effect depends on the particular combination of ASR-M transcription errors and the sentence featurization techniques implemented as part of the EOTD-M. In this paper we investigate this important relationship for an EOTD-M based on semantic information and particular characteristics of the speakers (speech profiles). We introduce an Automatic Speech Recognition Simulator (ASR-SIM) that models different types of semantic mistakes in the ASR-M transcription as well as different speech profiles. We use the simulator to evaluate the sensitivity to ASR-M mistakes of a Long Short-Term Memory network classifier trained in EOTD with different featurization techniques. Our experiments reveal the different ways in which the performance of the model is influenced by the ASR-M errors. We corroborate that not only is the ASR-SIM useful to estimate the performance of an EOTD-M in customized noisy scenarios, but it can also be used to generate training datasets with the expected error rates of real working conditions, which leads to better performance.

@article{
 title = {Analysis of the sensitivity of the End-Of-Turn Detection task to errors generated by the Automatic Speech Recognition process},
 type = {article},
 year = {2021},
 keywords = {Automatic speech recognition,End of turn detection,Natural language processing,Neural networks,Spoken dialogue systems},
 pages = {104189},
 volume = {100},
 id = {a23fa6e1-a3a5-31aa-bea3-f509c214dd02},
 created = {2021-11-12T08:30:31.200Z},
 file_attached = {false},
 profile_id = {789246de-927b-32cc-ae4f-1b7e2b31674c},
 group_id = {e3c82d43-35db-3bbb-b28a-0fd521d70498},
 last_modified = {2021-11-12T08:30:31.200Z},
 read = {false},
 starred = {false},
 authored = {false},
 confirmed = {false},
 hidden = {false},
 source_type = {article},
 private_publication = {false},
 abstract = {An End-Of-Turn Detection Module (EOTD-M) is an essential component of automatic Spoken Dialogue Systems. The capability of correctly detecting whether a user's utterance has ended or not improves the accuracy in interpreting the meaning of the message and decreases the latency in the answer. Usually, in dialogue systems, an EOTD-M is coupled with an Automatic Speech Recognition Module (ASR-M) to transmit complete utterances to the Natural Language Understanding unit. Mistakes in the ASR-M transcription can have a strong effect on the performance of the EOTD-M. The actual extent of this effect depends on the particular combination of ASR-M transcription errors and the sentence featurization techniques implemented as part of the EOTD-M. In this paper we investigate this important relationship for an EOTD-M based on semantic information and particular characteristics of the speakers (speech profiles). We introduce an Automatic Speech Recognition Simulator (ASR-SIM) that models different types of semantic mistakes in the ASR-M transcription as well as different speech profiles. We use the simulator to evaluate the sensitivity to ASR-M mistakes of a Long Short-Term Memory network classifier trained in EOTD with different featurization techniques. Our experiments reveal the different ways in which the performance of the model is influenced by the ASR-M errors. We corroborate that not only is the ASR-SIM useful to estimate the performance of an EOTD-M in customized noisy scenarios, but it can also be used to generate training datasets with the expected error rates of real working conditions, which leads to better performance.},
 bibtype = {article},
 author = {Montenegro, César and Santana, Roberto and Lozano, Jose A},
 journal = {Engineering Applications of Artificial Intelligence}
}
