A data-driven caption for L2 listening. Mirzaei, M. & Meshgi, K. In Proceedings of the 11th International Conference of Experimental Linguistics, 2020. ExLing Society.
Paper doi abstract bibtex Partial and Synchronized Caption (PSC) is a tool that automatically detects difficult segments for the second language (L2) listeners and displays them in the caption while omitting easy-to-recognize cases to reduce cognitive load. Given that the number of words to be shown in this caption is limited, the main challenge lies in selecting and prioritizing difficult words. Since partialization is a classifying task, we made a dataset of labeled words in TED talks (easy vs. difficult) for a target proficiency-level. A deep classifier is trained on this dataset to automate the detection of difficult words/phrases without explicitly extracting word features. This proposed data-driven PSC outperforms its feature-based versions by adopting a selection pattern that is more similar to the annotations, capturing more complicated cases, and minimizing the false positives.
@inproceedings{mirzaei_data-driven_2020,
  author    = {Mirzaei, Maryam and Meshgi, Kourosh},
  title     = {A data-driven caption for {L2} listening},
  booktitle = {Proceedings of 11th {International} {Conference} of {Experimental} {Linguistics}},
  publisher = {ExLing Society},
  year      = {2020},
  isbn      = {9786188458512},
  doi       = {10.36505/ExLing-2020/11/0042/000457},
  url       = {https://exlingsociety.com/wp-content/uploads/proceedings/exling-2020/11_0042_000457.pdf},
  urldate   = {2025-04-17},
  abstract  = {Partial and Synchronized Caption (PSC) is a tool that automatically detects difficult segments for the second language (L2) listeners and displays them in the caption while omitting easy-to-recognize cases to reduce cognitive load. Given that the number of words to be shown in this caption is limited, the main challenge lies in selecting and prioritizing difficult words. Since partialization is a classifying task, we made a dataset of labeled words in TED talks (easy vs. difficult) for a target proficiency-level. A deep classifier is trained on this dataset to automate the detection of difficult words/phrases without explicitly extracting word features. This proposed data-driven PSC outperforms its feature-based versions by adopting a selection pattern that is more similar to the annotations, capturing more complicated cases, and minimizing the false positives.},
}
Downloads: 0
{"_id":"JD9JqkZrkWaXihrL3","bibbaseid":"mirzaei-meshgi-adatadrivencaptionforl2listening-2020","author_short":["Mirzaei, M.","Meshgi, K."],"bibdata":{"bibtype":"inproceedings","type":"inproceedings","title":"A data-driven caption for L2 listening","isbn":"9786188458512","url":"https://exlingsociety.com/wp-content/uploads/proceedings/exling-2020/11_0042_000457.pdf","doi":"10.36505/ExLing-2020/11/0042/000457","abstract":"Partial and Synchronized Caption (PSC) is a tool that automatically detects difficult segments for the second language (L2) listeners and displays them in the caption while omitting easy-to-recognize cases to reduce cognitive load. Given that the number of words to be shown in this caption is limited, the main challenge lies in selecting and prioritizing difficult words. Since partialization is a classifying task, we made a dataset of labeled words in TED talks (easy vs. difficult) for a target proficiency-level. A deep classifier is trained on this dataset to automate the detection of difficult words/phrases without explicitly extracting word features. \nThis proposed data-driven PSC outperforms its feature-based versions by adopting a selection pattern that is more similar to the annotations, capturing more complicated cases, and minimizing the false positives.","urldate":"2025-04-17","booktitle":"Proceedings of 11th International Conference of Experimental Linguistics","publisher":"ExLing Society","author":[{"propositions":[],"lastnames":["Mirzaei"],"firstnames":["Maryam"],"suffixes":[]},{"propositions":[],"lastnames":["Meshgi"],"firstnames":["Kourosh"],"suffixes":[]}],"year":"2020","bibtex":"@inproceedings{mirzaei_data-driven_2020,\n\ttitle = {A data-driven caption for {L2} listening},\n\tisbn = {9786188458512},\n\turl = {https://exlingsociety.com/wp-content/uploads/proceedings/exling-2020/11_0042_000457.pdf},\n\tdoi = {10.36505/ExLing-2020/11/0042/000457},\n\tabstract = {Partial and Synchronized Caption (PSC) is a tool that automatically detects difficult segments for the second language (L2) listeners and displays them in the caption while omitting easy-to-recognize cases to reduce cognitive load. Given that the number of words to be shown in this caption is limited, the main challenge lies in selecting and prioritizing difficult words. Since partialization is a classifying task, we made a dataset of labeled words in TED talks (easy vs. difficult) for a target proficiency-level. A deep classifier is trained on this dataset to automate the detection of difficult words/phrases without explicitly extracting word features. \nThis proposed data-driven PSC outperforms its feature-based versions by adopting a selection pattern that is more similar to the annotations, capturing more complicated cases, and minimizing the false positives.},\n\turldate = {2025-04-17},\n\tbooktitle = {Proceedings of 11th {International} {Conference} of {Experimental} {Linguistics}},\n\tpublisher = {ExLing Society},\n\tauthor = {Mirzaei, Maryam and Meshgi, Kourosh},\n\tyear = {2020},\n}\n\n\n\n","author_short":["Mirzaei, M.","Meshgi, K."],"key":"mirzaei_data-driven_2020","id":"mirzaei_data-driven_2020","bibbaseid":"mirzaei-meshgi-adatadrivencaptionforl2listening-2020","role":"author","urls":{"Paper":"https://exlingsociety.com/wp-content/uploads/proceedings/exling-2020/11_0042_000457.pdf"},"metadata":{"authorlinks":{}}},"bibtype":"inproceedings","biburl":"https://bibbase.org/zotero-group/Inter_Linguistic_Sociey/5953948","dataSources":["wqFBtZsLZwQqPkACg"],"keywords":[],"search_terms":["data","driven","caption","listening","mirzaei","meshgi"],"title":"A data-driven caption for L2 listening","year":2020}