The Corpus DIMEx100: transcription and evaluation

The Corpus DIMEx100: transcription and evaluation. Pineda, L. A., Castellanos, H., Cuétara, J. O., Galescu, L., Juárez, J., Llisterri, J., Pérez, P., & Villaseñor, L. Language Resources and Evaluation, 44(4):347-370, 2009.

In this paper the transcription and evaluation of the corpus DIMEx100 for Mexican Spanish is presented. First we describe the corpus and explain the linguistic and computational motivation for its design and collection process; then, the phonetic antecedents and the alphabet adopted for the transcription task are presented; the corpus has been transcribed at three different granularity levels, which are also specified in detail. The corpus statistics for each transcription level are also presented. A set of phonetic rules describing phonetic context observed empirically in spontaneous conversation is also validated with the transcription. The corpus has been used for the construction of acoustic models and a phonetic dictionary for the construction of a speech recognition system. Initial performance results suggest that the data can be used to train good quality acoustic models.

@article{pineda_corpus_2009,
	Author = {Pineda, Luis Alberto and Castellanos, Hayde and Cuétara, Javier O and Galescu, Lucian and Juárez, Janet and Llisterri, Joaquim and Pérez, Patricia and Villaseñor, Luis},
	Date = {2009},
	Date-Modified = {2018-07-21 09:47:32 +0000},
	Doi = {10.1007/s10579-009-9109-9},
	Issn = {1574-020X (Print) 1574-0218 (Online)},
	Journal = {Language Resources and Evaluation},
	Keywords = {geographical variation, language resources, Spanish, speech corpus, speech technology, transcription, América, México, segmental transcription, speech recognition},
	Number = {4},
	Pages = {347--370},
	Title = {The Corpus DIMEx100: transcription and evaluation},
	Url = {http://liceu.uab.cat/~joaquim/publicacions/Pineda_et_al_09_DIMEx100.pdf},
	Volume = {44},
	Year = {2009},
	Abstract = {In this paper the transcription and evaluation of the corpus DIMEx100 for Mexican Spanish is presented. First we describe the corpus and explain the linguistic and computational motivation for its design and collection process; then, the phonetic antecedents and the alphabet adopted for the transcription task are presented; the corpus has been transcribed at three different granularity levels, which are also specified in detail. The corpus statistics for each transcription level are also presented. A set of phonetic rules describing phonetic context observed empirically in spontaneous conversation is also validated with the transcription. The corpus has been used for the construction of acoustic models and a phonetic dictionary for the construction of a speech recognition system. Initial performance results suggest that the data can be used to train good quality acoustic models.},
	Bdsk-Url-1 = {http://liceu.uab.cat/~joaquim/publicacions/Pineda_et_al_09_DIMEx100.pdf},
	Bdsk-Url-2 = {http://dx.doi.org/10.1007/s10579-009-9109-9}}
