ParsiNLU: A Suite of Language Understanding Challenges for Persian

ParsiNLU: A Suite of Language Understanding Challenges for Persian. Khashabi, D., Cohan, A., Shakeri, S., Hosseini, P., Pezeshkpour, P., Alikhani, M., Aminnaseri, M., Bitaab, M., Brahman, F., Ghazarian, S., Gheini, M., Kabiri, A., Mahabagdi, R. K., Memarrast, O., Mosallanezhad, A., Noury, E., Raji, S., Rasooli, M. S., Sadeghi, S., Azer, E. S., Samghabadi, N. S., Shafaei, M., Sheybani, S., Tazarv, A., & Yaghoobzadeh, Y. Transactions of the Association for Computational Linguistics, 9:1147–1162, MIT Press, Cambridge, MA, 2021.


Abstract Despite the progress made in recent years in addressing natural language understanding (NLU) challenges, the majority of this progress remains to be concentrated on resource-rich languages like English. This work focuses on Persian language, one of the widely spoken languages in the world, and yet there are few NLU datasets available for this language. The availability of high-quality evaluation datasets is a necessity for reliable assessment of the progress on different NLU tasks and domains. We introduce ParsiNLU, the first benchmark in Persian language that includes a range of language understanding tasks—reading comprehension, textual entailment, and so on. These datasets are collected in a multitude of ways, often involving manual annotations by native speakers. This results in over 14.5k new instances across 6 distinct NLU tasks. Additionally, we present the first results on state-of-the-art monolingual and multilingual pre-trained language models on this benchmark and compare them with human performance, which provides valuable insights into our ability to tackle natural language understanding challenges in Persian. We hope ParsiNLU fosters further research and advances in Persian language understanding.

@article{khashabi-etal-2021-parsinlu,
    title = "ParsiNLU: A Suite of Language Understanding Challenges for Persian",
    author = "Khashabi, Daniel  and
      Cohan, Arman  and
      Shakeri, Siamak  and
      Hosseini, Pedram  and
      Pezeshkpour, Pouya  and
      Alikhani, Malihe  and
      Aminnaseri, Moin  and
      Bitaab, Marzieh  and
      Brahman, Faeze  and
      Ghazarian, Sarik  and
      Gheini, Mozhdeh  and
      Kabiri, Arman  and
      Mahabagdi, Rabeeh Karimi  and
      Memarrast, Omid  and
      Mosallanezhad, Ahmadreza  and
      Noury, Erfan  and
      Raji, Shahab  and
      Rasooli, Mohammad Sadegh  and
      Sadeghi, Sepideh  and
      Azer, Erfan Sadeqi  and
      Samghabadi, Niloofar Safi  and
      Shafaei, Mahsa  and
      Sheybani, Saber  and
      Tazarv, Ali  and
      Yaghoobzadeh, Yadollah",
    journal = "Transactions of the Association for Computational Linguistics",
    volume = "9",
    year = "2021",
    address = "Cambridge, MA",
    publisher = "MIT Press",
    url = "https://aclanthology.org/2021.tacl-1.68",
    doi = "10.1162/tacl_a_00419",
    pages = "1147--1162",
    abstract = "Despite the progress made in recent years in addressing natural language understanding (NLU) challenges, the majority of this progress remains to be concentrated on resource-rich languages like English. This work focuses on Persian language, one of the widely spoken languages in the world, and yet there are few NLU datasets available for this language. The availability of high-quality evaluation datasets is a necessity for reliable assessment of the progress on different NLU tasks and domains. We introduce ParsiNLU, the first benchmark in Persian language that includes a range of language understanding tasks{---}reading comprehension, textual entailment, and so on. These datasets are collected in a multitude of ways, often involving manual annotations by native speakers. This results in over 14.5k new instances across 6 distinct NLU tasks. Additionally, we present the first results on state-of-the-art monolingual and multilingual pre-trained language models on this benchmark and compare them with human performance, which provides valuable insights into our ability to tackle natural language understanding challenges in Persian. We hope ParsiNLU fosters further research and advances in Persian language understanding.",
}
