A Repository of Conversational Datasets. Henderson, M., Budzianowski, P., Casanueva, I., Coope, S., Gerz, D., Kumar, G., Mrkšić, N., Spithourakis, G., Su, P., Vulić, I., & Wen, T.
Progress in Machine Learning is often driven by the availability of large datasets, and consistent evaluation metrics for comparing modeling approaches. To this end, we present a repository of conversational datasets consisting of hundreds of millions of examples, and a standardised evaluation procedure for conversational response selection models using '1-of-100 accuracy'. The repository contains scripts that allow researchers to reproduce the standard datasets, or to adapt the pre-processing and data filtering steps to their needs. We introduce and evaluate several competitive baselines for conversational response selection, whose implementations are shared in the repository, as well as a neural encoder model that is trained on the entire training set.
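The '1-of-100 accuracy' metric named in the abstract can be illustrated with a short sketch. It is read here as: for each batch of 100 (context, response) pairs, every context is scored against all 100 responses in the batch, and the metric is the fraction of contexts whose own response ranks first. This is a minimal illustration, not the repository's implementation. The function names are hypothetical, and dot-product scoring over pre-computed vectors is an assumption.

```python
def score_matrix(context_vecs, response_vecs):
    """Score every context against every response in the batch.

    Assumption: the score is a plain dot product of pre-computed vectors.
    Entry [i][j] is the score of response j for context i.
    """
    return [
        [sum(c * r for c, r in zip(ctx, resp)) for resp in response_vecs]
        for ctx in context_vecs
    ]


def one_of_n_accuracy(scores):
    """Fraction of contexts whose true response gets the top score.

    scores is an n x n matrix in which the true response for context i
    sits at column i; the other n - 1 columns act as negatives.
    With n = 100 this is the 1-of-100 setting.
    Ties are resolved in favour of the lowest column index.
    """
    n = len(scores)
    correct = 0
    for i, row in enumerate(scores):
        best = max(range(n), key=lambda j: row[j])
        if best == i:
            correct += 1
    return correct / n


# Toy batch of 2 instead of 100, purely for illustration.
contexts = [[1.0, 0.0], [0.0, 1.0]]
responses = [[1.0, 0.0], [0.0, 1.0]]
accuracy = one_of_n_accuracy(score_matrix(contexts, responses))
```

In practice the per-batch values would be averaged over many batches of 100 drawn from the test set.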
@article{hendersonRepositoryConversationalDatasets2019,
  archivePrefix = {arXiv},
  eprinttype = {arxiv},
  eprint = {1904.06472},
  primaryClass = {cs.CL},
  title = {A {{Repository}} of {{Conversational Datasets}}},
  url = {http://arxiv.org/abs/1904.06472},
  abstract = {Progress in Machine Learning is often driven by the availability of large datasets, and consistent evaluation metrics for comparing modeling approaches. To this end, we present a repository of conversational datasets consisting of hundreds of millions of examples, and a standardised evaluation procedure for conversational response selection models using '1-of-100 accuracy'. The repository contains scripts that allow researchers to reproduce the standard datasets, or to adapt the pre-processing and data filtering steps to their needs. We introduce and evaluate several competitive baselines for conversational response selection, whose implementations are shared in the repository, as well as a neural encoder model that is trained on the entire training set.},
  urldate = {2019-04-18},
  date = {2019-04-12},
  keywords = {Computer Science - Computation and Language},
  author = {Henderson, Matthew and Budzianowski, Paweł and Casanueva, Iñigo and Coope, Sam and Gerz, Daniela and Kumar, Girish and Mrkšić, Nikola and Spithourakis, Georgios and Su, Pei-Hao and Vulić, Ivan and Wen, Tsung-Hsien},
}