A Repository of Conversational Datasets

A Repository of Conversational Datasets. Henderson, M., Budzianowski, P., Casanueva, I., Coope, S., Gerz, D., Kumar, G., Mrkšić, N., Spithourakis, G., Su, P., Vulić, I., & Wen, T.

Progress in Machine Learning is often driven by the availability of large datasets, and consistent evaluation metrics for comparing modeling approaches. To this end, we present a repository of conversational datasets consisting of hundreds of millions of examples, and a standardised evaluation procedure for conversational response selection models using '1-of-100 accuracy'. The repository contains scripts that allow researchers to reproduce the standard datasets, or to adapt the pre-processing and data filtering steps to their needs. We introduce and evaluate several competitive baselines for conversational response selection, whose implementations are shared in the repository, as well as a neural encoder model that is trained on the entire training set.
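The abstract names '1-of-100 accuracy' without defining it. As a minimal sketch, assuming the usual reading of the metric: each batch holds 100 (context, response) pairs, the responses of the other 99 examples act as negatives, and a context counts as correct when its own response receives the highest score. The function name and the score-matrix layout below are illustrative assumptions, not the repository's actual code.

```python
from typing import Sequence


def one_of_n_accuracy(scores: Sequence[Sequence[float]]) -> float:
    """Fraction of contexts whose true response is ranked first.

    Assumed layout (hypothetical, for illustration): scores[i][j] is the
    model's score for pairing context i with response j, and the true
    response for context i sits at index i. With 100 rows this is the
    '1-of-100' setting.
    """
    n = len(scores)
    if n == 0 or any(len(row) != n for row in scores):
        raise ValueError("scores must be a non-empty square matrix")

    correct = 0
    for i, row in enumerate(scores):
        # Index of the highest-scoring candidate; ties go to the lowest index.
        best = max(range(n), key=lambda j: row[j])
        if best == i:
            correct += 1
    return correct / n


# Tiny 3-way example (a real evaluation would use batches of 100).
example_scores = [
    [0.9, 0.1, 0.0],  # context 0: true response ranked first
    [0.2, 0.8, 0.1],  # context 1: true response ranked first
    [0.7, 0.2, 0.1],  # context 2: response 0 outranks the true one
]
accuracy = one_of_n_accuracy(example_scores)
```

Under this reading, random guessing scores about 1% on batches of 100, which gives a fixed reference point when comparing models.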

@article{hendersonRepositoryConversationalDatasets2019,
  archivePrefix = {arXiv},
  eprinttype = {arxiv},
  eprint = {1904.06472},
  primaryClass = {cs},
  title = {A {{Repository}} of {{Conversational Datasets}}},
  url = {http://arxiv.org/abs/1904.06472},
  abstract = {Progress in Machine Learning is often driven by the availability of large datasets, and consistent evaluation metrics for comparing modeling approaches. To this end, we present a repository of conversational datasets consisting of hundreds of millions of examples, and a standardised evaluation procedure for conversational response selection models using '1-of-100 accuracy'. The repository contains scripts that allow researchers to reproduce the standard datasets, or to adapt the pre-processing and data filtering steps to their needs. We introduce and evaluate several competitive baselines for conversational response selection, whose implementations are shared in the repository, as well as a neural encoder model that is trained on the entire training set.},
  urldate = {2019-04-18},
  date = {2019-04-12},
  keywords = {Computer Science - Computation and Language},
  author = {Henderson, Matthew and Budzianowski, Paweł and Casanueva, Iñigo and Coope, Sam and Gerz, Daniela and Kumar, Girish and Mrkšić, Nikola and Spithourakis, Georgios and Su, Pei-Hao and Vulić, Ivan and Wen, Tsung-Hsien},
  file = {/home/dimitri/Nextcloud/Zotero/storage/ZPI7GB2I/Henderson et al. - 2019 - A Repository of Conversational Datasets.pdf;/home/dimitri/Nextcloud/Zotero/storage/II559GXU/1904.html}
}
