Are Neural Language Models Good Plagiarists? A Benchmark for Neural Paraphrase Detection. Wahle, J. P., Ruas, T., Meuschke, N., & Gipp, B. In Proceedings of the ACM/IEEE Joint Conference on Digital Libraries (JCDL), Virtual Event, September, 2021.
Paper
Code/data abstract bibtex 2 downloads Neural language models such as BERT allow for human-like text paraphrasing. This ability threatens academic integrity, as it aggravates identifying machine-obfuscated plagiarism. We make two contributions to foster the research on detecting these novel machine-paraphrases. First, we provide the first large-scale dataset of documents paraphrased using the Transformer-based models BERT, RoBERTa, and Longformer. The dataset includes paragraphs from scientific papers on arXiv, theses, and Wikipedia articles and their paraphrased counterparts (1.5M paragraphs in total). We show the paraphrased text maintains the semantics of the original source. Second, we benchmark how well neural classification models can distinguish the original and paraphrased text. The dataset and source code of our study are publicly available.
@inproceedings{WahleRMG21,
  address   = {Virtual Event},
  title     = {Are Neural Language Models Good Plagiarists? {A} Benchmark for Neural Paraphrase Detection},
  url       = {https://www.gipp.com/wp-content/papercite-data/pdf/wahle2021.pdf},
  abstract  = {Neural language models such as BERT allow for human-like text paraphrasing. This ability threatens academic integrity, as it aggravates identifying machine-obfuscated plagiarism. We make two contributions to foster the research on detecting these novel machine-paraphrases. First, we provide the first large-scale dataset of documents paraphrased using the Transformer-based models BERT, RoBERTa, and Longformer. The dataset includes paragraphs from scientific papers on arXiv, theses, and Wikipedia articles and their paraphrased counterparts (1.5M paragraphs in total). We show the paraphrased text maintains the semantics of the original source. Second, we benchmark how well neural classification models can distinguish the original and paraphrased text. The dataset and source code of our study are publicly available.},
  booktitle = {Proceedings of the {ACM}/{IEEE} Joint Conference on Digital Libraries ({JCDL})},
  author    = {Wahle, Jan Philip and Ruas, Terry and Meuschke, Norman and Gipp, Bela},
  month     = sep,
  year      = {2021},
  note      = {Code and data: https://doi.org/10.5281/zenodo.4621403},
  keywords  = {archived},
}
Downloads: 2
{"_id":"HSnPjuNnSqnQLD2YE","bibbaseid":"wahle-ruas-meuschke-gipp-areneurallanguagemodelsgoodplagiaristsabenchmarkforneuralparaphrasedetection-2021","author_short":["Wahle, J. P.","Ruas, T.","Meuschke, N.","Gipp, B."],"bibdata":{"bibtype":"inproceedings","type":"inproceedings","address":"Virtual Event","title":"Are Neural Language Models Good Plagiarists? A Benchmark for Neural Paraphrase Detection","abstract":"Neural language models such as BERT allow for human-like text paraphrasing. This ability threatens academic integrity, as it aggravates identifying machine-obfuscated plagiarism. We make two contributions to foster the research on detecting these novel machine-paraphrases. First, we provide the first large-scale dataset of documents paraphrased using the Transformer-based models BERT, RoBERTa, and Longformer. The dataset includes paragraphs from scientific papers on arXiv, theses, and Wikipedia articles and their paraphrased counterparts (1.5M paragraphs in total). We show the paraphrased text maintains the semantics of the original source. Second, we benchmark how well neural classification models can distinguish the original and paraphrased text. The dataset and source code of our study are publicly available.","booktitle":"Proceedings of the ACM/IEEE Joint Conference on Digital Libraries (JCDL)","author":[{"propositions":[],"lastnames":["Wahle"],"firstnames":["Jan","Philip"],"suffixes":[]},{"propositions":[],"lastnames":["Ruas"],"firstnames":["Terry"],"suffixes":[]},{"propositions":[],"lastnames":["Meuschke"],"firstnames":["Norman"],"suffixes":[]},{"propositions":[],"lastnames":["Gipp"],"firstnames":["Bela"],"suffixes":[]}],"month":"September","year":"2021","keywords":"archived","bibtex":"@inproceedings{WahleRMG21,\n\taddress = {Virtual Event},\n\ttitle = {Are {Neural} {Language} {Models} {Good} {Plagiarists}? 
{A} {Benchmark} for {Neural} {Paraphrase} {Detection}},\n\turl = {paper=https://www.gipp.com/wp-content/papercite-data/pdf/wahle2021.pdf code/data=https://doi.org/10.5281/zenodo.4621403},\n\tabstract = {Neural language models such as BERT allow for human-like text paraphrasing. This ability threatens academic integrity, as it aggravates identifying machine-obfuscated plagiarism. We make two contributions to foster the research on detecting these novel machine-paraphrases. First, we provide the first large-scale dataset of documents paraphrased using the Transformer-based models BERT, RoBERTa, and Longformer. The dataset includes paragraphs from scientific papers on arXiv, theses, and Wikipedia articles and their paraphrased counterparts (1.5M paragraphs in total). We show the paraphrased text maintains the semantics of the original source. Second, we benchmark how well neural classification models can distinguish the original and paraphrased text. The dataset and source code of our study are publicly available.},\n\tbooktitle = {Proceedings of the {ACM}/{IEEE} {Joint} {Conference} on {Digital} {Libraries} ({JCDL})},\n\tauthor = {Wahle, Jan Philip and Ruas, Terry and Meuschke, Norman and Gipp, Bela},\n\tmonth = sep,\n\tyear = {2021},\n\tkeywords = {archived},\n}\n\n","author_short":["Wahle, J. 
P.","Ruas, T.","Meuschke, N.","Gipp, B."],"urlpaper":"https://www.gipp.com/wp-content/papercite-data/pdf/wahle2021.pdf","urlcode/data":"https://doi.org/10.5281/zenodo.4621403","key":"WahleRMG21","id":"WahleRMG21","bibbaseid":"wahle-ruas-meuschke-gipp-areneurallanguagemodelsgoodplagiaristsabenchmarkforneuralparaphrasedetection-2021","role":"author","urls":{"Paper":"https://www.gipp.com/wp-content/papercite-data/pdf/wahle2021.pdf","Code/data":"https://doi.org/10.5281/zenodo.4621403"},"keyword":["archived"],"metadata":{"authorlinks":{}},"downloads":2},"bibtype":"inproceedings","biburl":"https://api.zotero.org/groups/2532143/items?key=DOjJ33bOgISaFjBIBr7jCV5S&format=bibtex&limit=100","dataSources":["Zp98Nuv7ftsXLefzT","aEHCfX6B2taJt8dfa","9qTaLWxMN5hLpMP8m","xteq4cdC6ATE2G6Fg","JNgeyAG2vQ8k88oYh","FPjHiAkAja6XvmScK","RTGAqwGfLTSqYQMsS","Y7kZGjoN5Erk3Lo2J","yM7MefT3mRkY9m7i4","jnWJCpbQCoWvxj9kz","F32umBkhFrpeJbp7A","BWzEyLkMvdMGpHpr6","hBAe6Z5DsNbrQtje2","e3AdWzdxYmb85Fn5D","MtqPmSRuq4X8FJqNT","YCwvFifyPbazBYMQD","6oZMeYhGKA2Mp8xhF","gYMS6DBXsNosXKcRC","bQwdfx3o8Q3vnsqfH","SzFkcrpurPzNHEyqX","6KJgnNtYZiwwFkcGq","XJBi8b8xDjDoWPzcZ","kHqqD8pzLteJJWS2X","hG7rv86o2PDG2z44d","aJH3D6QaHCDgg2JGg"],"keywords":["archived"],"search_terms":["neural","language","models","good","plagiarists","benchmark","neural","paraphrase","detection","wahle","ruas","meuschke","gipp"],"title":"Are Neural Language Models Good Plagiarists? A Benchmark for Neural Paraphrase Detection","year":2021,"downloads":2}