Taxonomy of Mathematical Plagiarism. Satpute, A., Greiner-Petter, A., Gießing, N., Beckenbach, I., Schubotz, M., Teschke, O., Aizawa, A., & Gipp, B. In Goharian, N., Tonellotto, N., He, Y., Lipani, A., McDonald, G., Macdonald, C., & Ounis, I., editors, 46th European Conference on Information Retrieval (ECIR), volume 14611, pages 12–20, Glasgow, UK, March, 2024. Springer Nature Switzerland. Core Rank A
Plagiarism is a pressing concern, even more so with the availability of large language models. Existing plagiarism detection systems reliably find copied and moderately reworded text but fail for idea plagiarism, especially in mathematical science, which heavily uses formal mathematical notation. We make two contributions. First, we establish a taxonomy of mathematical content reuse by annotating 122 potentially plagiarised scientific document pairs. Second, we analyze the best-performing approaches to detect plagiarism and mathematical content similarity on the newly established taxonomy. We found that the best-performing methods for plagiarism and math content similarity achieve an overall detection score (PlagDet) of 0.06 and 0.16, respectively. The best-performing methods failed to detect most cases from all seven newly established math similarity types. The outlined contributions will benefit research in plagiarism detection systems, recommender systems, question-answering systems, and search engines. We make our experiment’s code and annotated dataset available to the community: https://github.com/gipplab/Taxonomy-of-Mathematical-Plagiarism.
@inproceedings{BibbaseSatputeGGB24,
	address = {Glasgow, UK},
	title = {Taxonomy of {Mathematical} {Plagiarism}},
	volume = {14611},
	isbn = {978-3-031-56065-1},
	url = {https://link.springer.com/10.1007/978-3-031-56066-8_2},
	doi = {10.1007/978-3-031-56066-8_2},
	abstract = {Plagiarism is a pressing concern, even more so with the availability of large language models. Existing plagiarism detection systems reliably find copied and moderately reworded text but fail for idea plagiarism, especially in mathematical science, which heavily uses formal mathematical notation. We make two contributions. First, we establish a taxonomy of mathematical content reuse by annotating 122 potentially plagiarised scientific document pairs. Second, we analyze the best-performing approaches to detect plagiarism and mathematical content similarity on the newly established taxonomy. We found that the best-performing methods for plagiarism and math content similarity achieve an overall detection score (PlagDet) of 0.06 and 0.16, respectively. The best-performing methods failed to detect most cases from all seven newly established math similarity types. The outlined contributions will benefit research in plagiarism detection systems, recommender systems, question-answering systems, and search engines. We make our experiment’s code and annotated dataset available to the community: https://github.com/gipplab/Taxonomy-of-Mathematical-Plagiarism.},
	language = {en},
	urldate = {2024-05-23},
	booktitle = {46th {European} {Conference} on {Information} {Retrieval} ({ECIR})},
	publisher = {Springer Nature Switzerland},
	author = {Satpute, Ankit and Greiner-Petter, André and Gießing, Noah and Beckenbach, Isabel and Schubotz, Moritz and Teschke, Olaf and Aizawa, Akiko and Gipp, Bela},
	editor = {Goharian, Nazli and Tonellotto, Nicola and He, Yulan and Lipani, Aldo and McDonald, Graham and Macdonald, Craig and Ounis, Iadh},
	month = mar,
	year = {2024},
	note = {Core Rank A},
	pages = {12--20},
}
