A First Step Towards Content Protecting Plagiarism Detection

A First Step Towards Content Protecting Plagiarism Detection. Ihle, C., Schubotz, M., Meuschke, N., & Gipp, B. In Proceedings of the ACM/IEEE Joint Conference on Digital Libraries (JCDL), Virtual Event, August, 2020. Venue Rating: CORE A*

Plagiarism detection systems are essential tools for safeguarding academic and educational integrity. However, today’s systems require disclosing the full content of the input documents and the document collection to which the input documents are compared. Moreover, the systems are centralized and under the control of individual, typically commercial providers. This situation raises procedural and legal concerns regarding the confidentiality of sensitive data, which can limit or prohibit the use of plagiarism detection services. To eliminate these weaknesses of current systems, we seek to devise a plagiarism detection approach that does not require a centralized provider nor exposing any content as cleartext. This paper presents the initial results of our research. Specifically, we employ Private Set Intersection to devise a content-protecting variant of the citation-based similarity measure Bibliographic Coupling implemented in our plagiarism detection system HyPlag. Our evaluation shows that the content-protecting method achieves the same detection effectiveness as the original method while making common attacks to disclose the protected content practically infeasible. Our future work will extend this successful proof-of-concept by devising plagiarism detection methods that can analyze the entire content of documents without disclosing it as cleartext.
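
To make the idea in the abstract concrete: Bibliographic Coupling measures the similarity of two documents by the number of references they share, and Private Set Intersection lets two parties compute such an overlap without revealing their sets in cleartext. The following is a minimal sketch only, not the protocol or code from the paper or from HyPlag. It assumes a textbook Diffie-Hellman-style construction based on commutative blinding, a toy-sized modulus, and both parties simulated in one process. The function names and the reference identifiers in the usage note are hypothetical.

```python
import hashlib
import math
import secrets

# Toy-sized prime modulus (2**127 - 1 is a Mersenne prime). Illustration only:
# a real deployment would use a vetted group, for example an elliptic curve.
P = 2**127 - 1


def hash_to_group(reference_id: str) -> int:
    """Map a reference identifier to an integer in [1, P - 1]."""
    digest = hashlib.sha256(reference_id.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (P - 1) + 1


def new_secret() -> int:
    """Draw a random exponent coprime to P - 1, so that blinding is a bijection."""
    while True:
        k = secrets.randbelow(P - 3) + 2  # k in [2, P - 2]
        if math.gcd(k, P - 1) == 1:
            return k


def blind(values, secret):
    """Raise every value to the secret exponent modulo P."""
    return [pow(v, secret, P) for v in values]


def coupling_strength(refs_a, refs_b) -> int:
    """Count shared references by comparing doubly blinded values only."""
    key_a, key_b = new_secret(), new_secret()
    # Step 1: each party hashes and blinds its own reference list.
    once_a = blind([hash_to_group(r) for r in set(refs_a)], key_a)
    once_b = blind([hash_to_group(r) for r in set(refs_b)], key_b)
    # Step 2: the parties swap the blinded lists and apply their own key.
    # Exponentiation commutes, so equal references end up as equal values.
    twice_a = set(blind(once_a, key_b))
    twice_b = set(blind(once_b, key_a))
    # Step 3: the overlap size is the bibliographic coupling strength.
    return len(twice_a & twice_b)
```

For example, `coupling_strength(["10.1000/ref-1", "10.1000/ref-2", "10.1000/ref-3"], ["10.1000/ref-2", "10.1000/ref-3", "10.1000/ref-4"])` counts the two shared identifiers, while neither side ever handles the other's unblinded reference list. Whether this matches the security model evaluated in the paper is not something this sketch claims.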

@inproceedings{IhleSMG20,
	address = {Virtual Event},
	title = {A {First} {Step} {Towards} {Content} {Protecting} {Plagiarism} {Detection}},
	url = {paper=https://www.gipp.com/wp-content/papercite-data/pdf/ihle2020.pdf code=https://github.com/ag-gipp/20CppdData},
	doi = {10.1145/3383583.3398620},
	abstract = {Plagiarism detection systems are essential tools for safeguarding academic and educational integrity. However, today’s systems require disclosing the full content of the input documents and the document collection to which the input documents are compared. Moreover, the systems are centralized and under the control of individual, typically commercial providers. This situation raises procedural and legal concerns regarding the confidentiality of sensitive data, which can limit or prohibit the use of plagiarism detection services. To eliminate these weaknesses of current systems, we seek to devise a plagiarism detection approach that does not require a centralized provider nor exposing any content as cleartext. This paper presents the initial results of our research. Specifically, we employ Private Set Intersection to devise a content-protecting variant of the citation-based similarity measure Bibliographic Coupling implemented in our plagiarism detection system HyPlag. Our evaluation shows that the content-protecting method achieves the same detection effectiveness as the original method while making common attacks to disclose the protected content practically infeasible. Our future work will extend this successful proof-of-concept by devising plagiarism detection methods that can analyze the entire content of documents without disclosing it as cleartext.},
	booktitle = {Proceedings of the {ACM}/{IEEE} {Joint} {Conference} on {Digital} {Libraries} ({JCDL})},
	author = {Ihle, Cornelius and Schubotz, Moritz and Meuschke, Norman and Gipp, Bela},
	month = aug,
	year = {2020},
	note = {Venue Rating: CORE A*},
	keywords = {Plagiarism Detection},
}
