Improving the Representation and Conversion of Mathematical Formulae by Considering their Textual Context

Improving the Representation and Conversion of Mathematical Formulae by Considering their Textual Context. Schubotz, M., Greiner-Petter, A., Scharpf, P., Meuschke, N., Cohl, H. S., & Gipp, B. In Proceedings of the 18th ACM/IEEE on Joint Conference on Digital Libraries (JCDL), pages 233–242, Fort Worth, Texas, USA, May, 2018. ACM. Core Rank A*


Mathematical formulae represent complex semantic information in a concise form. Especially in Science, Technology, Engineering, and Mathematics, mathematical formulae are crucial to communicate information, e.g., in scientific papers, and to perform computations using computer algebra systems. Enabling computers to access the information encoded in mathematical formulae requires machine-readable formats that can represent both the presentation and content, i.e., the semantics, of formulae. Exchanging such information between systems additionally requires conversion methods for mathematical representation formats. We analyze how the semantic enrichment of formulae improves the format conversion process and show that considering the textual context of formulae reduces the error rate of such conversions. Our main contributions are: (1) providing an openly available benchmark dataset for the mathematical format conversion task consisting of a newly created test collection, an extensive, manually curated gold standard and task-specific evaluation metrics; (2) performing a quantitative evaluation of state-of-the-art tools for mathematical format conversions; (3) presenting a new approach that considers the textual context of formulae to reduce the error rate for mathematical format conversions. Our benchmark dataset facilitates future research on mathematical format conversions as well as research on many problems in mathematical information retrieval. Because we annotated and linked all components of formulae, e.g., identifiers, operators and other entities, to Wikidata entries, the gold standard can, for instance, be used to train methods for formula concept discovery and recognition. Such methods can then be applied to improve mathematical information retrieval systems, e.g., for semantic formula search, recommendation of mathematical content, or detection of mathematical plagiarism.

@inproceedings{BibbaseSchubotzGSM18,
	address = {Fort Worth, Texas, USA},
	title = {Improving the {Representation} and {Conversion} of {Mathematical} {Formulae} by {Considering} their {Textual} {Context}},
	isbn = {978-1-4503-5178-2},
	url = {https://arxiv.org/abs/1804.04956},
	doi = {10/ggv8jk},
	abstract = {Mathematical formulae represent complex semantic information in a concise form.
Especially in Science, Technology, Engineering, and Mathematics, mathematical formulae are crucial to communicate information, e.g., in scientific papers, and to perform computations using computer algebra systems. Enabling computers to access the information encoded in mathematical formulae requires machine-readable formats that can represent both the presentation and content, i.e., the semantics, of formulae. Exchanging such information between systems additionally requires conversion methods for mathematical representation formats. We analyze how the semantic enrichment of formulae improves the format conversion process and show that considering the textual context of formulae reduces the error rate of such conversions. 
Our main contributions are:
(1) providing an openly available benchmark dataset for the mathematical format conversion task consisting of a newly created test collection, an extensive, manually curated gold standard and task-specific evaluation metrics;
(2) performing a quantitative evaluation of state-of-the-art tools for mathematical format conversions;
(3) presenting a new approach that considers the textual context of formulae to reduce the error rate for mathematical format conversions.
Our benchmark dataset facilitates future research on mathematical format conversions as well as research on many problems in mathematical information retrieval. Because we annotated and linked all components of formulae, e.g., identifiers, operators and other entities, to Wikidata entries, the gold standard can, for instance, be used to train methods for formula concept discovery and recognition. Such methods can then be applied to improve mathematical information retrieval systems, e.g., for semantic formula search, recommendation of mathematical content, or detection of mathematical plagiarism.},
	language = {en},
	urldate = {2021-09-06},
	booktitle = {Proceedings of the 18th {ACM}/{IEEE} on {Joint} {Conference} on {Digital} {Libraries} ({JCDL})},
	publisher = {ACM},
	author = {Schubotz, Moritz and Greiner-Petter, Andre and Scharpf, Philipp and Meuschke, Norman and Cohl, Howard S. and Gipp, Bela},
	month = may,
	year = {2018},
	note = {Core Rank A*},
	pages = {233--242},
}
