Making Presentation Math Computable: A Context-Sensitive Approach for Translating LaTeX to Computer Algebra Systems. Greiner-Petter, A. Springer Fachmedien Wiesbaden, Wiesbaden, 2023. Doctoral Dissertation at University of Wuppertal, Germany
This thesis addresses the issue of translating mathematical expressions from LaTeX to the syntax of Computer Algebra Systems (CAS). Over the past decades, especially in the domain of Sciences, Technology, Engineering, and Mathematics (STEM), LaTeX has become the de-facto standard to typeset mathematical formulae in publications. Since scientists are generally required to publish their work, LaTeX has become an integral part of today's publishing workflow. On the other hand, modern research increasingly relies on CAS to simplify, manipulate, compute, and visualize mathematics. However, existing LaTeX import functions in CAS are limited to simple arithmetic expressions and are, therefore, insufficient for most use cases. Consequently, the workflow of experimenting and publishing in the Sciences often includes time-consuming and error-prone manual conversions between presentational LaTeX and computational CAS formats.

To address the lack of a reliable and comprehensive translation tool between LaTeX and CAS, this thesis makes the following three contributions.

First, it provides an approach to semantically enhance LaTeX expressions with sufficient semantic information for translations into CAS syntaxes. This so-called semantification process analyzes the structure of the formula and its textual context to infer semantic information. The research for this semantification process additionally contributes towards related Mathematical Information Retrieval (MathIR) tasks, such as mathematical education assistance, math recommendation and question answering systems, search engines, automatic plagiarism detection, and math type assistance systems.

Second, this thesis demonstrates the first context-aware LaTeX to CAS translation framework LaCASt. LaCASt uses the developed semantification approach to transform LaTeX expressions into an intermediate semantic LaTeX format, which is then further translated to CAS based on translation patterns. These patterns were manually crafted by mathematicians to assure accurate and reliable translations. In comparison, this thesis additionally elaborates on a non-context-aware neural machine translation approach trained on a mathematical library generated by Mathematica.

Third, the thesis provides a novel approach to evaluate the performance of LaTeX to CAS translations on large-scale datasets with an automatic verification of equations in digital mathematical libraries. This evaluation approach is based on the assumption that equations in digital mathematical libraries can be computationally verified by CAS if a translation between both systems exists. In addition, the thesis provides an in-depth manual evaluation on mathematical articles from the English Wikipedia.

The presented context-aware translation framework LaCASt increases the efficiency and reliability of translations to CAS. Via LaCASt, we strengthened the Digital Library of Mathematical Functions (DLMF) by identifying numerous issues, from missing or wrong semantic annotations to sign errors. Further, via LaCASt, we were able to discover several issues with the commercial CAS Maple and Mathematica. The fundamental approaches to semantically enhance mathematics developed in this thesis additionally contributed towards several related MathIR tasks. For instance, the large-scale analysis of mathematical notations and the studies on math-embeddings motivated new approaches for math plagiarism detection systems, search engines, and typing assistance for mathematical inputs. Finally, LaCASt translations will have a direct real-world impact, as they are scheduled to be integrated into upcoming versions of the DLMF and Wikipedia.
@book{BibbaseGreinerPetter23,
	address = {Wiesbaden},
	title = {Making {Presentation} {Math} {Computable}: {A} {Context}-{Sensitive} {Approach} for {Translating} {LaTeX} to {Computer} {Algebra} {Systems}},
	isbn = {978-3-658-40472-7 978-3-658-40473-4},
	shorttitle = {Making {Presentation} {Math} {Computable}},
	url = {https://link.springer.com/10.1007/978-3-658-40473-4},
	abstract = {This thesis addresses the issue of translating mathematical expressions from LaTeX to the syntax of Computer Algebra Systems (CAS). Over the past decades, especially in the domain of Sciences, Technology, Engineering, and Mathematics (STEM), LaTeX has become the de-facto standard to typeset mathematical formulae in publications. Since scientists are generally required to publish their work, LaTeX has become an integral part of today's publishing workflow. On the other hand, modern research increasingly relies on CAS to simplify, manipulate, compute, and visualize mathematics. However, existing LaTeX import functions in CAS are limited to simple arithmetic expressions and are, therefore, insufficient for most use cases. Consequently, the workflow of experimenting and publishing in the Sciences often includes time-consuming and error-prone manual conversions between presentational LaTeX and computational CAS formats.

To address the lack of a reliable and comprehensive translation tool between LaTeX and CAS, this thesis makes the following three contributions.

First, it provides an approach to semantically enhance LaTeX expressions with sufficient semantic information for translations into CAS syntaxes. This so-called semantification process analyzes the structure of the formula and its textual context to infer semantic information. 
The research for this semantification process additionally contributes towards related Mathematical Information Retrieval (MathIR) tasks, such as mathematical education assistance, math recommendation and question answering systems, search engines, automatic plagiarism detection, and math type assistance systems.

Second, this thesis demonstrates the first context-aware LaTeX to CAS translation framework LaCASt. LaCASt uses the developed semantification approach to transform LaTeX expressions into an intermediate semantic LaTeX format, which is then further translated to CAS based on translation patterns. These patterns were manually crafted by mathematicians to assure accurate and reliable translations. In comparison, this thesis additionally elaborates on a non-context-aware neural machine translation approach trained on a mathematical library generated by Mathematica. 

Third, the thesis provides a novel approach to evaluate the performance of LaTeX to CAS translations on large-scale datasets with an automatic verification of equations in digital mathematical libraries. This evaluation approach is based on the assumption that equations in digital mathematical libraries can be computationally verified by CAS if a translation between both systems exists. In addition, the thesis provides an in-depth manual evaluation on mathematical articles from the English Wikipedia.

The presented context-aware translation framework LaCASt increases the efficiency and reliability of translations to CAS. Via LaCASt, we strengthened the Digital Library of Mathematical Functions (DLMF) by identifying numerous issues, from missing or wrong semantic annotations to sign errors. 
Further, via LaCASt, we were able to discover several issues with the commercial CAS Maple and Mathematica.
The fundamental approaches to semantically enhance mathematics developed in this thesis additionally contributed towards several related MathIR tasks. For instance, the large-scale analysis of mathematical notations and the studies on math-embeddings motivated new approaches for math plagiarism detection systems, search engines, and typing assistance for mathematical inputs. Finally, LaCASt translations will have a direct real-world impact, as they are scheduled to be integrated into upcoming versions of the DLMF and Wikipedia.},
	language = {en},
	urldate = {2023-02-24},
	publisher = {Springer Fachmedien Wiesbaden},
	author = {Greiner-Petter, Andre},
	year = {2023},
	doi = {10.1007/978-3-658-40473-4},
	note = {Doctoral Dissertation at University of Wuppertal, Germany},
}
