In *Proceedings of The Web Conference (WWW)*, pages 1445–1456, Taipei, Taiwan, April 2020. ACM. Core Rank A*


Mathematical notation, i.e., the writing system used to communicate concepts in mathematics, encodes valuable information for a variety of information search and retrieval systems. Yet, mathematical notations remain mostly unutilized by today's systems. In this paper, we present the first in-depth study on the distributions of mathematical notation in two large scientific corpora: the open access arXiv (2.5B mathematical objects) and the mathematical reviewing service for pure and applied mathematics zbMATH (61M mathematical objects). Our study lays a foundation for future research projects on mathematical information retrieval for large scientific corpora. Further, we demonstrate the relevance of our results to a variety of use-cases. For example, to assist semantic extraction systems, to improve scientific search engines, and to facilitate specialized math recommendation systems. The contributions of our presented research are as follows: (1) we present the first distributional analysis of mathematical formulae on arXiv and zbMATH; (2) we retrieve relevant mathematical objects for given textual search queries (e.g., linking $P_n^{(\alpha, \beta)}\!\left(x\right)$ with 'Jacobi polynomial'); (3) we extend zbMATH's search engine by providing relevant mathematical formulae; and (4) we exemplify the applicability of the results by presenting auto-completion for math inputs as the first contribution to math recommendation systems. To expedite future research projects, we have made available our source code and data.

@inproceedings{BibbaseGreinerPetterSMB20,
  address = {Taipei, Taiwan},
  title = {Discovering {Mathematical} {Objects} of {Interest} --- {A} {Study} of {Mathematical} {Notations}},
  isbn = {978-1-4503-7023-3},
  url = {https://arxiv.org/abs/2002.02712},
  doi = {10.1145/3366423.3380218},
  abstract = {Mathematical notation, i.e., the writing system used to communicate concepts in mathematics, encodes valuable information for a variety of information search and retrieval systems. Yet, mathematical notations remain mostly unutilized by today's systems. In this paper, we present the first in-depth study on the distributions of mathematical notation in two large scientific corpora: the open access arXiv (2.5B mathematical objects) and the mathematical reviewing service for pure and applied mathematics zbMATH (61M mathematical objects). Our study lays a foundation for future research projects on mathematical information retrieval for large scientific corpora. Further, we demonstrate the relevance of our results to a variety of use-cases. For example, to assist semantic extraction systems, to improve scientific search engines, and to facilitate specialized math recommendation systems. The contributions of our presented research are as follows: (1) we present the first distributional analysis of mathematical formulae on arXiv and zbMATH; (2) we retrieve relevant mathematical objects for given textual search queries (e.g., linking $P_n^{(\alpha, \beta)}\!\left(x\right)$ with `Jacobi polynomial'); (3) we extend zbMATH's search engine by providing relevant mathematical formulae; and (4) we exemplify the applicability of the results by presenting auto-completion for math inputs as the first contribution to math recommendation systems. To expedite future research projects, we have made available our source code and data.},
  language = {en},
  urldate = {2021-07-30},
  booktitle = {Proceedings of {The} {Web} {Conference} ({WWW})},
  publisher = {ACM},
  author = {Greiner-Petter, André and Schubotz, Moritz and Müller, Fabian and Breitinger, Corinna and Cohl, Howard and Aizawa, Akiko and Gipp, Bela},
  month = apr,
  year = {2020},
  note = {Core Rank A*},
  pages = {1445--1456},
}
