ABSTRACT
Mathematical formulas are an important tool for concisely communicating ideas in science and education, used to clarify descriptions, calculations, or derivations. When searching scientific literature, mathematical notation, which is often written in LaTeX, therefore plays a crucial role that should not be neglected. The task of mathematics-aware information retrieval is to retrieve relevant passages given a query or question, both of which can include natural language and mathematical formulas. As in many domains that rely on natural language understanding, transformer-based models now dominate the field of information retrieval [3]. Apart from their size and the transformer-encoder architecture, pre-training is considered a key factor in the high performance of these models. It has also been shown that domain-adaptive pre-training improves their performance on downstream tasks even further [2], especially when the vocabulary overlap between the pre-training and in-domain data is low. This is the case for the domain of mathematical documents.
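The low vocabulary overlap mentioned above can be made concrete with a small sketch: tokens such as `\int` or `\frac` that appear in LaTeX-heavy text rarely occur in general-domain corpora. The snippet below is an illustrative measurement only (the tokenizer and the toy corpora are our own assumptions, not part of the paper); it computes what fraction of an in-domain vocabulary is already covered by a general-domain sample.

```python
import re

def vocab(texts):
    """Collect the set of tokens in a corpus sample.

    LaTeX commands (backslash + letters) are kept as single tokens,
    everything else is split on word characters. This is a deliberately
    simple, illustrative tokenizer, not the one used by the models cited.
    """
    tokens = set()
    for t in texts:
        tokens.update(re.findall(r"\\[a-zA-Z]+|\w+", t.lower()))
    return tokens

def vocab_overlap(general, in_domain):
    """Fraction of the in-domain vocabulary covered by the general corpus."""
    g, d = vocab(general), vocab(in_domain)
    return len(g & d) / len(d) if d else 0.0

# Toy samples: general-domain text vs. mathematical text with LaTeX markup.
general = [
    "the model retrieves relevant passages for a query",
    "pre-training improves performance on many tasks",
]
math = [
    "the integral \\int_0^1 x^2 dx equals \\frac{1}{3}",
    "for a matrix A the eigenvalues \\lambda satisfy \\det(A - \\lambda I) = 0",
]

print(vocab_overlap(general, math))
```

On these toy samples only function words like "the", "for", and "a" are shared, so the coverage is low; this is the situation in which domain-adaptive pre-training [2] is reported to help most.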
- Goran Glavaš and Ivan Vulić. 2021. Is Supervised Syntactic Parsing Beneficial for Language Understanding Tasks? An Empirical Investigation. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume. 3090--3104.
- Suchin Gururangan, Ana Marasović, Swabha Swayamdipta, Kyle Lo, Iz Beltagy, Doug Downey, and Noah A. Smith. 2020. Don't Stop Pretraining: Adapt Language Models to Domains and Tasks. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 8342--8360.
- Jimmy Lin, Rodrigo Nogueira, and Andrew Yates. 2021. Pretrained Transformers for Text Ranking: BERT and Beyond. Morgan & Claypool Publishers.
- Shuai Peng, Ke Yuan, Liangcai Gao, and Zhi Tang. 2021. MathBERT: A Pre-Trained Model for Mathematical Formula Understanding. arXiv:2105.00377 (2021).
- Anja Reusch, Maik Thiele, and Wolfgang Lehner. 2021. TU_DBS in the ARQMath Lab 2021, CLEF. In CEUR Workshop Proceedings (Online).
- Anja Reusch, Maik Thiele, and Wolfgang Lehner. 2021, to appear. An ALBERT-based Similarity Measure for Mathematical Answer Retrieval. In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval.
Index Terms
- Pre-Training for Mathematics-Aware Retrieval