
Pre-Training for Mathematics-Aware Retrieval

Published: 07 July 2022

ABSTRACT

Mathematical formulas are an important tool for concisely communicating ideas in science and education, where they clarify descriptions, calculations, or derivations. When searching scientific literature, mathematical notation, which is often written in LaTeX, therefore plays a crucial role that should not be neglected. The task of mathematics-aware information retrieval is to retrieve relevant passages given a query or question, both of which can contain natural language as well as mathematical formulas. As in many domains that rely on Natural Language Understanding, transformer-based models now dominate the field of information retrieval [3]. Apart from their size and the transformer-encoder architecture, pre-training is considered a key factor for the high performance of these models. It has also been shown that domain-adaptive pre-training improves their performance on downstream tasks even further [2], especially when the vocabulary overlap between the pre-training and in-domain data is low. This is also the case for the domain of mathematical documents.
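To make the notion of domain-adaptive pre-training concrete, the following minimal sketch shows continued masked-language-model pre-training of a general-domain transformer encoder on a corpus of LaTeX-containing passages, using the Hugging Face transformers and datasets libraries. The checkpoint, corpus file, and hyperparameters are illustrative assumptions and not the setup used in this work.

# Sketch: domain-adaptive masked-language-model pre-training on a
# mathematical corpus. All paths and hyperparameters are placeholders.
from transformers import (
    AutoTokenizer,
    AutoModelForMaskedLM,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)
from datasets import load_dataset

# Start from a general-domain checkpoint (BERT is assumed here).
checkpoint = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForMaskedLM.from_pretrained(checkpoint)

# Hypothetical text file with one passage per line, e.g.
# "The quadratic formula $x = \\frac{-b \\pm \\sqrt{b^2 - 4ac}}{2a}$ gives ..."
corpus = load_dataset("text", data_files={"train": "math_corpus.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=256)

tokenized = corpus.map(tokenize, batched=True, remove_columns=["text"])

# Standard MLM objective: randomly mask 15% of the tokens and predict them.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="math-adapted-encoder",
        num_train_epochs=1,
        per_device_train_batch_size=16,
    ),
    train_dataset=tokenized["train"],
    data_collator=collator,
)
trainer.train()

After this continued pre-training step, the adapted encoder can be fine-tuned for the retrieval task itself, for example by scoring query-passage pairs.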

References

  1. Goran Glavaš and Ivan Vulić. 2021. Is Supervised Syntactic Parsing Beneficial for Language Understanding Tasks? An Empirical Investigation. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume. 3090--3104.
  2. Suchin Gururangan, Ana Marasović, Swabha Swayamdipta, Kyle Lo, Iz Beltagy, Doug Downey, and Noah A. Smith. 2020. Don't Stop Pretraining: Adapt Language Models to Domains and Tasks. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 8342--8360.
  3. Jimmy Lin, Rodrigo Nogueira, and Andrew Yates. 2021. Pretrained Transformers for Text Ranking: BERT and Beyond. Morgan & Claypool Publishers.
  4. Shuai Peng, Ke Yuan, Liangcai Gao, and Zhi Tang. 2021. MathBERT: A Pre-Trained Model for Mathematical Formula Understanding. arXiv:2105.00377 (2021).
  5. Anja Reusch, Maik Thiele, and Wolfgang Lehner. 2021. TU_DBS in the ARQMath Lab 2021, CLEF. In CEUR Workshop Proceedings (Online).
  6. Anja Reusch, Maik Thiele, and Wolfgang Lehner. 2021 (to appear). An ALBERT-based Similarity Measure for Mathematical Answer Retrieval. In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval.

Published in

SIGIR '22: Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval
July 2022, 3569 pages
ISBN: 9781450387323
DOI: 10.1145/3477495

          Copyright © 2022 Owner/Author

          Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery, New York, NY, United States
