Pre-Training for Mathematics-Aware Retrieval

Published: 07 July 2022
DOI: 10.1145/3477495.3531680

Abstract

Mathematical formulas are an important tool for concisely communicating ideas in science and education, used to clarify descriptions, calculations, or derivations. When searching scientific literature, mathematical notation, often written in LaTeX, therefore plays a crucial role that should not be neglected. The task of mathematics-aware information retrieval is to retrieve relevant passages given a query or question, both of which can include natural language as well as mathematical formulas. As in many domains that rely on Natural Language Understanding, transformer-based models now dominate the field of information retrieval [3]. Apart from their size and the transformer-encoder architecture, pre-training is considered a key factor in the high performance of these models. It has also been shown that domain-adaptive pre-training improves their performance on downstream tasks even further [2], especially when the vocabulary overlap between pre-training and in-domain data is low. This is also the case for the domain of mathematical documents.
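As an illustration of the domain-adaptive pre-training idea described above, the following is a minimal sketch of continuing masked-language-model pre-training on LaTeX-containing passages with the HuggingFace Transformers library. The base checkpoint, toy corpus, and hyperparameters are illustrative placeholders, not the configuration used in this work.

```python
# Minimal sketch of domain-adaptive masked-language-model (MLM) pre-training
# on mathematical text, in the spirit of Gururangan et al. [2].
# Model name, corpus, and hyperparameters below are placeholders.
import torch
from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

# Toy in-domain corpus: passages mixing natural language and LaTeX notation.
corpus = [
    r"The quadratic formula $x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$ solves $ax^2 + bx + c = 0$.",
    r"By induction, $\sum_{i=1}^{n} i = \frac{n(n+1)}{2}$ holds for all $n \ge 1$.",
]

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

class MathCorpus(torch.utils.data.Dataset):
    """Wraps tokenized passages so the Trainer can iterate over them."""
    def __init__(self, texts):
        self.enc = tokenizer(texts, truncation=True, max_length=128)
    def __len__(self):
        return len(self.enc["input_ids"])
    def __getitem__(self, idx):
        return {k: v[idx] for k, v in self.enc.items()}

# The collator randomly masks 15% of tokens; the model learns to recover them,
# adapting its representations to mathematical vocabulary and notation.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=True,
                                           mlm_probability=0.15)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="mlm-math", num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=MathCorpus(corpus),
    data_collator=collator,
)
trainer.train()
```

The adapted encoder can then be fine-tuned for the retrieval task itself, for example as a cross-encoder or bi-encoder ranking model over question-passage pairs.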

References

[1] Goran Glavaš and Ivan Vulić. 2021. Is Supervised Syntactic Parsing Beneficial for Language Understanding Tasks? An Empirical Investigation. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume. 3090–3104.
[2] Suchin Gururangan, Ana Marasović, Swabha Swayamdipta, Kyle Lo, Iz Beltagy, Doug Downey, and Noah A. Smith. 2020. Don't Stop Pretraining: Adapt Language Models to Domains and Tasks. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 8342–8360.
[3] Jimmy Lin, Rodrigo Nogueira, and Andrew Yates. 2021. Pretrained Transformers for Text Ranking: BERT and Beyond. Morgan & Claypool Publishers.
[4] Shuai Peng, Ke Yuan, Liangcai Gao, and Zhi Tang. 2021. MathBERT: A Pre-Trained Model for Mathematical Formula Understanding. arXiv:2105.00377 (2021).
[5] Anja Reusch, Maik Thiele, and Wolfgang Lehner. 2021. TU_DBS in the ARQMath Lab 2021, CLEF. In CEUR Workshop Proceedings (Online).
[6] Anja Reusch, Maik Thiele, and Wolfgang Lehner. 2021. An ALBERT-based Similarity Measure for Mathematical Answer Retrieval. In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval.

Published In

SIGIR '22: Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval
July 2022
3569 pages
ISBN: 9781450387323
DOI: 10.1145/3477495
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. mathematical language processing
  2. transformer-based models

Qualifiers

  • Abstract

Funding Sources

  • German Research Foundation

Conference

SIGIR '22

Acceptance Rates

Overall acceptance rate: 792 of 3,983 submissions (20%)
