DOI: 10.1145/3511808.3557567
Short paper
Public Access

Contextualized Formula Search Using Math Abstract Meaning Representation

Published: 17 October 2022

Abstract

In math formula search, relevance depends not only on how similar formulas are in isolation, but also on their surrounding context. We introduce MathAMR, a new unified representation for sentences containing math. MathAMR generalizes Abstract Meaning Representation (AMR) graphs to include the operations and arguments of math formulas. We then use Sentence-BERT to embed linearized MathAMR graphs for formula retrieval. In our first experiment, we compare MathAMR against raw text, with both using the same formula representation (Operator Trees), and find that MathAMR produces more effective rankings. We then apply our MathAMR embeddings to rerank runs from the ARQMath-2 formula retrieval task, improving effectiveness measures in most cases. The strongest reranked run matches the best P′@10 among the original runs and exceeds all original runs in nDCG′@10.
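The results above are stated in primed measures (P′@10, nDCG′@10). In ARQMath these are computed after removing unjudged hits from each ranking, so a run is not penalized for retrieving formulas the assessors never saw. The following is a minimal sketch of that convention, not the official evaluation code. It assumes a graded 0 to 3 relevance scale where grades of 2 or higher count as relevant for precision, linear gains, and a log2 rank discount; the function names (`condense`, `precision_prime_at_k`, `ndcg_prime_at_k`) are hypothetical and do not come from the paper or any evaluation toolkit.

```python
import math
from typing import Dict, List

# Relevance judgments: formula/document id -> graded score.
# Assumption: a 0-3 scale as in ARQMath; unjudged ids are simply absent.
Qrels = Dict[str, int]


def condense(ranking: List[str], qrels: Qrels) -> List[str]:
    """Remove unjudged hits, keeping the relative order of judged ones."""
    return [doc for doc in ranking if doc in qrels]


def precision_prime_at_k(ranking: List[str], qrels: Qrels,
                         k: int = 10, min_rel: int = 2) -> float:
    """P'@k: precision over the condensed list; grades >= min_rel count as relevant."""
    top = condense(ranking, qrels)[:k]
    hits = sum(1 for doc in top if qrels[doc] >= min_rel)
    return hits / k


def ndcg_prime_at_k(ranking: List[str], qrels: Qrels, k: int = 10) -> float:
    """nDCG'@k over the condensed list: linear gains, discount log2(r + 1) for 1-based rank r."""
    top = condense(ranking, qrels)[:k]
    dcg = sum(qrels[doc] / math.log2(i + 2) for i, doc in enumerate(top))
    ideal = sorted(qrels.values(), reverse=True)[:k]
    idcg = sum(g / math.log2(i + 2) for i, g in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0


# Toy example: "u1" and "u2" were never judged, so they are dropped before scoring.
ranking = ["d1", "u1", "d2", "d3", "u2"]
qrels = {"d1": 3, "d2": 0, "d3": 2, "d4": 1}
print(precision_prime_at_k(ranking, qrels))       # 0.2
print(round(ndcg_prime_at_k(ranking, qrels), 3))  # 0.84
```

Condensing first means a reranked run is scored only on formulas with assessments, which matters when reranking surfaces hits that were absent from the original judgment pools.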

Cited By

  • (2024) Assessing the Cross-linguistic Utility of Abstract Meaning Representation. Computational Linguistics 50:2, 419-473. DOI: 10.1162/coli_a_00503. Online publication date: 1-Jun-2024
  • (2024) Using Large Language Models for Math Information Retrieval. Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2693-2697. DOI: 10.1145/3626772.3657907. Online publication date: 10-Jul-2024
  • (2023) Clarifying Questions in Math Information Retrieval. Proceedings of the 2023 ACM SIGIR International Conference on Theory of Information Retrieval, 149-158. DOI: 10.1145/3578337.3605123. Online publication date: 9-Aug-2023
  • (2023) Searching the ACL Anthology with Math Formulas and Text. Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, 3110-3114. DOI: 10.1145/3539618.3591803. Online publication date: 18-Jul-2023

    Published In

    CIKM '22: Proceedings of the 31st ACM International Conference on Information & Knowledge Management
    October 2022
    5274 pages
    ISBN:9781450392365
    DOI:10.1145/3511808
    General Chairs: Mohammad Al Hasan, Li Xiong
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. abstract meaning representation
    2. formula search
    3. math ir

    Conference

    CIKM '22
    Acceptance Rates

    CIKM '22 Paper Acceptance Rate 621 of 2,257 submissions, 28%;
    Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

    Article Metrics

    • Downloads (last 12 months): 184
    • Downloads (last 6 weeks): 11
    Reflects downloads up to 05 Mar 2025
