short-paper


Contextualized Formula Search Using Math Abstract Meaning Representation

Authors:

Behrooz Mansouri,

Douglas W. Oard,

Richard Zanibbi

CIKM '22: Proceedings of the 31st ACM International Conference on Information & Knowledge Management

Pages 4329 - 4333

https://doi.org/10.1145/3511808.3557567

Published: 17 October 2022

Abstract

In math formula search, relevance depends not only on how similar two formulas are in isolation, but also on the context that surrounds them. We introduce MathAMR, a new unified representation for sentences containing math. MathAMR generalizes Abstract Meaning Representation (AMR) graphs to include math formula operations and arguments. We then use Sentence-BERT to embed linearized MathAMR graphs for use in formula retrieval. In our first experiment, we compare MathAMR against raw text using the same formula representation (Operator Trees), and find that MathAMR produces more effective rankings. We then apply our MathAMR embeddings to rerank runs from the ARQMath-2 formula retrieval task, which improves effectiveness measures in most cases. The strongest reranked run matches the best P′@10 achieved by any original run and exceeds all original runs in nDCG′@10.
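The sketch below is a minimal Python illustration of the pipeline the abstract describes: linearize each MathAMR graph, embed it, and rerank an existing run by embedding similarity. It is not the authors' implementation, and it makes several assumptions:

- The graph is given as (parent, relation, child) triples, and node names double as concept labels. `linearize` emits a simplified, PENMAN-like bracketed string.
- Embeddings are passed in as plain vectors. In the paper they come from Sentence-BERT, which this sketch does not call.
- The interpolation weight `alpha` and the min-max normalization of the original scores are hypothetical choices made for illustration.

`prime_precision_at_k` shows the idea behind prime metrics such as P′@10: unjudged documents are removed from the ranking before the cutoff is applied. The relevance threshold `min_rel=2` on a 0 to 3 scale is an assumption.

```python
from __future__ import annotations

import math
from collections import defaultdict


def linearize(root: str, edges: list[tuple[str, str, str]]) -> str:
    """Simplified, PENMAN-like depth-first linearization.

    Assumption: node names double as concept labels. Re-entrant nodes are
    emitted by name only, so cycles do not recurse forever.
    """
    children: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for parent, rel, child in edges:
        children[parent].append((rel, child))
    seen: set[str] = set()

    def visit(node: str) -> str:
        already_seen = node in seen
        seen.add(node)
        if already_seen or not children[node]:
            return node  # leaf or re-entrancy: emit the name only
        parts = [node] + [f":{rel} {visit(child)}" for rel, child in children[node]]
        return "(" + " ".join(parts) + ")"

    return visit(root)


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rerank(query_vec: list[float],
           candidates: list[tuple[str, float, list[float]]],
           alpha: float = 0.5) -> list[str]:
    """Rerank (doc_id, original_score, embedding) triples.

    Hypothetical scoring: alpha * min-max-normalized original score
    + (1 - alpha) * cosine(query embedding, candidate embedding).
    """
    if not candidates:
        return []
    scores = [score for _, score, _ in candidates]
    low, high = min(scores), max(scores)
    span = (high - low) or 1.0  # avoid division by zero when all scores tie
    combined = []
    for doc_id, score, vec in candidates:
        normalized = (score - low) / span
        combined.append((alpha * normalized + (1 - alpha) * cosine(query_vec, vec), doc_id))
    combined.sort(key=lambda pair: pair[0], reverse=True)
    return [doc_id for _, doc_id in combined]


def prime_precision_at_k(ranking: list[str], qrels: dict[str, int],
                         k: int = 10, min_rel: int = 2) -> float:
    """P'@k: drop unjudged documents, then compute precision over the top k."""
    judged = [doc for doc in ranking if doc in qrels][:k]
    return sum(qrels[doc] >= min_rel for doc in judged) / k


if __name__ == "__main__":
    edges = [("say-01", "ARG1", "EQ"), ("EQ", "op1", "x"), ("EQ", "op2", "y")]
    print(linearize("say-01", edges))  # (say-01 :ARG1 (EQ :op1 x :op2 y))
    cands = [("a", 3.0, [0.0, 1.0]), ("b", 1.0, [1.0, 0.0])]
    print(rerank([1.0, 0.0], cands, alpha=0.3))  # ['b', 'a']
    print(prime_precision_at_k(["b", "x", "a", "c"], {"a": 3, "b": 1, "c": 2}, k=3))
```

In this sketch, a smaller `alpha` gives more weight to the embedding similarity. In the usage example, that moves candidate `b`, whose embedding matches the query, ahead of `a`, which had the higher original score.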



Index Terms

1. Information systems
  1. Information retrieval
    1. Retrieval models and ranking



Published In

CIKM '22: Proceedings of the 31st ACM International Conference on Information & Knowledge Management

October 2022

5274 pages

ISBN: 9781450392365

DOI: 10.1145/3511808

General Chairs:
Mohammad Al Hasan, Indiana University Purdue University, Indianapolis, USA
Li Xiong, Emory University, Atlanta, USA

Copyright © 2022 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.


Publisher

Association for Computing Machinery

New York, NY, United States



Funding Sources

National Science Foundation
Alfred P. Sloan Foundation

Conference

CIKM '22: The 31st ACM International Conference on Information and Knowledge Management

October 17 - 21, 2022

Atlanta, GA, USA

Acceptance Rates

CIKM '22 Paper Acceptance Rate: 621 of 2,257 submissions, 28%

Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%



Bibliometrics

Total Citations: 4

Total Downloads: 296

Downloads (Last 12 months): 184

Downloads (Last 6 weeks): 11

Reflects downloads up to 05 Mar 2025

Cited By

Wein S, Schneider N (2024). Assessing the Cross-linguistic Utility of Abstract Meaning Representation. Computational Linguistics 50:2 (419-473). DOI: 10.1162/coli_a_00503. Online publication date: 1-Jun-2024.
https://doi.org/10.1162/coli_a_00503

Mansouri B, Maarefdoust R, Hui Yang G, Wang H, Han S, Hauff C, Zuccon G, Zhang Y (2024). Using Large Language Models for Math Information Retrieval. Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (2693-2697). DOI: 10.1145/3626772.3657907. Online publication date: 10-Jul-2024.
https://dl.acm.org/doi/10.1145/3626772.3657907

Mansouri B, Jahedibashiz Z, Yoshioka M, Kiseleva J, Aliannejadi M (2023). Clarifying Questions in Math Information Retrieval. Proceedings of the 2023 ACM SIGIR International Conference on Theory of Information Retrieval (149-158). DOI: 10.1145/3578337.3605123. Online publication date: 9-Aug-2023.
https://dl.acm.org/doi/10.1145/3578337.3605123

Amador B, Langsenkamp M, Dey A, Shah A, Zanibbi R, Chen H, Duh W, Huang H, Kato M, Mothe J, Poblete B (2023). Searching the ACL Anthology with Math Formulas and Text. Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval (3110-3114). DOI: 10.1145/3539618.3591803. Online publication date: 18-Jul-2023.
https://doi.org/10.1145/3539618.3591803
