DOI: 10.1145/3341981.3344235
research-article

Tangent-CFT: An Embedding Model for Mathematical Formulas

Published: 26 September 2019

Abstract

When searching for mathematical content, accurate measures of formula similarity can help with tasks such as document ranking, query recommendation, and result set clustering. While there have been many attempts at embedding words and graphs, formula embedding is in its early stages. We introduce a new formula embedding model that we use with two hierarchical representations, (1) Symbol Layout Trees (SLTs) for appearance, and (2) Operator Trees (OPTs) for mathematical content. Following the approach of graph embeddings such as DeepWalk, we generate tuples representing paths between pairs of symbols depth-first, embed tuples using the fastText n-gram embedding model, and then represent an SLT or OPT by its average tuple embedding vector. We then combine SLT and OPT embeddings, leading to state-of-the-art results for the NTCIR-12 formula retrieval task. Our fine-grained holistic vector representations allow us to retrieve many more partially similar formulas than methods using structural matching in trees. Combining our embedding model with structural matching in the Approach0 formula search engine produces state-of-the-art results for both fully and partially relevant results on the NTCIR-12 benchmark. Source code for our system is publicly available.
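The pipeline described in the abstract (depth-first path tuples, per-tuple embeddings, averaging, then combining SLT and OPT vectors) can be illustrated with a minimal sketch. This is not the authors' implementation. The `Node` class, the edge labels (`"n"`, `"0"`, `"1"`), the `|` tuple separator, and the hashed character n-gram function `ngram_vector` are all hypothetical stand-ins; in particular, `ngram_vector` only imitates the idea of n-gram based token vectors and is not a trained fastText model. Combining the two vectors by element-wise sum is likewise an assumption, since the abstract does not say how they are combined.

```python
import hashlib
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical tree node; children are (edge_label, Node) pairs."""
    symbol: str
    children: list = field(default_factory=list)


def path_tuples(root):
    """Depth-first traversal emitting (ancestor, descendant, edge_path) tuples."""
    tuples = []

    def visit(node, ancestors):
        # ancestors holds (ancestor_symbol, edge_path_from_ancestor_to_node)
        for symbol, path in ancestors:
            tuples.append((symbol, node.symbol, path))
        for label, child in node.children:
            extended = [(s, p + label) for s, p in ancestors]
            extended.append((node.symbol, label))
            visit(child, extended)

    visit(root, [])
    return tuples


def ngram_vector(token, dim=16, n=3):
    """Stand-in for a trained n-gram embedding: average of hashed character n-grams."""
    padded = f"<{token}>"
    grams = [padded[i:i + n] for i in range(max(1, len(padded) - n + 1))]
    vec = [0.0] * dim
    for gram in grams:
        digest = hashlib.sha256(gram.encode("utf-8")).digest()
        for i in range(dim):
            vec[i] += digest[i % len(digest)] / 255.0 - 0.5
    return [v / len(grams) for v in vec]


def formula_vector(root, dim=16):
    """Represent a tree by the average of its tuple vectors."""
    tuples = path_tuples(root)
    if not tuples:  # a single-symbol formula yields no tuples in this sketch
        return [0.0] * dim
    vectors = [ngram_vector("|".join(t), dim) for t in tuples]
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity, returning 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Toy trees for "x + y": a layout chain (SLT-like) and an operator tree (OPT-like).
slt = Node("x", [("n", Node("+", [("n", Node("y"))]))])
opt = Node("+", [("0", Node("x")), ("1", Node("y"))])

# Assumed combination: element-wise sum of the two tree vectors.
combined = [a + b for a, b in zip(formula_vector(slt), formula_vector(opt))]
```

With vectors of this kind, two formulas would be compared by calling `cosine` on their combined vectors. A real system would replace `ngram_vector` with embeddings learned from a formula collection.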

    Published In

    ICTIR '19: Proceedings of the 2019 ACM SIGIR International Conference on Theory of Information Retrieval
    September 2019
    273 pages
    ISBN:9781450368810
    DOI:10.1145/3341981
    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. formula embeddings
    2. math formula retrieval
    3. tree embeddings

    Qualifiers

    • Research-article

    Conference

    ICTIR '19
    Acceptance Rates

    ICTIR '19 Paper Acceptance Rate 20 of 41 submissions, 49%;
    Overall Acceptance Rate 235 of 527 submissions, 45%
