Prediction of Mathematical Expression Declarations based on Spatial, Semantic, and Syntactic Analysis

Published: 23 September 2019
DOI: 10.1145/3342558.3345399

Abstract

Mathematical expressions (MEs) and words are tightly bound together in most science, technology, engineering, and mathematics (STEM) documents, where they respectively give quantitative and qualitative descriptions of the system model under discussion. This paper proposes a general model for finding the co-reference relations between words and MEs, on which we build a novel algorithm for predicting the natural language declarations of MEs (ME-Dec). The prediction algorithm is applied in a three-level framework. The first level is a customized tagger that identifies the syntactic roles of MEs and the part-of-speech (POS) tags of words in sentences that mix MEs and words. The second level screens the ME-Dec candidates based on the hypothesis that most ME-Dec are noun phrases (NPs). A shallow chunker is trained with the fuzzy process mining algorithm, which takes the labeled POS tag series in the NTCIR-10 dataset as input and mines them for the frequent syntactic patterns of NPs. In the third level, the bonding model between MEs and ME-Dec candidates is trained on the NTCIR-10 training set, using distance, word stem, and POS tag as the spatial, semantic, and syntactic features, respectively. The final predictions are made by majority vote over an ensemble of Naïve Bayesian classifiers based on the three features. Evaluated on the NTCIR-10 test set, the proposed algorithm achieves average F1 scores of 75% and 71% under soft matching and strict matching, respectively, outperforming state-of-the-art solutions by a margin of 5-18%.
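The third-level voting step can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes one categorical Naïve Bayes classifier per feature (distance, word stem, POS tag), a distance that has already been bucketed into "near" and "far", add-one smoothing, and a tiny hand-made training set. The names `SingleFeatureNB`, `train_ensemble`, `predict_declaration`, and `TOY_TRAINING` are hypothetical and do not come from the paper.

```python
import math
from collections import Counter, defaultdict


class SingleFeatureNB:
    """Categorical Naive Bayes over one discrete feature (add-one smoothing)."""

    def fit(self, values, labels):
        self.label_counts = Counter(labels)
        self.value_counts = defaultdict(Counter)
        for value, label in zip(values, labels):
            self.value_counts[label][value] += 1
        self.n_values = len(set(values))
        return self

    def predict(self, value):
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label in sorted(self.label_counts):
            n_label = self.label_counts[label]
            log_prior = math.log(n_label / total)
            # The extra +1 in the denominator reserves mass for unseen values.
            log_likelihood = math.log(
                (self.value_counts[label][value] + 1)
                / (n_label + self.n_values + 1)
            )
            score = log_prior + log_likelihood
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy data: (distance bucket, word stem, POS tag, is_declaration).
TOY_TRAINING = [
    ("near", "veloc", "NN", True),
    ("near", "mass", "NN", True),
    ("near", "let", "VB", False),
    ("far", "paper", "NN", False),
    ("far", "show", "VB", False),
    ("far", "energi", "NN", True),
]


def train_ensemble(rows):
    """Train one classifier per feature column (assumed ensemble layout)."""
    labels = [row[-1] for row in rows]
    return [
        SingleFeatureNB().fit([row[i] for row in rows], labels)
        for i in range(3)
    ]


def predict_declaration(ensemble, distance, stem, pos_tag):
    """Return the label chosen by the majority of the three classifiers."""
    votes = [
        clf.predict(value)
        for clf, value in zip(ensemble, (distance, stem, pos_tag))
    ]
    return Counter(votes).most_common(1)[0][0]


ensemble = train_ensemble(TOY_TRAINING)
print(predict_declaration(ensemble, "far", "mass", "NN"))
```

With three voters and a binary label a tie cannot occur, which is one reason an odd number of single-feature classifiers is convenient in this kind of sketch.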

Published In

DocEng '19: Proceedings of the ACM Symposium on Document Engineering 2019
September 2019
254 pages
ISBN: 9781450368872
DOI: 10.1145/3342558
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Badges

  • Best Student Paper

Author Tags

  1. Co-reference
  2. Declaration extraction
  3. Mathematical expression

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

DocEng '19: ACM Symposium on Document Engineering 2019
September 23 - 26, 2019
Berlin, Germany

Acceptance Rates

DocEng '19 paper acceptance rate: 30 of 77 submissions, 39%
Overall acceptance rate: 194 of 564 submissions, 34%

Article Metrics

  • Downloads (last 12 months): 11
  • Downloads (last 6 weeks): 1
Reflects downloads up to 03 Mar 2025

Cited By

  • (2025) VARAT: Variable Annotation Tool for Documents on Manufacturing Processes. Journal of Chemical Engineering of Japan 58:1. DOI: 10.1080/00219592.2025.2454461. Online publication date: 5-Feb-2025
  • (2024) Data Augmentation Method Utilizing Template Sentences for Variable Definition Extraction. Natural Language Processing and Information Systems (151-165). DOI: 10.1007/978-3-031-70239-6_11. Online publication date: 25-Jun-2024
  • (2023) Simple algorithm for judging equivalence of differential-algebraic equation systems. Scientific Reports 13:1. DOI: 10.1038/s41598-023-38254-y. Online publication date: 17-Jul-2023
  • (2022) A dive in white and grey shades of ML and non-ML literature: a multivocal analysis of mathematical expressions. Artificial Intelligence Review 56:7 (7047-7135). DOI: 10.1007/s10462-022-10330-1. Online publication date: 6-Dec-2022
  • (2020) Mathematical Information Retrieval. Evaluating Information Retrieval and Access Tasks (169-185). DOI: 10.1007/978-981-15-5554-1_12. Online publication date: 2-Sep-2020
