DOI: 10.1145/3197026.3197061
short-paper

Formula Ranking within an Article

Published: 23 May 2018

ABSTRACT

With the ever-increasing volume of formulae on the Web, formula retrieval has drawn much attention from researchers. However, most existing research on formula retrieval treats every formula within an article equally, although different formulae in the same article differ in their importance to the article. In this paper, we address the problem of ranking formulae within an article by their importance. To evaluate the importance of each formula, we first build a formula citation graph from a large-scale corpus and extract the inter-article features of formulae through link topology analysis on this graph. We then use a word embedding model to extract the inner-article features by mining the semantic relevance between a formula and its article. Finally, we apply a learning-to-rank technique to rank the formulae within an article based on these features. The experimental results demonstrate that the proposed features are helpful for formula ranking and that our approach outperforms other state-of-the-art methods.
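
The pipeline outlined in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not name the link analysis algorithm, the embedding model, or the learning-to-rank method, so the sketch assumes a PageRank-style score over a toy formula citation graph, cosine similarity between hypothetical precomputed embedding vectors, and a fixed weighted sum standing in for the learned ranker. All function names, parameters, and data below are illustrative.

```python
from math import sqrt


def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict {formula: [cited formulae]}.

    Assumption: PageRank stands in for the unspecified link topology analysis.
    """
    nodes = set(graph) | {t for targets in graph.values() for t in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = graph.get(node, [])
            if targets:
                # Split this node's rank among the formulae it cites.
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling node: spread its rank evenly over all nodes.
                share = damping * rank[node] / n
                for other in nodes:
                    new_rank[other] += share
        rank = new_rank
    return rank


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_formulae(graph, formula_vecs, article_vec, weight=0.5):
    """Order formulae by a weighted sum of link score and semantic relevance.

    Assumption: the fixed `weight` replaces the learned ranking model.
    """
    link = pagerank(graph)
    top = max(link.values())
    scores = {
        f: weight * link.get(f, 0.0) / top + (1.0 - weight) * cosine(vec, article_vec)
        for f, vec in formula_vecs.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Toy data: f1 and f3 both cite f2; f1 and f2 match the article's topic vector.
toy_graph = {"f1": ["f2"], "f3": ["f2"], "f2": []}
toy_vecs = {"f1": [1.0, 0.0], "f2": [1.0, 0.0], "f3": [0.0, 1.0]}
ordering = rank_formulae(toy_graph, toy_vecs, article_vec=[1.0, 0.0])
```

In a setting closer to the one the abstract describes, the two scores would be fed as features to a trained ranking model rather than combined with a hand-picked weight.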


    • Published in

      JCDL '18: Proceedings of the 18th ACM/IEEE on Joint Conference on Digital Libraries
      May 2018
      453 pages
      ISBN:9781450351782
      DOI:10.1145/3197026

      Copyright © 2018 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Acceptance Rates

      JCDL '18 Paper Acceptance Rate: 26 of 71 submissions, 37%
      Overall Acceptance Rate: 415 of 1,482 submissions, 28%
