Formula Citation Graph Based Mathematical Information Retrieval

  • Conference paper
Document Analysis and Recognition – ICDAR 2021 (ICDAR 2021)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12821)

Abstract

With the rapid growth of formulae available on the Web, formula retrieval, the task of effectively retrieving documents relevant to a query formula, has attracted much attention from researchers in mathematical information retrieval (MIR). Existing MIR search engines exploit many kinds of formula information, such as characters, layout structure, and formula context. However, little attention has been paid to the link or citation relations of formulae across documents, even though these relations help find related formulae whose appearance differs from that of the query formula. In this paper, we therefore design a Formula Citation Graph (FCG) to ‘dig out’ the link or citation relations between formulae. FCG has two main advantages: 1) the graph can generate descriptive keywords for formulae, enriching the semantics of formula queries; 2) the graph is used to balance the ranking results between text matching and structure matching. The experimental results demonstrate that the link or citation relations among formulae are helpful for MIR.

Notes

  1. http://ntcir-math.nii.ac.jp/

Acknowledgements

This work is supported by the National Natural Science Foundation of China (No. 61876003), the National Key R&D Program of China (2019YFB1406303), the Guangdong Basic and Applied Basic Research Foundation (2019A1515010837), and the Fundamental Research Funds for the Central Universities. It is also a research achievement of the Key Laboratory of Science, Technology and Standard in Press Industry (Key Laboratory of Intelligent Press Media Technology).

Corresponding author

Correspondence to Liangcai Gao.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Yuan, K., Gao, L., Jiang, Z., Tang, Z. (2021). Formula Citation Graph Based Mathematical Information Retrieval. In: Lladós, J., Lopresti, D., Uchida, S. (eds) Document Analysis and Recognition – ICDAR 2021. ICDAR 2021. Lecture Notes in Computer Science, vol 12821. Springer, Cham. https://doi.org/10.1007/978-3-030-86549-8_40

  • DOI: https://doi.org/10.1007/978-3-030-86549-8_40

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-86548-1

  • Online ISBN: 978-3-030-86549-8

  • eBook Packages: Computer Science, Computer Science (R0)
