Abstract
With the rapid growth of formulae available on the Web, formula retrieval, i.e., how to effectively retrieve documents relevant to a formula, has attracted much attention from researchers in mathematical information retrieval (MIR). Existing MIR search engines exploit many kinds of formula information, such as characters, layout structure, and formula context. However, little attention has been paid to the link or citation relations of formulae across documents, even though these relations help to find related formulae whose appearance differs from that of the query formula. In this paper, we therefore design a Formula Citation Graph (FCG) to ‘dig out’ the link or citation relations between formulae. FCG has two main advantages: 1) the graph can generate descriptive keywords for formulae to enrich the semantics of formula queries; 2) the graph is used to balance the ranking results between text matching and structure matching. Experimental results demonstrate that the link or citation relations among formulae are helpful for MIR.
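The abstract does not say how the FCG is built or scored, so the following is only a toy sketch of the two ideas it names: enriching a formula with keywords from linked formulae, and balancing text against structure matching. The formula ids, keyword sets, edge direction, the one-hop keyword union, and the linear weight `alpha` are all assumptions made for illustration and are not taken from the paper.

```python
from collections import defaultdict

# Hypothetical toy data: formula ids mapped to keywords from their own context.
keywords = {
    "f1": {"pythagorean", "theorem"},
    "f2": {"euclidean", "distance"},
    "f3": {"norm"},
}
# Assumed edge direction: cites[a] lists the formulae that formula a links to or cites.
cites = {"f1": [], "f2": ["f1"], "f3": ["f2"]}


def neighbors(cites):
    """Build an undirected adjacency map from the directed citation edges."""
    adj = defaultdict(set)
    for src, targets in cites.items():
        adj[src]  # touch the key so isolated formulae still appear
        for dst in targets:
            adj[src].add(dst)
            adj[dst].add(src)
    return adj


def expand_keywords(formula_id, keywords, adj):
    """Union a formula's own keywords with those of its one-hop neighbours."""
    expanded = set(keywords.get(formula_id, ()))
    for n in adj[formula_id]:
        expanded |= keywords.get(n, set())
    return expanded


def combined_score(text_score, structure_score, alpha=0.5):
    """Assumed linear interpolation between text and structure matching scores."""
    return alpha * text_score + (1 - alpha) * structure_score


adj = neighbors(cites)
enriched = expand_keywords("f2", keywords, adj)
```

Here `f2` picks up keywords from both the formula it cites (`f1`) and the formula citing it (`f3`), which illustrates how link relations could surface related formulae that do not look like the query. A real system would likely weight neighbours rather than take a plain union.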
Acknowledgements
This work is supported by the National Natural Science Foundation of China (No. 61876003), the National Key R&D Program of China (2019YFB1406303), the Guangdong Basic and Applied Basic Research Foundation (2019A1515010837), and the Fundamental Research Funds for the Central Universities. It is also a research achievement of the Key Laboratory of Science, Technology and Standard in Press Industry (Key Laboratory of Intelligent Press Media Technology).
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Yuan, K., Gao, L., Jiang, Z., Tang, Z. (2021). Formula Citation Graph Based Mathematical Information Retrieval. In: Lladós, J., Lopresti, D., Uchida, S. (eds) Document Analysis and Recognition – ICDAR 2021. ICDAR 2021. Lecture Notes in Computer Science, vol 12821. Springer, Cham. https://doi.org/10.1007/978-3-030-86549-8_40
DOI: https://doi.org/10.1007/978-3-030-86549-8_40
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-86548-1
Online ISBN: 978-3-030-86549-8
eBook Packages: Computer Science, Computer Science (R0)