Abstract
The web is a rich repository of mathematical information, but finding relevant documents in such a collection is laborious. Although multiple approaches have been proposed to retrieve relevant documents for a queried formula, their low scores on evaluation measures reveal the limitations of existing systems. To improve the performance of these systems, this paper proposes a novel approach to formula indexing that employs formula embedding and generalization techniques. The formula embedding and generalization modules of the proposed system transform formulas into fixed-size vectors by counting the occurrences of different entities in each formula. The formula vectors are then indexed by an indexer. Documents retrieved by both modules are given higher priority than those retrieved by only one of them. The results have been compared with existing state-of-the-art approaches, and the comparison shows that the proposed approach gives better retrieval accuracy, with P_5 = 0.522, P_10 = 0.478, P_15 = 0.352, and P_20 = 0.289.
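The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the entity vocabulary, the tokenized formula, the document identifiers, and the function names `count_vector`, `merge_results`, and `precision_at_k` are all hypothetical, and the sketch assumes that P_k denotes precision over the top k retrieved documents.

```python
from collections import Counter

# Hypothetical entity vocabulary; the abstract does not list the real one.
VOCAB = ["+", "-", "=", "^", "x", "y", "2"]


def count_vector(tokens, vocab=VOCAB):
    """Map a tokenized formula to a fixed-size vector of entity counts."""
    counts = Counter(tokens)
    return [counts[entity] for entity in vocab]


def merge_results(embedding_hits, generalization_hits):
    """Rank documents found by both modules ahead of those found by only one.

    Within each group the original retrieval order is preserved.
    """
    in_generalization = set(generalization_hits)
    both = [doc for doc in embedding_hits if doc in in_generalization]
    seen = set(both)
    rest = []
    for doc in embedding_hits + generalization_hits:
        if doc not in seen:
            seen.add(doc)
            rest.append(doc)
    return both + rest


def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k ranked documents that are relevant (assumed meaning of P_k)."""
    return sum(1 for doc in ranked[:k] if doc in relevant) / k


# Tokens for x^2 + y^2 = x (tokenization is assumed to happen upstream).
vector = count_vector(["x", "^", "2", "+", "y", "^", "2", "=", "x"])
ranking = merge_results(["d1", "d2", "d3"], ["d3", "d4", "d1"])
p_at_2 = precision_at_k(ranking, {"d1", "d4"}, 2)
```

Here `d1` and `d3` are returned by both hypothetical modules, so they move to the front of the merged ranking, ahead of `d2` and `d4`.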
Acknowledgements
The authors would like to express gratitude to the Department of Computer Science and Engineering and Center for Natural Language Processing, National Institute of Technology, Silchar, India, for providing infrastructural facilities and support.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Dadure, P., Pakray, P., Bandyopadhyay, S. (2022). A Fine-tuning Retrieval System for Mathematical Information. In: Giri, D., Raymond Choo, KK., Ponnusamy, S., Meng, W., Akleylek, S., Prasad Maity, S. (eds) Proceedings of the Seventh International Conference on Mathematics and Computing. Advances in Intelligent Systems and Computing, vol 1412. Springer, Singapore. https://doi.org/10.1007/978-981-16-6890-6_81
DOI: https://doi.org/10.1007/978-981-16-6890-6_81
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-6889-0
Online ISBN: 978-981-16-6890-6
eBook Packages: Intelligent Technologies and Robotics (R0)