ABSTRACT
Mathematical formulae in structural formats such as MathML and LaTeX are becoming increasingly available. Repositories and websites, including arXiv and Wikipedia, and a growing number of digital libraries use these structural formats to present mathematical formulae. This opens an important and challenging new area of research, namely Mathematical Information Retrieval (MIR). In this paper, we propose WikiMirs, a tool to facilitate mathematical formula retrieval in Wikipedia. WikiMirs searches for similar mathematical formulae based upon both textual and spatial similarities, using a new indexing and matching model developed for layout structures. A hierarchical generalization technique is proposed to generate sub-trees from the presentation trees of mathematical formulae, and similarity is calculated based upon matching at different levels of these trees. Experimental results show that WikiMirs can efficiently support sub-structure matching and similarity matching of mathematical formulae. Moreover, on Wikipedia data, WikiMirs achieves both higher accuracy and better-ranked results than Wikipedia Search and Egomath. We conclude that WikiMirs provides a new, alternative, and hopefully better service for users to search mathematical expressions within Wikipedia.
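The idea of generating generalized sub-trees and matching them at several levels can be illustrated with a small sketch. This is not the WikiMirs implementation: the `Node` class, the `*` wildcard, the `max_depth` parameter, and the Dice-style overlap score are all assumptions chosen for illustration, and the abstract does not specify the actual term format or scoring function.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Node:
    """One node of a simplified presentation (layout) tree. Hypothetical type."""
    label: str
    children: list[Node] = field(default_factory=list)


def serialize(node: Node, depth: int) -> str:
    """Serialize a sub-tree, keeping `depth` levels and replacing the rest with '*'."""
    if depth == 0:
        return "*"
    if not node.children:
        return node.label
    inner = ",".join(serialize(child, depth - 1) for child in node.children)
    return f"{node.label}({inner})"


def generalized_terms(root: Node, max_depth: int = 2) -> Counter:
    """Collect, for every sub-tree, its serializations at 1..max_depth levels."""
    terms: Counter = Counter()
    stack = [root]
    while stack:
        node = stack.pop()
        # A set avoids counting a leaf twice when all depths give the same string.
        terms.update({serialize(node, d) for d in range(1, max_depth + 1)})
        stack.extend(node.children)
    return terms


def similarity(a: Node, b: Node, max_depth: int = 2) -> float:
    """Dice-style overlap of generalized terms (an assumed score, not the paper's)."""
    ta, tb = generalized_terms(a, max_depth), generalized_terms(b, max_depth)
    shared = sum((ta & tb).values())
    return 2 * shared / (sum(ta.values()) + sum(tb.values()))


# x^2 / y versus a^2 / y: same layout, one differing symbol.
query = Node("frac", [Node("sup", [Node("x"), Node("2")]), Node("y")])
candidate = Node("frac", [Node("sup", [Node("a"), Node("2")]), Node("y")])
print(similarity(query, candidate))
```

In this sketch the two fractions share the generalized terms `frac(*,*)`, `frac(sup(*,*),y)`, and `sup(*,*)`, so they score well above an unrelated formula even though `x` and `a` differ. That is the kind of sub-structure match that plain keyword search would miss.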
Index Terms
- WikiMirs: a mathematical information retrieval system for wikipedia