DOI: 10.1145/2390148.2390157
poster

Exploiting semantic annotations in math information retrieval

Published: 02 November 2012

Abstract

This paper describes how semantic annotations are exploited in the design and architecture of MIaS (Math Indexer and Searcher), a system for mathematics retrieval. Based on the claim that navigational and research search are 'killer' applications for digital libraries such as the European Digital Mathematics Library (EuDML), we argue for an approach based on Natural Language Processing techniques, as used in corpus management systems such as the Sketch Engine, that will reach web scalability and avoid inference problems. The main ideas are 1) to augment surface texts (including math formulae) with additional linked representations bearing semantic information (formulae expanded as text, canonicalized text, and subformulae) for indexing, including support for indexing structural information expressed as Content MathML or other tree structures, and 2) to use semantic user preferences to order the documents found.
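The abstract does not say how semantic user preferences are combined with search relevance. The sketch below is one possible reading under stated assumptions, not the MIaS method: each hit is assumed to carry a relevance score and a topic vector (for instance from a topic model), the user profile is assumed to be a vector over the same topics, and the final order is a linear mix of the two. The `Hit` type, `rerank`, and the mixing weight `alpha` are hypothetical.

```python
# Hypothetical sketch: reorder hits with a user's topic preferences.
# Assumptions (not from the paper): every hit has a relevance score from the
# text/formula matcher and a topic vector; the user profile is a vector over
# the same topics; the final score linearly mixes the two via `alpha`.
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Hit:
    doc_id: str
    relevance: float       # score from the matcher, assumed in [0, 1]
    topics: List[float]    # hypothetical per-document topic weights


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rerank(hits: List[Hit], preferences: List[float], alpha: float = 0.7) -> List[Hit]:
    """Order hits by alpha * relevance + (1 - alpha) * preference similarity."""
    def combined(hit: Hit) -> float:
        return alpha * hit.relevance + (1 - alpha) * cosine(hit.topics, preferences)
    return sorted(hits, key=combined, reverse=True)
```

For instance, with `Hit("algebra", 0.80, [1.0, 0.0])`, `Hit("topology", 0.75, [0.0, 1.0])` and preferences `[0.0, 1.0]`, the default `alpha` ranks the topology hit first, while `alpha=1.0` ignores preferences and keeps the pure relevance order.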
The semantic enhancements of the MIaS system are being implemented as a math-aware search engine based on the state-of-the-art system Apache Lucene, with support for MathML tree indexing. Scalability has been tested on more than 400,000 arXiv documents.
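To make the idea of indexing formula trees together with their subformulae and canonicalized variants concrete, here is a minimal in-memory sketch. It is not MIaS or Lucene code: the `Node` type, the `ci:`/`cn:` leaf labels (loosely modelled on Content MathML identifiers and numbers), the `id1`/`const` renaming scheme, and the size-based weights are all assumptions chosen for illustration.

```python
# Hypothetical sketch: index formula trees by their subformulae plus
# canonicalized variants. Node, the "ci:"/"cn:" leaf labels, the
# id1/const renaming and the size-based weights are assumptions made
# for illustration; they are not taken from MIaS or Lucene.
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterator, List, Optional, Tuple


@dataclass(frozen=True)
class Node:
    label: str                           # operator ("plus") or leaf ("ci:x", "cn:1")
    children: Tuple["Node", ...] = ()


def serialize(node: Node) -> str:
    """Linearize a tree into a string usable as one index term."""
    if not node.children:
        return node.label
    return f"{node.label}({','.join(serialize(c) for c in node.children)})"


def size(node: Node) -> int:
    """Number of nodes in the subtree."""
    return 1 + sum(size(c) for c in node.children)


def canonicalize(node: Node, names: Optional[Dict[str, str]] = None) -> Node:
    """Rename variables to id1, id2, ... by first appearance; map all numbers to one constant."""
    if names is None:
        names = {}
    if node.label.startswith("ci:"):
        names.setdefault(node.label, f"ci:id{len(names) + 1}")
        return Node(names[node.label])
    if node.label.startswith("cn:"):
        return Node("cn:const")
    return Node(node.label, tuple(canonicalize(c, names) for c in node.children))


def subformulae(node: Node) -> Iterator[Node]:
    """Yield the node and every descendant in pre-order."""
    yield node
    for child in node.children:
        yield from subformulae(child)


def index_terms(formula: Node) -> Dict[str, float]:
    """Exact subformulae weigh their node count; canonical variants weigh half that."""
    terms: Dict[str, float] = {}
    for sub in subformulae(formula):
        weight = float(size(sub))
        for term, w in ((serialize(sub), weight), (serialize(canonicalize(sub)), 0.5 * weight)):
            terms[term] = max(terms.get(term, 0.0), w)
    return terms


def build_index(docs: Dict[str, Node]) -> Dict[str, Dict[str, float]]:
    """Inverted index: term -> {doc_id: weight}."""
    index: Dict[str, Dict[str, float]] = defaultdict(dict)
    for doc_id, formula in docs.items():
        for term, weight in index_terms(formula).items():
            index[term][doc_id] = weight
    return dict(index)


def search(index: Dict[str, Dict[str, float]], query: Node) -> List[Tuple[str, float]]:
    """Sum the stored weights of every query term a document contains."""
    scores: Dict[str, float] = defaultdict(float)
    for term in index_terms(query):
        for doc_id, weight in index.get(term, {}).items():
            scores[doc_id] += weight
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

For example, indexing x + 1, a + 2 and x * y and querying x + 1 ranks them in that order: the exact match first, then a + 2 through the shared canonical form `plus(ci:id1,cn:const)`, then x * y through the shared variable x. Weighting larger matched subtrees more heavily is one simple way to let structural similarity outrank isolated symbol matches.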

Cited By

  • (2022) Scientific document retrieval using structure encoded string with trie indexing. Information Services and Use 42:2 (241-259). DOI: 10.3233/ISU-220155. Online publication date: 1-Jan-2022
  • (2019) A Critical Survey of Mathematical Search Engines. Computational Intelligence, Communications, and Business Analytics (193-207). DOI: 10.1007/978-981-13-8581-0_16. Online publication date: 26-Jun-2019
  • (2017) Utilizing dependency relationships between math expressions in math IR. Information Retrieval Journal 20:2 (132-167). DOI: 10.1007/s10791-017-9296-8. Online publication date: 14-Mar-2017

Information

Published In

ESAIR '12: Proceedings of the fifth workshop on Exploiting semantic annotations in information retrieval
November 2012
28 pages
ISBN:9781450317177
DOI:10.1145/2390148

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. digital mathematics libraries
  2. information systems
  3. math indexing and retrieval
  4. mathematical content representation
  5. mias
  6. webmias

Qualifiers

  • Poster

Conference

CIKM'12

Acceptance Rates

Overall Acceptance Rate 35 of 55 submissions, 64%

Citations

  • (2016) A Mathematical Ontology for a Pertinent Research of Didactic Exercises. Proceedings of the Mediterranean Conference on Information & Communication Technologies 2015 (143-149). DOI: 10.1007/978-3-319-30298-0_15. Online publication date: 16-Apr-2016
  • (2014) Exploiting textual descriptions and dependency graph for searching mathematical expressions in scientific papers. Ninth International Conference on Digital Information Management (ICDIM 2014) (110-117). DOI: 10.1109/ICDIM.2014.6991403. Online publication date: Sep-2014
  • (2012) Report on the fifth workshop on exploiting semantic annotations in information retrieval (ESAIR'12). ACM SIGIR Forum 47:1 (38-45). DOI: 10.1145/2492189.2492196. Online publication date: 7-Jun-2012
  • (2012) Fifth workshop on exploiting semantic annotations in information retrieval. Proceedings of the 21st ACM international conference on Information and knowledge management (2772-2773). DOI: 10.1145/2396761.2398761. Online publication date: 29-Oct-2012