Recognition of E-Born PDF Including Mathematical Formulas

  • Conference paper
Computers Helping People with Special Needs (ICCHP 2016)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9758)

Abstract

We have developed a new method for recognizing STEM content in “e-born PDF,” that is, PDF produced directly from an electronic source such as a Microsoft Word document or a LaTeX system. Character information extracted directly from the document (the character code, the font type, and the coordinates on the page) is combined with the analysis technologies of math OCR. Compared with ordinary image-based OCR approaches, this remarkably improves the recognition rate for STEM content in e-born PDF. The method is implemented in our math OCR system, InftyReader.
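To illustrate how character codes, font types, and page coordinates could feed a structural analysis step, the following is a minimal Python sketch. It is not the method implemented in InftyReader, whose details are not given in the abstract. The `CharInfo` record, the `script_relation` and `to_latex` helpers, the 0.25 threshold, and the sample font names and coordinates are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class CharInfo:
    """One character as it might be read directly from an e-born PDF (hypothetical record)."""
    code: str   # character code, here a Unicode character
    font: str   # font type, e.g. "CMMI10" (illustrative value)
    x0: float   # left edge
    y0: float   # bottom edge (PDF coordinates: y grows upward)
    x1: float   # right edge
    y1: float   # top edge

    @property
    def height(self) -> float:
        return self.y1 - self.y0


def script_relation(base: CharInfo, nxt: CharInfo, shift: float = 0.25) -> str:
    """Classify nxt relative to base as 'super', 'sub', or 'inline'.

    Assumed heuristic: compare the vertical centres of the two boxes and
    treat an offset larger than `shift` times the base height as a script.
    """
    base_mid = (base.y0 + base.y1) / 2
    nxt_mid = (nxt.y0 + nxt.y1) / 2
    offset = (nxt_mid - base_mid) / base.height
    if offset > shift:
        return "super"
    if offset < -shift:
        return "sub"
    return "inline"


def to_latex(chars: List[CharInfo]) -> str:
    """Render a single-line character sequence as a LaTeX-like string.

    Characters are processed left to right; each script is attached to the
    most recent inline character.
    """
    if not chars:
        return ""
    ordered = sorted(chars, key=lambda c: c.x0)
    base = ordered[0]
    out = [base.code]
    for ch in ordered[1:]:
        rel = script_relation(base, ch)
        if rel == "super":
            out.append("^{" + ch.code + "}")
        elif rel == "sub":
            out.append("_{" + ch.code + "}")
        else:
            out.append(ch.code)
            base = ch  # an inline character becomes the new base
    return "".join(out)


# Made-up sample data: the characters of "x^2 + y_i" with invented coordinates.
sample = [
    CharInfo("x", "CMMI10", 10, 100, 16, 107),
    CharInfo("2", "CMR7", 16, 104, 20, 109),
    CharInfo("+", "CMR10", 22, 100, 30, 107),
    CharInfo("y", "CMMI10", 32, 100, 38, 107),
    CharInfo("i", "CMMI7", 38, 97, 41, 102),
]
print(to_latex(sample))
```

Because the character codes come from the file itself, a sketch like this never has to guess which glyph it is looking at; only the layout relations are inferred. That is the advantage over image-based OCR that the abstract describes, although a real system would need far more robust structure analysis than this single threshold.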

Corresponding author

Correspondence to Katsuhito Yamaguchi.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Suzuki, M., Yamaguchi, K. (2016). Recognition of E-Born PDF Including Mathematical Formulas. In: Miesenberger, K., Bühler, C., Penaz, P. (eds) Computers Helping People with Special Needs. ICCHP 2016. Lecture Notes in Computer Science, vol 9758. Springer, Cham. https://doi.org/10.1007/978-3-319-41264-1_5

  • DOI: https://doi.org/10.1007/978-3-319-41264-1_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-41263-4

  • Online ISBN: 978-3-319-41264-1

  • eBook Packages: Computer Science (R0)
