Abstract
We present a new method for recognizing STEM content in “e-born PDF,” that is, PDF produced directly from an electronic source such as a Microsoft Word document or a LaTeX file. Character information extracted directly from the document (the character code, the font type, and the coordinates on the page) is combined with the analysis techniques of math OCR. Compared with ordinary image-based OCR approaches, this markedly improves the recognition rate for STEM content in e-born PDF. The method is implemented in our math OCR system, InftyReader.
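As a rough illustration of why character information taken from the PDF itself can help, the Python sketch below classifies glyphs on one line as base, superscript, or subscript using only their font size and baseline position, with no image analysis. This is not InftyReader's algorithm: the `Glyph` record, the `role` and `to_latex` helpers, the thresholds, and the sample font names and coordinates are all assumptions made up for the example.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass
class Glyph:
    char: str        # character code read from the PDF text layer
    font: str        # font name reported by the PDF (not used by this sketch)
    x: float         # left edge of the glyph on the page
    baseline: float  # baseline height (PDF coordinates: y grows upward)
    size: float      # font size in points


def role(glyph, base_baseline, base_size):
    """Classify a glyph as 'base', 'sup' or 'sub' from its size and baseline.

    The 0.9 / 0.2 / 0.1 thresholds are illustrative guesses, not tuned values.
    """
    if glyph.size < 0.9 * base_size:
        if glyph.baseline > base_baseline + 0.2 * base_size:
            return "sup"
        if glyph.baseline < base_baseline - 0.1 * base_size:
            return "sub"
    return "base"


def to_latex(glyphs):
    """Turn one line of glyphs into a LaTeX-like string."""
    ordered = sorted(glyphs, key=lambda g: g.x)
    # Take the largest font size on the line as the base size, and the
    # baseline of the first glyph at that size as the base baseline.
    base_size = max(g.size for g in ordered)
    base_baseline = next(g.baseline for g in ordered if g.size == base_size)
    wrap = {"base": "{}", "sup": "^{{{}}}", "sub": "_{{{}}}"}
    parts = []
    # Consecutive glyphs with the same role are merged into one run.
    for kind, run in groupby(ordered, key=lambda g: role(g, base_baseline, base_size)):
        parts.append(wrap[kind].format("".join(g.char for g in run)))
    return "".join(parts)


# Hypothetical glyph data for the expression x^2 + y_i.
line = [
    Glyph("x", "CMMI10", 10.0, 100.0, 10.0),
    Glyph("2", "CMR7", 15.5, 104.0, 7.0),
    Glyph("+", "CMR10", 22.0, 100.0, 10.0),
    Glyph("y", "CMMI10", 32.0, 100.0, 10.0),
    Glyph("i", "CMMI7", 37.0, 98.0, 7.0),
]
print(to_latex(line))  # prints x^{2}+y_{i}
```

An image-based OCR pipeline would have to estimate the character identities, sizes, and baselines from pixels; here they are read as exact values, which is the kind of advantage the abstract attributes to working from e-born PDF.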
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Suzuki, M., Yamaguchi, K. (2016). Recognition of E-Born PDF Including Mathematical Formulas. In: Miesenberger, K., Bühler, C., Penaz, P. (eds) Computers Helping People with Special Needs. ICCHP 2016. Lecture Notes in Computer Science, vol 9758. Springer, Cham. https://doi.org/10.1007/978-3-319-41264-1_5
DOI: https://doi.org/10.1007/978-3-319-41264-1_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-41263-4
Online ISBN: 978-3-319-41264-1
eBook Packages: Computer Science, Computer Science (R0)