Making Equations Accessible in Scientific Documents

  • Conference paper
Computers Helping People with Special Needs (ICCHP-AAATE 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13341)

Abstract

Unlike a standard text document, a STEM document consists not only of text but also of other components such as tables, figures, captions, and mathematical equations. This paper presents a novel technique to detect mathematical equations in PDF documents and convert them into a more accessible format such as LaTeX. We use visual features of the document to detect the equations with object detection and then apply heuristics to the generated bounding boxes so that they precisely cover each complete equation. These detections are passed to a tool called MaxTract, which rewrites the equations in LaTeX.
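
The abstract does not say which heuristics are applied to the detected bounding boxes. As a rough illustration only, the sketch below shows one plausible post-processing step of this kind: each detection is padded by a small margin, clamped to the page, and merged with any box it overlaps, so that an equation split across several detections ends up inside a single region. The function names, the `margin` parameter, and the `(x0, y0, x1, y1)` box layout are assumptions made for this example and are not taken from the paper.

```python
from typing import List, Tuple

# Hypothetical box layout: (x0, y0, x1, y1) with the origin at the top-left.
Box = Tuple[float, float, float, float]


def overlaps(a: Box, b: Box) -> bool:
    """Return True if the two boxes share any area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def union(a: Box, b: Box) -> Box:
    """Smallest box that contains both inputs."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def pad(box: Box, margin: float, width: float, height: float) -> Box:
    """Grow a box by `margin` on every side, clamped to the page."""
    x0, y0, x1, y1 = box
    return (max(0.0, x0 - margin), max(0.0, y0 - margin),
            min(width, x1 + margin), min(height, y1 + margin))


def refine_boxes(boxes: List[Box], margin: float,
                 width: float, height: float) -> List[Box]:
    """Pad every detection, then merge detections that overlap."""
    merged: List[Box] = []
    for box in (pad(b, margin, width, height) for b in boxes):
        absorbed = True
        while absorbed:
            absorbed = False
            for other in merged:
                if overlaps(box, other):
                    # Fold the overlapping box into the current one and rescan,
                    # since the grown box may now touch further boxes.
                    box = union(box, other)
                    merged.remove(other)
                    absorbed = True
                    break
        merged.append(box)
    return merged


if __name__ == "__main__":
    detections = [(10, 10, 50, 20), (48, 12, 90, 22), (10, 100, 40, 110)]
    print(refine_boxes(detections, margin=2, width=200, height=200))
```

In this sketch the first two detections sit on the same line and overlap after padding, so they collapse into one region, while the third stays separate. A real pipeline would likely also use page layout cues (line spacing, column boundaries) that are not modelled here.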

Author information

Corresponding author

Correspondence to Sanjeev Sharma.

Copyright information

© 2022 Springer Nature Switzerland AG

About this paper

Cite this paper

Juyal, S., Sharma, S., Jadhav, N., Sorge, V., Balakrishnan, M. (2022). Making Equations Accessible in Scientific Documents. In: Miesenberger, K., Kouroupetroglou, G., Mavrou, K., Manduchi, R., Covarrubias Rodriguez, M., Penáz, P. (eds) Computers Helping People with Special Needs. ICCHP-AAATE 2022. Lecture Notes in Computer Science, vol 13341. Springer, Cham. https://doi.org/10.1007/978-3-031-08648-9_4

  • DOI: https://doi.org/10.1007/978-3-031-08648-9_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-08647-2

  • Online ISBN: 978-3-031-08648-9

  • eBook Packages: Computer Science, Computer Science (R0)
