Abstract.

Mathematical documents are analyzed from several viewpoints for the development of practical OCR for mathematical and other scientific documents. Specifically, four viewpoints are quantified using a large-scale database of mathematical documents, containing 690,000 manually ground-truthed characters: (i) the number of character categories, (ii) abnormal characters (e.g., touching characters), (iii) character size variation, and (iv) the complexity of the mathematical expressions. The result of these analyses clarifies the difficulties of recognizing mathematical documents and then suggests several promising directions to overcome them.

Author information

Corresponding author

Correspondence to S. Uchida.

Additional information

Received: 3 March 2004, Accepted: 5 January 2005, Published online: 29 June 2005

About this article

Cite this article

Uchida, S., Nomura, A. & Suzuki, M. Quantitative analysis of mathematical documents. IJDAR 7, 211–218 (2005). https://doi.org/10.1007/s10032-005-0142-y

  • DOI: https://doi.org/10.1007/s10032-005-0142-y
