CASIA-onDo: A New Database for Online Handwritten Document Analysis

  • Conference paper
Pattern Recognition (ACPR 2021)

Abstract

In this paper, we introduce CASIA-onDo, an online handwritten document database intended to serve as a standard benchmark for developing and evaluating methods for online handwritten document layout analysis. It consists of 2,012 documents containing a total of 841,159 online strokes, produced by 200 writers in Chinese and English. Six types of content occur in the documents: text, formulas, diagrams, tables, figures, and lists. The distribution of these content types is close to that found in real documents. Thanks to its detailed annotations, CASIA-onDo supports different layout analysis tasks in both online and offline settings. First, the semantic-level annotation supports classification tasks such as text/non-text classification, table/non-table classification, and multi-class stroke classification. Second, the instance-level annotation supports segmentation tasks such as text line separation and formula segmentation. Third, the variety of writing styles makes the database suitable for handwriting recognition and writer clustering tasks. In addition, we perform preliminary experiments with a state-of-the-art method to provide a benchmark on this database. More techniques can be evaluated on this challenging database in the future.
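
As a rough illustration of how the two annotation levels described in the abstract could drive different tasks, the sketch below models a stroke carrying both a semantic label and an instance id. The `Stroke` class, its field names, and both helper functions are hypothetical and are not the database's actual file format or API. Treating only the `text` type as text is also an assumption; the database may map lists or formulas differently.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple

# The six content types named in the abstract.
CONTENT_TYPES = ("text", "formula", "diagram", "table", "figure", "list")


@dataclass
class Stroke:
    """Hypothetical in-memory stroke record (not the database's real schema)."""
    points: List[Tuple[float, float]]  # pen trajectory as (x, y) samples
    semantic: str                      # semantic-level label, one of CONTENT_TYPES
    instance_id: int                   # instance-level label (e.g. one text line)


def text_nontext_labels(strokes: List[Stroke]) -> List[str]:
    """Collapse multi-class semantic labels into binary text/non-text labels."""
    return ["text" if s.semantic == "text" else "non-text" for s in strokes]


def group_by_instance(strokes: List[Stroke]) -> Dict[Tuple[str, int], List[int]]:
    """Group stroke indices by (semantic label, instance id) for segmentation."""
    groups = defaultdict(list)
    for index, stroke in enumerate(strokes):
        groups[(stroke.semantic, stroke.instance_id)].append(index)
    return dict(groups)


# Toy document: two strokes in one text line, one table stroke, one formula stroke.
strokes = [
    Stroke([(0.0, 0.0), (1.0, 1.0)], "text", 0),
    Stroke([(2.0, 0.0), (3.0, 1.0)], "text", 0),
    Stroke([(0.0, 5.0), (4.0, 5.0)], "table", 1),
    Stroke([(0.0, 9.0), (1.0, 9.0)], "formula", 2),
]

binary_labels = text_nontext_labels(strokes)
instances = group_by_instance(strokes)
```

Here the semantic label alone is enough for the classification tasks, while the instance id is what makes segmentation tasks such as text line separation possible.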

Notes

  1. http://www.nlpr.ia.ac.cn/databases/CASIA-onDo/index.html

Acknowledgement

This work has been supported by the National Key Research and Development Program under Grant No. 2020AAA0109700 and the National Natural Science Foundation of China (NSFC) under Grant No. 61773376.

Author information

Correspondence to Yu-Ting Yang.

Copyright information

© 2022 Springer Nature Switzerland AG

About this paper

Cite this paper

Yang, YT., Zhang, YM., Yun, XL., Yin, F., Liu, CL. (2022). CASIA-onDo: A New Database for Online Handwritten Document Analysis. In: Wallraven, C., Liu, Q., Nagahara, H. (eds) Pattern Recognition. ACPR 2021. Lecture Notes in Computer Science, vol 13189. Springer, Cham. https://doi.org/10.1007/978-3-031-02444-3_13

  • DOI: https://doi.org/10.1007/978-3-031-02444-3_13

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-02443-6

  • Online ISBN: 978-3-031-02444-3

  • eBook Packages: Computer Science (R0)
