Abstract
This paper introduces CASIA-onDo, an online handwritten document database intended as a standard benchmark for developing and evaluating methods for online handwritten document layout analysis. It consists of 2,012 documents containing a total of 841,159 online strokes, produced by 200 writers in Chinese and English. Six types of content occur in the documents: text, formulas, diagrams, tables, figures, and lists. The distribution of these types is close to that found in real handwritten documents. Thanks to its detailed annotations, CASIA-onDo supports a range of layout analysis tasks in both online and offline settings. First, the semantic-level annotation supports classification tasks such as text/non-text classification, table/non-table classification, and multi-class stroke classification. Second, the instance-level annotation supports segmentation tasks such as text line separation and formula segmentation. Third, the variety of writing styles supports handwriting recognition and writer clustering. In addition, we perform preliminary experiments with a state-of-the-art method to provide a benchmark on this database. More techniques can be evaluated on this challenging database in the future.
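As a rough illustration of how the two annotation levels map onto the tasks listed above, the following Python sketch derives one-vs-rest labels (for example text/non-text) from semantic-level labels and groups strokes by instance-level labels. The `Stroke` class, its field names, and the integer instance ids are assumptions made for this example only; they do not describe the database's actual file format.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass

# The six content types named in the abstract (label spellings are assumed).
CONTENT_TYPES = ("text", "formula", "diagram", "table", "figure", "list")


@dataclass
class Stroke:
    """Hypothetical in-memory stroke record, not the database's real schema."""
    points: list    # (x, y) pen coordinates in writing order
    semantic: str   # semantic-level label, one of CONTENT_TYPES
    instance: int   # instance-level id, e.g. one text line or one formula


def to_binary_labels(strokes, positive="text"):
    """Collapse six-class labels into a one-vs-rest task such as text/non-text."""
    return [1 if s.semantic == positive else 0 for s in strokes]


def group_by_instance(strokes):
    """Map (semantic label, instance id) to stroke indices, for segmentation tasks."""
    groups = defaultdict(list)
    for index, stroke in enumerate(strokes):
        groups[(stroke.semantic, stroke.instance)].append(index)
    return dict(groups)


def class_distribution(strokes):
    """Return the fraction of strokes belonging to each content type."""
    counts = Counter(s.semantic for s in strokes)
    total = sum(counts.values())
    if total == 0:
        return {name: 0.0 for name in CONTENT_TYPES}
    return {name: counts[name] / total for name in CONTENT_TYPES}
```

Calling `to_binary_labels(strokes, positive="table")` gives the table/non-table variant of the same task, and `group_by_instance` yields the stroke groups that a text line or formula segmentation method would be expected to recover.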
Acknowledgement
This work has been supported by the National Key Research and Development Program under Grant No. 2020AAA0109700 and the National Natural Science Foundation of China (NSFC) under Grant No. 61773376.
Copyright information
© 2022 Springer Nature Switzerland AG
About this paper
Cite this paper
Yang, YT., Zhang, YM., Yun, XL., Yin, F., Liu, CL. (2022). CASIA-onDo: A New Database for Online Handwritten Document Analysis. In: Wallraven, C., Liu, Q., Nagahara, H. (eds) Pattern Recognition. ACPR 2021. Lecture Notes in Computer Science, vol 13189. Springer, Cham. https://doi.org/10.1007/978-3-031-02444-3_13
DOI: https://doi.org/10.1007/978-3-031-02444-3_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-02443-6
Online ISBN: 978-3-031-02444-3
eBook Packages: Computer Science, Computer Science (R0)