DOI: 10.1145/1815330.1815343
research-article

IAMonDo-database: an online handwritten document database with non-uniform contents

Published: 09 June 2010

Abstract

In this paper we present a new database of online handwritten documents with diverse contents such as text, drawings, diagrams, formulas, tables, lists, and markings. It is designed to serve as a standard dataset for the development, training, testing, and comparison of methods in handwritten document analysis. The database can serve as a basis for layout analysis and for various segmentation and recognition tasks that use either the online information or only the offline information. It comprises 1,000 documents produced by approximately 200 writers, with a total of 329,849 online strokes. Few constraints were imposed on the writers when creating the documents. Nonetheless, the database has a stable distribution of the different content types. A software tool was developed to allow easy access to the documents, which are stored in InkML. We also present two experiments that illustrate the challenges this database poses. Their results may serve as references for further research in this area.
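
Since the documents are stored in InkML, a reader may want a rough idea of how stroke data could be pulled out of such a file without the accompanying software tool. The following is a minimal sketch, not the authors' tool: it assumes the standard W3C InkML namespace, assumes each `<trace>` holds comma-separated points with whitespace-separated absolute channel values, and uses a hypothetical two-stroke sample (`SAMPLE`) that is not taken from the database. The function name `read_traces` is likewise illustrative.

```python
import xml.etree.ElementTree as ET

# Namespace from the W3C InkML specification (assumed to apply to the files).
INKML_NS = "{http://www.w3.org/2003/InkML}"

# Hypothetical two-stroke sample for illustration; not real database content.
SAMPLE = """<ink xmlns="http://www.w3.org/2003/InkML">
  <trace>10 20, 11 22, 13 25</trace>
  <trace>40 40, 42 41</trace>
</ink>"""


def read_traces(inkml_text):
    """Return a list of strokes, each a list of tuples of channel values."""
    root = ET.fromstring(inkml_text)
    strokes = []
    for trace in root.iter(INKML_NS + "trace"):
        points = []
        # Points are separated by commas, channel values by whitespace.
        for chunk in (trace.text or "").split(","):
            values = chunk.split()
            if values:
                points.append(tuple(float(v) for v in values))
        strokes.append(points)
    return strokes


strokes = read_traces(SAMPLE)
print(len(strokes), [len(s) for s in strokes])
```

Real files may carry additional channels (for example time or pressure), annotation elements, or difference-encoded values, all of which this sketch ignores.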



Published In

DAS '10: Proceedings of the 9th IAPR International Workshop on Document Analysis Systems
June 2010
490 pages
ISBN: 9781605587738
DOI: 10.1145/1815330
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

Publisher

Association for Computing Machinery

New York, NY, United States


Qualifiers

  • Research-article

Conference

DAS '10

Cited By

  • (2024) Inkeraction: An Interaction Modality Powered by Ink Recognition and Synthesis. Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems, pp. 1-26. DOI: 10.1145/3613904.3642498. Online publication date: 11-May-2024
  • (2024) Transformer-based stroke relation encoding for online handwriting and sketches. Pattern Recognition, 148 (110131). DOI: 10.1016/j.patcog.2023.110131. Online publication date: Apr-2024
  • (2024) DSANet: dilated spatial attention network for the detection of text, non-text and touching components in unconstrained handwritten documents. Neural Computing and Applications, 36:27, pp. 16959-16976. DOI: 10.1007/s00521-024-10013-8. Online publication date: 4-Jun-2024
  • (2023) Streaming Stroke Classification of Online Handwriting. ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1-5. DOI: 10.1109/ICASSP49357.2023.10095877. Online publication date: 4-Jun-2023
  • (2023) IAMonSense: multi-level handwriting classification using spatiotemporal information. International Journal on Document Analysis and Recognition (IJDAR), 26:3, pp. 303-319. DOI: 10.1007/s10032-023-00433-y. Online publication date: 8-Jun-2023
  • (2023) Document Region Classification. Document Layout Analysis, pp. 43-65. DOI: 10.1007/978-981-99-4277-0_4. Online publication date: 1-Aug-2023
  • (2023) DSS: Synthesizing Long Digital Ink Using Data Augmentation, Style Encoding and Split Generation. Document Analysis and Recognition - ICDAR 2023, pp. 217-235. DOI: 10.1007/978-3-031-41685-9_14. Online publication date: 19-Aug-2023
  • (2023) A Shallow Graph Neural Network with Innovative Node Updating for Online Handwritten Stroke Classification. Document Analysis and Recognition - ICDAR 2023, pp. 3-19. DOI: 10.1007/978-3-031-41685-9_1. Online publication date: 19-Aug-2023
  • (2022) Advances in online handwritten recognition in the last decades. Computer Science Review, 46:C. DOI: 10.1016/j.cosrev.2022.100515. Online publication date: 1-Nov-2022
  • (2022) CASIA-onDo: A New Database for Online Handwritten Document Analysis. Pattern Recognition, pp. 174-188. DOI: 10.1007/978-3-031-02444-3_13. Online publication date: 10-May-2022
