DOI: 10.1145/1577802.1577822
research-article

Managing multilingual OCR project using XML

Published: 25 July 2009

ABSTRACT

This paper presents an XML-based scheme for managing a large multilingual OCR project. In particular, we describe how a new XML-based tagging scheme has been used to achieve the objectives of the project. Managing a large multilingual OCR project in which multiple research groups collaboratively develop script-specific and script-independent technologies is a challenging problem. We present some of the software and data management strategies designed for the project, which is aimed at developing OCR for 11 scripts of Indian origin for which mature OCR technology was not available.


Published in

MOCR '09: Proceedings of the International Workshop on Multilingual OCR
July 2009
139 pages
ISBN: 9781605586984
DOI: 10.1145/1577802

Copyright © 2009 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery
New York, NY, United States


Acceptance Rates

Overall Acceptance Rate: 17 of 34 submissions, 50%
