DOI: 10.1145/1364742.1364758

Document image analysis for digital libraries

Published: 12 December 2006

ABSTRACT

Digital Libraries have many forms -- institutional libraries for information dissemination, document repositories for record-keeping, and personal digital libraries for organizing personal thoughts, knowledge, and courses of action. Digital image content (scanned or otherwise) is a substantial component of all of these libraries. Processing and analyzing these images involves tasks such as document layout understanding, character recognition, functional role labeling, image enhancement, indexing, organizing, restructuring, summarizing, cross linking, redaction, privacy management, and distribution.

At the Palo Alto Research Center, we conduct research on several aspects of document analysis for Digital Libraries, ranging from raw image transformations to linguistic analysis to interactive sensemaking tools. I shall describe a few recent research activities in the realm of document image analysis and their use in digital libraries.

