DOI: 10.1145/1646396.1646426
poster

Picture extraction from digitized historical manuscripts

Published: 08 July 2009

ABSTRACT

We propose a system for automatic document segmentation that extracts graphical elements from historical manuscripts and then identifies the significant pictures among them, discarding floral and abstract decorations. The system performs a block-based analysis using color and texture features. We introduce the Gradient Spatial Dependency Matrix, a new texture operator that is particularly effective for this task. The feature vectors are processed by an embedding procedure that improves the performance of the subsequent SVM classification. Results are reported for both feature extraction and embedding-based classification, and they support the effectiveness of the proposal.
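
The abstract gives no implementation details, so the following is only a minimal sketch of what a block-based feature pipeline followed by an embedding step might look like. Everything in it is an assumption made for illustration: the function names are hypothetical, the mean gradient magnitude is a simple stand-in texture measure and not the Gradient Spatial Dependency Matrix, and the Lipschitz-style embedding (each coordinate is the minimum distance to a reference set) is one common choice, not necessarily the procedure the authors used.

```python
import math


def split_into_blocks(image, block_size):
    """Yield (top, left, block) for each full, non-overlapping square block
    of a 2D grayscale image given as a list of equal-length rows."""
    height, width = len(image), len(image[0])
    for top in range(0, height - block_size + 1, block_size):
        for left in range(0, width - block_size + 1, block_size):
            block = [row[left:left + block_size]
                     for row in image[top:top + block_size]]
            yield top, left, block


def block_features(block):
    """Return (mean intensity, mean gradient magnitude) for one block.
    Stand-in features only; NOT the paper's texture operator."""
    n = len(block)
    mean_intensity = sum(sum(row) for row in block) / (n * n)
    total, count = 0.0, 0
    for y in range(n - 1):
        for x in range(n - 1):
            gx = block[y][x + 1] - block[y][x]  # forward difference in x
            gy = block[y + 1][x] - block[y][x]  # forward difference in y
            total += math.hypot(gx, gy)
            count += 1
    mean_gradient = total / count if count else 0.0
    return (mean_intensity, mean_gradient)


def lipschitz_embed(vector, reference_sets):
    """Map a feature vector to its minimum Euclidean distance to each
    reference set (assumed embedding, chosen for illustration)."""
    return tuple(min(math.dist(vector, ref) for ref in refs)
                 for refs in reference_sets)


# Toy 4x4 image: flat on the left, checkered on the right.
image = [
    [0, 0, 0, 10],
    [0, 0, 10, 0],
    [0, 0, 0, 10],
    [0, 0, 10, 0],
]
features = [block_features(b) for _, _, b in split_into_blocks(image, 2)]
reference_sets = [[features[0]], [features[1]]]  # hypothetical reference sets
embedded = [lipschitz_embed(f, reference_sets) for f in features]
```

In a full system the embedded vectors would then be passed to a classifier such as an SVM; that step is omitted here because the abstract specifies neither the kernel nor the training setup.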


Published in

CIVR '09: Proceedings of the ACM International Conference on Image and Video Retrieval
July 2009
383 pages
ISBN: 9781605584805
DOI: 10.1145/1646396

Copyright © 2009 ACM


Publisher

Association for Computing Machinery, New York, NY, United States
