Efficiently extracting and classifying objects for analyzing color documents

  • Original Paper
Machine Vision and Applications

Abstract

Conventional object extraction methods are inefficient for color document images that contain large graphics. For example, projection-profile and connected-component methods scan a large graphic many times. Conventional methods represent an extracted large graphic by a rectangle, so scanning the interior of the graphic is time-consuming. This paper proposes a novel system for efficiently analyzing color documents that solves this problem. The proposed system comprises color transformation, background color determination, top-down object extraction, and parameter-free object classification. The system is efficient because it scans only background pixels, so its time complexity is O(N_B), where N_B is the total number of background-color pixels. The results of this study demonstrate that the system is more effective and efficient than other methods. Moreover, owing to its simplicity and efficiency, the proposed algorithm can run in an embedded environment (such as a mobile device) and in real-time systems.
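The abstract does not spell out the procedure, so the following Python sketch only illustrates the general idea of doing work proportional to the number of background pixels. It is not the authors' algorithm. It assumes that the image is already quantized to a grid of integer color labels, that the background is the most common label on the image border, and that 4-connectivity is used. The function names `border_background_color` and `trace_background` are hypothetical.

```python
from collections import Counter, deque


def border_coords(h, w):
    """Yield the (row, col) coordinates of the image border."""
    for x in range(w):
        yield 0, x
        yield h - 1, x
    for y in range(1, h - 1):
        yield y, 0
        yield y, w - 1


def border_background_color(img):
    """Guess the background label from border pixels only (an assumption)."""
    h, w = len(img), len(img[0])
    labels = [img[y][x] for y, x in border_coords(h, w)]
    return Counter(labels).most_common(1)[0][0]


def trace_background(img, bg):
    """Flood-fill background pixels reachable from the border.

    Only background pixels are expanded, so foreground interiors are
    never visited. Returns the set of visited background pixels and the
    set of foreground pixels that touch the visited background.
    """
    h, w = len(img), len(img[0])
    seen, boundary = set(), set()
    queue = deque()
    for y, x in border_coords(h, w):
        if img[y][x] == bg and (y, x) not in seen:
            seen.add((y, x))
            queue.append((y, x))
    while queue:
        y, x = queue.popleft()
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if not (0 <= ny < h and 0 <= nx < w):
                continue
            if img[ny][nx] == bg:
                if (ny, nx) not in seen:
                    seen.add((ny, nx))
                    queue.append((ny, nx))
            else:
                # Foreground neighbor: record it, but do not expand into it.
                boundary.add((ny, nx))
    return seen, boundary


# Toy page: label 0 is background, label 1 is one large graphic.
page = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
bg = border_background_color(page)
seen, boundary = trace_background(page, bg)
```

On this toy page the fill visits the 20 background pixels and records the 12 outline pixels of the graphic, while the four interior pixels of the graphic are never touched. Grouping the outline pixels into separate objects and classifying them is left out of the sketch.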

Author information

Corresponding author

Correspondence to Chun-Ming Tsai.

About this article

Cite this article

Tsai, CM., Lee, HJ. Efficiently extracting and classifying objects for analyzing color documents. Machine Vision and Applications 22, 1–19 (2011). https://doi.org/10.1007/s00138-009-0215-x

  • DOI: https://doi.org/10.1007/s00138-009-0215-x
