Efficiently extracting and classifying objects for analyzing color documents

Tsai, Chun-Ming; Lee, Hsi-Jian

doi:10.1007/s00138-009-0215-x

Original Paper
Published: 01 September 2009

Volume 22, pages 1–19, (2011)
Machine Vision and Applications

Abstract

Conventional object extraction methods are inefficient for color document images that contain large graphics. For example, projection-profile and connected-component-based methods scan a large graphic many times. Because conventional methods represent an extracted large graphic by a rectangle anyway, scanning into its interior is time-consuming and unnecessary. In this paper, a novel system for efficiently analyzing color documents is proposed to solve this problem. The proposed system consists of color transformation, background color determination, top-down object extraction, and parameter-free object classification. The system is efficient because it scans only background pixels, so its time complexity is O(N_B), where N_B is the total number of background-color pixels. The results of this study show that the system is more effective and efficient than other methods. Moreover, owing to its simplicity and efficiency, the proposed algorithm can run in an embedded environment (such as a mobile device) and in real time.
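
The efficiency claim above (only background pixels are scanned, so the cost is O(N_B)) can be illustrated with a minimal Python sketch. It is not the authors' algorithm. It assumes the color transformation has already reduced each pixel to a discrete color label, and it uses a hypothetical background rule: the most frequent label on the image border. A breadth-first fill seeded from the border visits each connected background pixel once, records the foreground pixels it runs into as object boundaries, and groups those boundary pixels into objects with 8-connectivity. Object interiors are never entered. The helper names `border_pixels` and `extract_objects` are illustrative only.

```python
from collections import Counter, deque


def border_pixels(h, w):
    """Coordinates of every pixel on the image border, built in O(h + w)."""
    cells = {(y, x) for y in range(h) for x in (0, w - 1)}
    cells |= {(y, x) for y in (0, h - 1) for x in range(w)}
    return cells


def extract_objects(img):
    """Hypothetical background-only extraction sketch (not the paper's code).

    img is a 2-D grid of color labels (e.g. a list of equal-length strings).
    Returns (sorted bounding boxes as (top, left, bottom, right), number of
    background pixels visited by the fill).
    """
    h, w = len(img), len(img[0])
    border = border_pixels(h, w)
    # Assumed background rule: the most frequent color label on the border.
    bg = Counter(img[y][x] for y, x in border).most_common(1)[0][0]

    visited = set()   # background pixels reached by the fill
    boundary = set()  # foreground pixels adjacent to reached background
    queue = deque(p for p in border if img[p[0]][p[1]] == bg)
    visited.update(queue)
    # Foreground pixels lying on the image border also bound their objects.
    boundary.update(p for p in border if img[p[0]][p[1]] != bg)

    # 4-connected breadth-first fill over background pixels only.
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if not (0 <= ny < h and 0 <= nx < w) or (ny, nx) in visited:
                continue
            if img[ny][nx] == bg:
                visited.add((ny, nx))
                queue.append((ny, nx))
            else:
                boundary.add((ny, nx))  # record the foreground pixel, never enter it

    # Group boundary pixels into objects with 8-connectivity; interiors stay untouched.
    boxes = []
    unassigned = set(boundary)
    while unassigned:
        seed = unassigned.pop()
        stack = [seed]
        top, left, bottom, right = seed[0], seed[1], seed[0], seed[1]
        while stack:
            y, x = stack.pop()
            top, bottom = min(top, y), max(bottom, y)
            left, right = min(left, x), max(right, x)
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    q = (y + dy, x + dx)
                    if q in unassigned:
                        unassigned.remove(q)
                        stack.append(q)
        boxes.append((top, left, bottom, right))
    return sorted(boxes), len(visited)


page = [
    "..........",
    ".TT.......",
    ".TT..GGGG.",
    ".....GGGG.",
    ".....GGGG.",
    "..........",
]
boxes, n_bg = extract_objects(page)
print(boxes, n_bg)  # prints [(1, 1, 2, 2), (2, 5, 4, 8)] 44
```

The fill touches each of the 44 background pixels once, while the two interior pixels of the `G` block are never visited. In this sketch, background enclosed by an object (for example, the inside of a frame) is not reachable from the border and is therefore treated as part of the enclosing object.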

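The abstract states only that objects are classified without parameters. One well-known parameter-free tool is Otsu's histogram-thresholding method (1979), which chooses the threshold that maximizes the between-class variance. The sketch below applies it to object heights to split bounding boxes into two hypothetical classes, "text" and "graphic". Both the height feature and the class labels are assumptions for illustration, not the paper's classifier, and `otsu_threshold` and `classify_boxes` are illustrative names.

```python
def otsu_threshold(values):
    """Otsu's method on a 1-D list of numbers.

    Returns the threshold t that maximizes the between-class variance
    w0 * w1 * (m0 - m1) ** 2 for the split {v <= t} versus {v > t}.
    """
    levels = sorted(set(values))
    n = len(values)
    best_t, best_var = levels[0], -1.0
    for t in levels[:-1]:  # the largest level would leave the upper class empty
        low = [v for v in values if v <= t]
        high = [v for v in values if v > t]
        w0, w1 = len(low) / n, len(high) / n
        m0, m1 = sum(low) / len(low), sum(high) / len(high)
        between = w0 * w1 * (m0 - m1) ** 2
        if between > best_var:
            best_t, best_var = t, between
    return best_t


def classify_boxes(boxes):
    """Hypothetical two-class labelling of (top, left, bottom, right) boxes by height."""
    heights = [bottom - top + 1 for top, _, bottom, _ in boxes]
    t = otsu_threshold(heights)
    return ["text" if h <= t else "graphic" for h in heights]


boxes = [(0, 0, 9, 40), (20, 0, 30, 40), (40, 0, 51, 40),
         (60, 0, 69, 40), (100, 0, 179, 300), (200, 0, 294, 300)]
print(classify_boxes(boxes))  # prints ['text', 'text', 'text', 'text', 'graphic', 'graphic']
```

Because the threshold (here 12, between heights 10 to 12 and 80 to 95) is derived from the data, no height cutoff has to be tuned by hand, which is one way a classification step can be parameter-free.
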
Author information

Authors and Affiliations

Department of Computer Science, Taipei Municipal University of Education, No. 1, Ai-Kuo W. Road, Taipei, 100, Taiwan
Chun-Ming Tsai
Department of Medical Informatics, Tzu Chi University, Hualien, 970, Taiwan
Hsi-Jian Lee

Corresponding author

Correspondence to Chun-Ming Tsai.

About this article

Cite this article

Tsai, CM., Lee, HJ. Efficiently extracting and classifying objects for analyzing color documents. Machine Vision and Applications 22, 1–19 (2011). https://doi.org/10.1007/s00138-009-0215-x

Received: 07 March 2008
Revised: 12 March 2009
Accepted: 20 July 2009
Published: 01 September 2009
Issue Date: January 2011
DOI: https://doi.org/10.1007/s00138-009-0215-x