Abstract
Conventional object extraction methods are inefficient for color document images that contain large graphics. For example, projection-profile and connected-component-based methods scan the pixels of a large graphic many times. To display an extracted large graphic, conventional methods simply represent it by a rectangle, so scanning the interior of the graphic is needlessly time-consuming. This paper proposes a novel system for efficiently analyzing color documents that solves this problem. The system comprises color transformation, background color determination, top-down object extraction, and parameter-free object classification. It is efficient because it scans only background pixels, so its time complexity is O(N_B), where N_B is the total number of background-color pixels. Experimental results demonstrate that the system is more effective and efficient than other methods. Moreover, owing to its simplicity and efficiency, the proposed algorithm can run in an embedded environment (such as a mobile device) and in real-time systems.
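The complexity claim can be illustrated with a small sketch. The Python below is not the authors' algorithm. It is a hypothetical border-seeded flood fill (the function name `trace_background` and the boolean-mask input are assumptions made for illustration) showing how a traversal confined to background pixels can find an object's outline without ever visiting the object's interior, so that the work grows with the number of background pixels rather than with the size of the graphic.

```python
from collections import deque


def trace_background(is_bg):
    """Flood-fill the background, starting from the image border.

    is_bg: list of rows; True where a pixel has the background color.
    Returns (visited, boundary): the background pixels reached, and the
    foreground pixels that touch them. Foreground pixels are recorded
    but never expanded, so object interiors are not scanned.
    """
    h, w = len(is_bg), len(is_bg[0])
    visited, boundary = set(), set()
    queue = deque()

    # Seed the fill with the background pixels that lie on the border.
    border = [(r, c) for r in range(h) for c in (0, w - 1)]
    border += [(r, c) for r in (0, h - 1) for c in range(w)]
    for r, c in border:
        if is_bg[r][c] and (r, c) not in visited:
            visited.add((r, c))
            queue.append((r, c))

    # Breadth-first traversal over 4-connected background pixels only.
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if not (0 <= nr < h and 0 <= nc < w):
                continue
            if is_bg[nr][nc]:
                if (nr, nc) not in visited:
                    visited.add((nr, nc))
                    queue.append((nr, nc))
            else:
                # Outline pixel of an object: note it, do not step inside.
                boundary.add((nr, nc))
    return visited, boundary


def bounding_box(pixels):
    """Return (top, left, bottom, right) of a non-empty set of (row, col)."""
    rows = [r for r, _ in pixels]
    cols = [c for _, c in pixels]
    return min(rows), min(cols), max(rows), max(cols)
```

In this sketch each background pixel is dequeued once and inspects four neighbours, so the loop performs a constant amount of work per background pixel. A rectangle for the object can then be taken from the recorded outline with `bounding_box`, without reading any pixel inside it. Background regions fully enclosed by an object are not reached from the border and would need separate handling.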
Cite this article
Tsai, CM., Lee, HJ. Efficiently extracting and classifying objects for analyzing color documents. Machine Vision and Applications 22, 1–19 (2011). https://doi.org/10.1007/s00138-009-0215-x