Chinese Historic Image Threshold Using Adaptive K-means Cluster and Bradley’s

Huang, Zhi-Kai; Ma, Yong-Li; Lu, Li; Rao, Fan-Xing; Hou, Ling-Ying

doi:10.1007/978-3-319-42297-8_17

Conference paper
First Online: 12 July 2016

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9773)

Abstract

The need for text extraction techniques for Chinese heritage documents is growing. Historic documents such as Chinese calligraphy were usually handwritten or scanned at low contrast, so automatic optical character recognition is difficult to apply to the document images. In this paper, we present a thresholding method for historic document images based on a combination of Bradley’s algorithm and K-means. Adaptive K-means clustering is used as a pre-processing step to automatically group the pixels of a document image into homogeneous regions. In Bradley’s method, each pixel is set to black if its brightness is T percent lower than the average brightness of the surrounding pixels in a window of a specified size; otherwise it is set to white. Finally, text bounding boxes are generated by concatenating neighboring word clusters using mathematical morphology. Experimental results show that the algorithm is robust, in terms of both accuracy and efficiency, on non-uniformly illuminated, low-contrast historic document images.
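
The Bradley rule described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function name `bradley_threshold`, the default window size of 15 pixels, the default T of 15 percent, and the clipping of the window at image borders are all assumptions made for the example. The K-means pre-processing and morphology steps are not shown, since the abstract does not specify them.

```python
import numpy as np


def bradley_threshold(gray, window=15, t=15.0):
    """Binarize a 2-D grayscale array with the Bradley rule.

    A pixel becomes black (0) if it is more than `t` percent darker than
    the mean of the `window` x `window` neighborhood around it; otherwise
    it becomes white (255). Parameter names and defaults are illustrative
    assumptions, not values taken from the paper.
    """
    gray = np.asarray(gray, dtype=np.float64)
    h, w = gray.shape
    half = window // 2

    # Integral image padded with a leading zero row and column, so the sum
    # over rows y0..y1-1 and columns x0..x1-1 is
    # ii[y1, x1] - ii[y0, x1] - ii[y1, x0] + ii[y0, x0].
    ii = np.zeros((h + 1, w + 1), dtype=np.float64)
    ii[1:, 1:] = gray.cumsum(axis=0).cumsum(axis=1)

    # Window bounds for every pixel, clipped at the image borders.
    ys = np.arange(h)
    xs = np.arange(w)
    y0 = np.clip(ys - half, 0, h)[:, None]
    y1 = np.clip(ys + half + 1, 0, h)[:, None]
    x0 = np.clip(xs - half, 0, w)[None, :]
    x1 = np.clip(xs + half + 1, 0, w)[None, :]

    # Local mean from four integral-image lookups per pixel.
    area = (y1 - y0) * (x1 - x0)
    window_sum = ii[y1, x1] - ii[y0, x1] - ii[y1, x0] + ii[y0, x0]
    local_mean = window_sum / area

    # Black where the pixel is t percent below the local mean.
    black = gray < local_mean * (1.0 - t / 100.0)
    return np.where(black, 0, 255).astype(np.uint8)
```

Because the threshold is relative to a local mean rather than a single global value, a dark stroke on an unevenly lit background can still fall below its neighborhood average, which is the property the abstract relies on for non-uniform illumination. The integral image keeps the cost per pixel constant regardless of the window size.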

Acknowledgements

This work was supported by the National Natural Science Foundation of China (Grant No. 61472173) and by a grant from the Educational Commission of Jiangxi Province of China (No. GJJ151134).

Author information

Authors and Affiliations

College of Mechanical and Electrical Engineering, Nanchang Institute of Technology, Nanchang, 330099, Jiangxi, China
Zhi-Kai Huang, Yong-Li Ma, Li Lu, Fan-Xing Rao & Ling-Ying Hou

Corresponding author

Correspondence to Zhi-Kai Huang.

Editor information

Editors and Affiliations

Tongji University, Shanghai, China
De-Shuang Huang
Inha University, Incheon, Korea (Republic of)
Kyungsook Han
Liverpool John Moores University, Liverpool, United Kingdom
Abir Hussain

About this paper

Cite this paper

Huang, ZK., Ma, YL., Lu, L., Rao, FX., Hou, LY. (2016). Chinese Historic Image Threshold Using Adaptive K-means Cluster and Bradley’s. In: Huang, DS., Han, K., Hussain, A. (eds) Intelligent Computing Methodologies. ICIC 2016. Lecture Notes in Computer Science, vol 9773. Springer, Cham. https://doi.org/10.1007/978-3-319-42297-8_17

DOI: https://doi.org/10.1007/978-3-319-42297-8_17
Published: 12 July 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-42296-1
Online ISBN: 978-3-319-42297-8
eBook Packages: Computer Science, Computer Science (R0)
