Abstract
There is a growing need for text extraction techniques for Chinese heritage documents. Historic documents such as Chinese calligraphy were usually handwritten or scanned at low contrast, which makes automatic optical character recognition for document image analysis difficult to apply. In this paper, we present a thresholding method for historic document images based on a combination of Bradley’s algorithm and K-means. Adaptive K-means clustering is used as a pre-processing step to automatically group the pixels of a document image into homogeneous regions. In Bradley’s method, each pixel is set to black if its brightness is T percent lower than the average brightness of the surrounding pixels in a window of a specified size; otherwise it is set to white. Finally, text bounding boxes are generated by concatenating neighboring word clusters using mathematical morphology. Experimental results show that the algorithm is robust on non-uniformly illuminated, low-contrast historic document images in terms of both accuracy and efficiency.
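The Bradley rule described in the abstract can be illustrated with a minimal sketch. It covers only the local thresholding step, not the K-means pre-processing or the morphological grouping, and it is not the authors' implementation. The function name `bradley_threshold`, the list-of-rows image representation, and the default values of `window` and `t` are assumptions chosen for illustration. An integral image is used so that each window mean costs a constant number of lookups.

```python
def bradley_threshold(image, window=15, t=15):
    """Binarize a grayscale image given as a list of rows of 0-255 ints.

    A pixel becomes 0 (black) if it is more than t percent darker than
    the mean of the window centred on it; otherwise it becomes 255 (white).
    The defaults for window and t are illustrative assumptions.
    """
    height = len(image)
    width = len(image[0]) if height else 0

    # Integral image with a zero border:
    # integral[y][x] holds the sum of image[0:y][0:x].
    integral = [[0] * (width + 1) for _ in range(height + 1)]
    for y in range(height):
        row_sum = 0
        for x in range(width):
            row_sum += image[y][x]
            integral[y + 1][x + 1] = integral[y][x + 1] + row_sum

    half = window // 2
    output = [[255] * width for _ in range(height)]
    for y in range(height):
        # Clip the window to the image borders.
        y0, y1 = max(0, y - half), min(height, y + half + 1)
        for x in range(width):
            x0, x1 = max(0, x - half), min(width, x + half + 1)
            count = (y1 - y0) * (x1 - x0)
            window_sum = (integral[y1][x1] - integral[y0][x1]
                          - integral[y1][x0] + integral[y0][x0])
            # Same test as pixel < mean * (1 - t / 100), kept in integers.
            if image[y][x] * count * 100 < window_sum * (100 - t):
                output[y][x] = 0
    return output
```

Because each pixel is compared with its local mean and not with one global threshold, a slow change in illumination across the page shifts the pixel and its neighbourhood together, which is why this rule suits unevenly lit scans.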
Acknowledgements
This work was supported by the National Natural Science Foundation of China (Grant No. 61472173) and by a grant from the Educational Commission of Jiangxi Province of China (No. GJJ151134).
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Huang, ZK., Ma, YL., Lu, L., Rao, FX., Hou, LY. (2016). Chinese Historic Image Threshold Using Adaptive K-means Cluster and Bradley’s. In: Huang, DS., Han, K., Hussain, A. (eds) Intelligent Computing Methodologies. ICIC 2016. Lecture Notes in Computer Science, vol 9773. Springer, Cham. https://doi.org/10.1007/978-3-319-42297-8_17
DOI: https://doi.org/10.1007/978-3-319-42297-8_17
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-42296-1
Online ISBN: 978-3-319-42297-8
eBook Packages: Computer Science, Computer Science (R0)