Abstract
This paper proposes a document layout analysis (DLA) scheme based on visual similarity. By using a clustering strategy, the scheme adapts to documents in different languages and with different layout structures and skew angles. To obtain a robust and adaptive DLA approach, the authors first use a visual similarity testing process to find a set of representative filters and statistics that characterize the typical texture patterns in document images. Texture features are then extracted with these filters and passed to a dynamic clustering procedure, called visual similarity clustering. Finally, text content is located from the clustering results. As a result, the algorithm demonstrates robustness and adaptability across a wide variety of documents, which traditional DLA approaches do not possess.
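The pipeline described in the abstract (filter responses, texture statistics, clustering, text location) can be illustrated with a minimal sketch. This is not the authors' method: the abstract does not specify the filters, the statistics, or the dynamic clustering procedure. The sketch assumes a small bank of Gabor filters, the mean absolute response per block as the texture statistic, and a plain 2-means clustering with a fixed number of clusters. All function names and parameter values (`gabor_kernel`, `block_features`, `two_means`, `locate_text`, block size, wavelength) are hypothetical.

```python
import math

def gabor_kernel(size, theta, wavelength=4.0, sigma=2.0):
    """Zero-mean real Gabor kernel; all parameter values are illustrative."""
    half = size // 2
    k = [[math.exp(-(x * x + y * y) / (2 * sigma ** 2))
          * math.cos(2 * math.pi * (x * math.cos(theta) + y * math.sin(theta)) / wavelength)
          for x in range(-half, half + 1)]
         for y in range(-half, half + 1)]
    mean = sum(map(sum, k)) / (size * size)
    return [[v - mean for v in row] for row in k]

def filter_energy(img, kernel, y0, x0, block):
    """Mean absolute filter response over one block (assumed texture statistic)."""
    h, w, half = len(img), len(img[0]), len(kernel) // 2
    total = 0.0
    for y in range(y0, y0 + block):
        for x in range(x0, x0 + block):
            acc = 0.0
            for dy in range(-half, half + 1):
                for dx in range(-half, half + 1):
                    yy = min(max(y + dy, 0), h - 1)  # clamp at image borders
                    xx = min(max(x + dx, 0), w - 1)
                    acc += img[yy][xx] * kernel[dy + half][dx + half]
            total += abs(acc)
    return total / (block * block)

def block_features(img, block=8, n_orient=4, ksize=7):
    """One feature vector per block: the energy of each oriented filter."""
    kernels = [gabor_kernel(ksize, i * math.pi / n_orient) for i in range(n_orient)]
    feats = {}
    for y0 in range(0, len(img) - block + 1, block):
        for x0 in range(0, len(img[0]) - block + 1, block):
            feats[(y0 // block, x0 // block)] = [
                filter_energy(img, k, y0, x0, block) for k in kernels]
    return feats

def two_means(feats, iters=20):
    """Plain 2-means seeded with the weakest and strongest blocks.
    Assumption: a fixed k=2 stands in for the paper's dynamic clustering."""
    blocks = sorted(feats)
    centers = [min(feats.values(), key=sum), max(feats.values(), key=sum)]
    labels = {}
    for _ in range(iters):
        labels = {b: min((0, 1), key=lambda c: math.dist(feats[b], centers[c]))
                  for b in blocks}
        for c in (0, 1):
            members = [feats[b] for b in blocks if labels[b] == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers

def locate_text(img):
    """Return the (row, col) blocks in the cluster with the higher filter energy."""
    labels, centers = two_means(block_features(img))
    text_label = max((0, 1), key=lambda c: sum(centers[c]))
    return {b for b, lab in labels.items() if lab == text_label}

# Synthetic 32x32 page: blank left half, stripe pattern imitating text lines on the right.
page = [[1.0 if x >= 16 and y % 4 < 2 else 0.0 for x in range(32)] for y in range(32)]
text_blocks = locate_text(page)
```

On this synthetic page the blocks covering the striped right half should end up in the high-energy cluster. A real document would need a richer filter bank and a clustering step that chooses the number of clusters adaptively, as the abstract indicates.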
Additional information
This work is supported by the National Natural Science Foundation of China under Grant No. 60472002.
About this article
Cite this article
Wen, D., Ding, XQ. Visual Similarity Based Document Layout Analysis. J Comput Sci Technol 21, 459–465 (2006). https://doi.org/10.1007/s11390-006-0459-0
DOI: https://doi.org/10.1007/s11390-006-0459-0