Abstract
The paper presents a method for classifying the scripts of medieval documents originating from the Balkan region. It is a multi-step procedure that comprises mapping the text according to typographical features, creating equivalent image patterns, analyzing run-length patterns to establish a feature vector, and applying a state-of-the-art classification method, Genetic Algorithms Image Clustering for Document Analysis (GA-ICDA), which successfully discriminates between documents written in different scripts. The proposed method is evaluated on custom-oriented document databases that include handprinted or printed documents written in old Cyrillic, angular and round Glagolitic, ancient Latin and Greek scripts. The experiment demonstrates very good results.
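The run-length step named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the text has already been mapped to a small image whose pixel values are integer codes (four hypothetical levels here), counts horizontal runs only, and computes two classic run-length texture statistics, short-run emphasis and long-run emphasis. The function names and the toy image are invented for illustration.

```python
from itertools import groupby


def run_length_matrix(image, levels):
    """Count horizontal runs: rlm[g][r - 1] is the number of runs
    of gray level g that have length r."""
    width = max(len(row) for row in image)
    rlm = [[0] * width for _ in range(levels)]
    for row in image:
        for level, run in groupby(row):
            rlm[level][len(list(run)) - 1] += 1
    return rlm


def run_length_features(rlm):
    """Return (short-run emphasis, long-run emphasis) of a run-length matrix."""
    total_runs = sum(sum(row) for row in rlm)
    sre = sum(c / (j + 1) ** 2 for row in rlm for j, c in enumerate(row)) / total_runs
    lre = sum(c * (j + 1) ** 2 for row in rlm for j, c in enumerate(row)) / total_runs
    return sre, lre


# Toy image: each value is a hypothetical code assigned to a mapped character.
toy_image = [
    [0, 0, 1, 1],
    [2, 2, 2, 2],
    [3, 0, 0, 0],
]
toy_rlm = run_length_matrix(toy_image, levels=4)
toy_sre, toy_lre = run_length_features(toy_rlm)
```

In a full pipeline, statistics of this kind, computed per document, would form the feature vector passed to the clustering stage; which statistics the paper actually uses is not stated in the abstract.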
Acknowledgments
This work was partially supported by a grant of the Ministry of Science of the Republic of Serbia within the project TR33037.
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Brodić, D., Amelio, A., Milivojević, Z.N. (2015). Classification of the Scripts in Medieval Documents from Balkan Region by Run-Length Texture Analysis. In: Arik, S., Huang, T., Lai, W., Liu, Q. (eds) Neural Information Processing. ICONIP 2015. Lecture Notes in Computer Science, vol 9489. Springer, Cham. https://doi.org/10.1007/978-3-319-26532-2_48
DOI: https://doi.org/10.1007/978-3-319-26532-2_48
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-26531-5
Online ISBN: 978-3-319-26532-2
eBook Packages: Computer Science, Computer Science (R0)