research-article

TAMIZHİ: Historical Tamil-Brahmi Script Recognition Using CNN and MobileNet

Published: 14 July 2021

Abstract

Computational epigraphy is the study of ancient scripts using computer science and mathematical models built for epigraphy. The Tamil-Brahmi inscriptions are the most ancient extant written records of Tamil. The inscriptions furnish valuable information on many aspects of life in the ancient Tamil country from a period earlier than the literary age of Sangam. Recognition and systematic analysis of the script are therefore required. Recognizing this script is difficult: a single character can contain several curves, and in the writing style curves and lines overlap. Generating a corpus of the script is necessary, since it is the first step in computational epigraphy. The archaeological department supplied the raw data that helped to develop a corpus of Tamizhi. In this article, we implement a convolutional neural network (CNN) in several ways to find the most accurate approach for recognizing handwritten Brahmi characters: (i) training a CNN from scratch with a Softmax classifier in a sequential model; (ii) using MobileNet, a transfer-learning paradigm that adapts a pre-trained model to the Tamizhi dataset; (iii) building a model with a CNN and a support vector machine (SVM); and (iv) using an SVM. To train the CNN model, an extensive TAMIZHİ handwritten Brahmi dataset of 1 lakh 90,000 (190,000) isolated character samples was created and deployed. The dataset consists of 9 vowels, 18 consonants, and 209 classes, so that researchers can apply machine learning to it. On the Tamizhi dataset, MobileNet outperformed all other implemented models with an accuracy of 68.3%, whereas the other algorithms ranged from 58% to 67%. The MobileNet model was also trained and tested on vowels (8 classes), consonants (18 classes), and consonants and vowels (26 classes), with accuracies of 98.1%, 97.7%, and 97.5%, respectively.
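
The abstract compares models by classification accuracy and mentions a Softmax classifier as the final stage of the CNN. As a rough illustration of what those two terms mean, and not the authors' implementation, the sketch below turns raw class scores into probabilities with softmax, picks the most probable class, and reports accuracy as the fraction of correct predictions. All function names, scores, and labels here are hypothetical and chosen only for illustration.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(logits):
    """Return the index of the most probable class."""
    probs = softmax(logits)
    return max(range(len(probs)), key=probs.__getitem__)


def accuracy(batch_logits, labels):
    """Fraction of samples whose predicted class matches the true label."""
    correct = sum(predict(l) == y for l, y in zip(batch_logits, labels))
    return correct / len(labels)


# Hypothetical scores for three samples over four classes (illustrative only;
# the paper's tasks use 8, 18, 26, or 209 classes).
batch_logits = [
    [2.0, 0.5, 0.1, -1.0],
    [0.2, 0.1, 3.0, 0.0],
    [1.0, 1.5, 0.3, 0.2],
]
labels = [0, 2, 0]

print(f"accuracy = {accuracy(batch_logits, labels):.3f}")
```

In a real pipeline the scores would come from the last layer of the trained network rather than being typed in by hand; the accuracy calculation itself would be the same.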


    • Published in

      ACM Transactions on Asian and Low-Resource Language Information Processing  Volume 20, Issue 3
      May 2021
      240 pages
      ISSN:2375-4699
      EISSN:2375-4702
      DOI:10.1145/3457152

      Copyright © 2021 Association for Computing Machinery.

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 14 July 2021
      • Accepted: 1 May 2020
      • Revised: 1 April 2020
      • Received: 1 February 2020
      Published in tallip Volume 20, Issue 3

      Qualifiers

      • research-article
      • Refereed