TAMIZHİ: Historical Tamil-Brahmi Script Recognition Using CNN and MobileNet

Authors:
Dhivya S (ORCID: 0000-0002-6869-1560)
Department of Information Technology, School of Information Technology & Engineering, VIT University, Vellore

Usha Devi G
Department of Information Technology, School of Information Technology & Engineering, VIT University, Vellore

ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 20, Issue 3, Article No. 39, pp. 1–26. https://doi.org/10.1145/3402891

Published: 14 July 2021

Abstract

Computational epigraphy is the study of ancient scripts using computer science and mathematical models built for epigraphy. The Tamil-Brahmi inscriptions are the most ancient extant written records of Tamil. They furnish valuable information on many aspects of life in the ancient Tamil country from a period anterior to the literary age of Sangam. Recognizing the script and analyzing it systematically are therefore required. Recognition is complex, because a single character can contain various curves and the writing style overlaps curves and lines. Generating a corpus of the script is necessary, since it is the initial step for computational epigraphy. The archaeological department supplied the raw data that helped to develop a corpus of Tamizhi. In this article, we implement a convolutional neural network (CNN) in various ways: (i) training a CNN model from scratch with a Softmax classifier in a sequential model; (ii) using MobileNet, a transfer-learning paradigm from a pre-trained model, on the Tamizhi dataset; (iii) building a model with CNN and SVM; and (iv) using SVM, to evaluate which gives the best accuracy in recognizing handwritten Brahmi characters. To train the CNN model, an extensive TAMIZHİ handwritten Brahmi dataset of 1 lakh 90,000 (190,000) isolated character samples has been created and deployed. The designed dataset consists of 9 vowels, 18 consonants, and 209 classes, so that researchers can use it for machine learning. MobileNet outperformed all the other models implemented, with an accuracy of 68.3%, whereas the other algorithms ranged from 58% to 67% on the Tamizhi dataset. The MobileNet model was also trained and tested on the vowels (8 classes), consonants (18 classes), and consonants plus vowels (26 classes), with accuracies of 98.1%, 97.7%, and 97.5%, respectively.
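To make the Softmax classification step and the reported accuracy figures concrete, the following is a minimal sketch in plain Python. It is not the authors' implementation: the function names, the helper `one_hot_scores`, and the toy scores are hypothetical, and only the class count of 209 is taken from the abstract. In a real CNN the scores (logits) would come from the network's final layer.

```python
import math

NUM_CLASSES = 209  # class count stated in the abstract


def softmax(logits):
    """Convert raw class scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(logits):
    """Return the index of the most probable class."""
    probs = softmax(logits)
    return max(range(len(probs)), key=probs.__getitem__)


def accuracy(batch_logits, labels):
    """Fraction of samples whose predicted class matches the true label."""
    correct = sum(predict(l) == y for l, y in zip(batch_logits, labels))
    return correct / len(labels)


def one_hot_scores(winner, score=5.0):
    """Hypothetical helper: scores that favour a single class."""
    scores = [0.0] * NUM_CLASSES
    scores[winner] = score
    return scores


# Toy scores for three samples; illustrative only, not data from the paper.
toy_logits = [one_hot_scores(3), one_hot_scores(17), one_hot_scores(200)]
toy_labels = [3, 17, 42]  # the third sample is deliberately misclassified
toy_accuracy = accuracy(toy_logits, toy_labels)
```

Here two of the three toy samples are classified correctly, so `toy_accuracy` is two thirds. The accuracies in the abstract are the same ratio computed over the held-out portion of the Tamizhi dataset.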

Index Terms

1. Computing methodologies
  1. Artificial intelligence

Published in
ACM Transactions on Asian and Low-Resource Language Information Processing Volume 20, Issue 3
May 2021
240 pages
ISSN:2375-4699
EISSN:2375-4702
DOI:10.1145/3457152
Editor:
Imed Zitouni
Google, USA
Copyright © 2021 Association for Computing Machinery.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 14 July 2021
- Accepted: 1 May 2020
- Revised: 1 April 2020
- Received: 1 February 2020
Published in tallip Volume 20, Issue 3

Author Tags
TAMIZHİ
Brahmi
convolutional neural network (CNN)
MobileNet
handwritten script recognition
Qualifiers
- research-article
- Refereed