ABSTRACT
Although Optical Character Recognition (OCR) has been studied for several decades, it continues to attract considerable attention from researchers. Previous studies tend to focus on visual features of optical text, such as texture, shape, and color, to build OCR models. However, linguistic features, an important factor for OCR, have not been extensively investigated, especially for OCR of scanned Vietnamese documents. Therefore, we introduce a method to improve the performance of Vietnamese OCR by combining both visual and linguistic features of the optical text. The proposed method consists of (i) a domain-specific dictionary and (ii) a modified natural language processing model termed ABCNet, employed at the training and inference steps, to determine the best candidate for the visual appearance of the text. Moreover, our method can easily be integrated with existing OCR methods to further increase their performance. Experimental results on a newly collected dataset show that the proposed method achieves an accuracy of 83.61% and an F1 score of 84.1%.
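The idea of choosing the best candidate by combining visual and linguistic evidence can be illustrated with a minimal sketch. This is not the authors' ABCNet-based method: the function names, the weighting parameter `alpha`, the toy dictionary, and the confidence values are all hypothetical, and the standard-library `difflib.SequenceMatcher` stands in for a real linguistic model. Each OCR candidate carries a visual confidence, and a dictionary similarity score is blended with it to rank the candidates.

```python
import unicodedata
from difflib import SequenceMatcher


def normalize(text):
    # Vietnamese diacritics can be encoded in composed or decomposed form;
    # NFC normalization makes string comparison consistent.
    return unicodedata.normalize("NFC", text)


def lexical_score(candidate, dictionary):
    """Highest similarity ratio between the candidate and any dictionary entry."""
    cand = normalize(candidate)
    return max(
        (SequenceMatcher(None, cand, normalize(entry)).ratio() for entry in dictionary),
        default=0.0,  # an empty dictionary contributes no linguistic evidence
    )


def pick_best(candidates, dictionary, alpha=0.5):
    """Return the candidate text with the highest blended score.

    candidates: list of (text, visual_confidence), confidence in [0, 1].
    alpha: weight of the visual score; (1 - alpha) weights the dictionary score.
    """
    def combined(item):
        text, visual = item
        return alpha * visual + (1 - alpha) * lexical_score(text, dictionary)

    return max(candidates, key=combined)[0]


# Hypothetical domain dictionary (receipt terms) and OCR candidates.
dictionary = {"hóa đơn", "tổng cộng", "tiền mặt"}
candidates = [("tông cộng", 0.60), ("tổng cộng", 0.58), ("tong cong", 0.40)]
print(pick_best(candidates, dictionary))
```

In this toy case the second candidate has a slightly lower visual confidence than the first, but it matches a dictionary entry exactly, so the blended score favors it. Setting `alpha` closer to 1 makes the selection rely more on the visual model alone.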
Index Terms
- Improving text recognition by combining visual and linguistic features of text