ABSTRACT
Images with fixed layouts, such as ID cards, driving licenses, and invoices, can be recognized using prior knowledge of the layout [1]-[7]. However, for images without a fixed layout, such as product labels at ports, extracting structured data from the label image is very difficult, because the formats and contents of labels vary widely across countries and products [8]. The process is complex and the error rate is high.
This paper exploits a characteristic of cross-border product labels, namely that the overall format is complex but the local structure is simple (top-to-bottom and left-to-right), and proposes a method for identifying and structuring port commodity label information. The method first establishes a template library of keywords and data units for commodity labels, organized according to the port commodity classification. It then separates keywords from data in the multi-line text, with accurate location information, recognized by the OCR engine. Finally, keywords and data are paired according to the local layout pattern between them, yielding structured cross-border product information.
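The pairing step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `TextLine` fields, the `KEYWORDS` set, the half-line-height row tolerance, and the sample coordinates are all assumptions made for illustration, and the OCR output is assumed to arrive as text lines with axis-aligned bounding boxes. Each recognized line is classified as a keyword or as data using a small template set, and each keyword is then paired with the nearest data line to its right, or failing that, the nearest one below it.

```python
from dataclasses import dataclass


@dataclass
class TextLine:
    """One OCR text line with its bounding box (hypothetical format)."""
    text: str
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height


# Hypothetical template library for a single commodity class.
KEYWORDS = {"product name", "net weight", "country of origin"}


def normalize(text):
    """Lower-case the text and drop a trailing colon."""
    return text.strip().rstrip(":").strip().lower()


def split_inline(text):
    """Return (keyword, value) for a 'keyword: value' line, else None."""
    head, sep, tail = text.partition(":")
    if sep and normalize(head) in KEYWORDS and tail.strip():
        return normalize(head), tail.strip()
    return None


def same_row(key, other):
    """True if the vertical centers differ by at most half the key height."""
    return abs((key.y + key.h / 2) - (other.y + other.h / 2)) <= key.h / 2


def overlaps_horizontally(a, b):
    """True if the two boxes share some horizontal extent."""
    return a.x < b.x + b.w and b.x < a.x + a.w


def structure(lines):
    """Pair keywords with data: left-to-right first, then top-to-bottom."""
    result, keys, data = {}, [], []
    for line in lines:
        inline = split_inline(line.text)
        if inline:
            result[inline[0]] = inline[1]
        elif normalize(line.text) in KEYWORDS:
            keys.append(line)
        else:
            data.append(line)
    for key in keys:
        right = [d for d in data if same_row(key, d) and d.x >= key.x + key.w]
        below = [d for d in data
                 if d.y >= key.y + key.h and overlaps_horizontally(key, d)]
        if right:
            value = min(right, key=lambda d: d.x)   # nearest to the right
        elif below:
            value = min(below, key=lambda d: d.y)   # nearest below
        else:
            continue                                # keyword with no data
        result[normalize(key.text)] = value.text.strip()
        data.remove(value)                          # each data line used once
    return result


# Made-up label layout covering the three local patterns.
lines = [
    TextLine("Product Name:", 10, 10, 100, 20),
    TextLine("Green Tea", 120, 12, 80, 20),           # right of its keyword
    TextLine("Net Weight", 10, 50, 100, 20),
    TextLine("500 g", 10, 75, 60, 20),                # below its keyword
    TextLine("Country of Origin: Japan", 10, 110, 200, 20),  # same line
]
record = structure(lines)
print(record)
```

Preferring the right-hand neighbor before the one below is a design choice of this sketch. A real label would also need fuzzy keyword matching to tolerate OCR errors, which is omitted here.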
- Z. Zhang, C. Zhang, W. Shen, C. Yao, W. Liu, and X. Bai, (2016) "Multi-oriented text detection with fully convolutional networks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4159-4167.
- S. Tian, Y. Pan, C. Huang, S. Lu, K. Yu, and C. Lim Tan, (2015) "Text flow: A unified text detection system in natural scene images," in Proceedings of the IEEE International Conference on Computer Vision, pp. 4651-4659.
- M. Jaderberg, K. Simonyan, A. Vedaldi, and A. Zisserman, (2016) "Reading text in the wild with convolutional neural networks," International Journal of Computer Vision, vol. 116, no. 1, pp. 1-20.
- P. He, W. Huang, Y. Qiao, C. C. Loy, and X. Tang, (2016) "Reading scene text in deep convolutional sequences," in AAAI, vol. 16, pp. 3501-3508.
- T. He, W. Huang, Y. Qiao, and J. Yao, (2016) "Accurate text localization in natural image with cascaded convolutional text network," arXiv preprint arXiv:1603.09423.
- A. Gupta, A. Vedaldi, and A. Zisserman, (2016) "Synthetic data for text localisation in natural images," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2315-2324.
- W. Huang, Y. Qiao, and X. Tang, (2014) "Robust scene text detection with convolution neural network induced MSER trees," in European Conference on Computer Vision, pp. 497-511, Springer.
- J.-Y. Ramel, M. Crucianu, N. Vincent, and C. Faure, (2006) "Detection, extraction and representation of tables," in Proceedings of the Seventh International Conference on Document Analysis and Recognition (ICDAR'03).
- Y. Shinyama and S. Sekine, (2006) "Preemptive information extraction using unrestricted relation discovery," in Proceedings of the main conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics, pp. 304-311, Association for Computational Linguistics.
- H. Dejean, (2015) "Extracting structured data from unstructured document with incomplete resources," in 2015 13th International Conference on Document Analysis and Recognition (ICDAR), pp. 271-275, IEEE.
- D. A. Ferrucci, (2012) "Introduction to 'This is Watson'," IBM Journal of Research and Development, vol. 56, no. 3.4, pp. 1-1.
- A. Arasu and H. Garcia-Molina, (2003) "Extracting structured data from web pages," in Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data, pp. 337-348, ACM.
- B. Liu, S. Zhang, Z. Hong, and X. Ye, (2018) "A horizontal tilt correction method for ship license numbers recognition," Journal of Physics: Conference Series, IOP Publishing.
- M. Busta, L. Neumann, and J. Matas, (2015) "Fastext: Efficient unconstrained scene text detector," in Proceedings of the IEEE International Conference on Computer Vision, pp. 1206-1214.
- Y. Ye, S. Zhu, J. Wang, Q. Du, Y. Yang, D. Tu, L. Wang, and J. Luo, (2018) "A unified scheme of text localization and structured data extraction for joint OCR and data mining," in 2018 IEEE International Conference on Big Data (Big Data).
Index Terms
- Tag Information Recognition Approaches and Algorithms for Cross-Border Products Checking