Abstract
In this article, we consider the problem of machine-readable zone (MRZ) detection in document images on mobile devices. MRZ recognition is actively used for fast and reliable automatic extraction of personal data from passports, IDs and visas. However, due to the low computing power and limited battery life of most mobile devices, the requirements on model complexity become significantly stricter. We present a state-of-the-art MRZ detection approach based on YOLO-MRZ, an extremely fast, compact and accurate deep learning model. We treat the MRZ as a graphical object and apply an object detection approach to find it. The proposed YOLO-MRZ is 83 times faster than Tiny YOLO v3, weighs only 1 MB and is well suited for embedded systems and mobile devices: it achieves 62 FPS on the Apple iPhone SE (2020). We address the small-scale MRZ detection problem with a two-stage approach in which the YOLO-MRZ model is run twice: if the detected MRZ bounding box is too small or does not meet geometric criteria, we construct a region-of-interest (ROI) image based on it and run the same detector on the ROI. To assess the quality, we tested the approach on four public datasets: SyntheticMRZ, MIDV-500, MIDV-2019 and MIDV-2020. Our approach outperforms all other solutions by a wide margin.
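The two-stage loop described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `detect_mrz` callback stands in for the YOLO-MRZ model, and the thresholds (`MIN_AREA_FRACTION`, `MIN_ASPECT_RATIO`) and ROI margin are assumed values chosen for demonstration only.

```python
from typing import Callable, Optional, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height) in pixels

# Assumed (illustrative) thresholds, not taken from the paper.
MIN_AREA_FRACTION = 0.02  # box considered "too small" below this frame fraction
MIN_ASPECT_RATIO = 4.0    # MRZ strips are wide, low rectangles


def box_is_plausible(box: Box, frame_w: int, frame_h: int) -> bool:
    """Geometric checks a first-stage detection must pass."""
    x, y, w, h = box
    area_ok = (w * h) / (frame_w * frame_h) >= MIN_AREA_FRACTION
    aspect_ok = w / max(h, 1) >= MIN_ASPECT_RATIO
    return area_ok and aspect_ok


def expand_to_roi(box: Box, frame_w: int, frame_h: int, margin: float = 1.5) -> Box:
    """Build an enlarged ROI around a small first-stage box, clipped to the frame."""
    x, y, w, h = box
    cx, cy = x + w / 2, y + h / 2
    rw, rh = w * (1 + margin), h * (1 + margin)
    rx = max(0, int(cx - rw / 2))
    ry = max(0, int(cy - rh / 2))
    return rx, ry, min(int(rw), frame_w - rx), min(int(rh), frame_h - ry)


def two_stage_detect(
    frame_w: int,
    frame_h: int,
    detect_mrz: Callable[[int, int, int, int], Optional[Box]],
) -> Optional[Box]:
    """Stage 1: detect on the full frame. Stage 2: if the box fails the
    geometric checks, re-run the same detector on an ROI built around it."""
    box = detect_mrz(0, 0, frame_w, frame_h)
    if box is None:
        return None
    if box_is_plausible(box, frame_w, frame_h):
        return box
    roi = expand_to_roi(box, frame_w, frame_h)
    refined = detect_mrz(*roi)
    return refined if refined is not None else box
```

The key design point the sketch captures is that no second model is needed: the same detector is simply re-applied to a cropped, effectively upscaled view, which is what makes small-scale MRZs recoverable without increasing model size.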
Data Availability
Lists of files with training/test partitions of images in datasets are available upon request.
Code Availability
Not available.
References
ICAO Doc 9303 (Eighth Edition) Part 3: Specifications Common to all MRTDs, Machine Readable Travel Documents. International Civil Aviation Organization (2021)
Kwon, Y.-B., Kim, J.-H.: Recognition Based Verification for the Machine Readable Travel Documents. International Workshop on Graphics Recognition (GREC 2007). Curitiba, Brazil (2007)
Hassan, A.B., Fadlalla, Y.A.: A survey on techniques of detecting identity documents forgery. In: Sudan Conference on Computer Science and Information Technology (SCCSIT) pp. 1–5 (2017). https://doi.org/10.1109/SCCSIT.2017.8293052
Arlazarov, V.V., Zhukovskiy, A.E., Krivtsov, V.E., Nikolaev, D.P., Polevoy, D.V.: Analiz osobennostey ispolzovaniya statsionarnykh i mobilnykh malorazmernykh tsifrovykh video kamer dlya raspoznavaniya dokumentov [Analysis of using stationary and mobile small-size digital video cameras for document recognition]. J. Inf. Technol. Comput. Syst. (ITiVS), pp. 71–81 (2014). In Russian
Arlazarov, V.V., Bulatov, K., Chernov, T., Arlazarov, V.L.: MIDV-500: a dataset for identity document analysis and recognition on mobile devices in video stream. Comput. Opt. 43, 818–824 (2019). https://doi.org/10.18287/2412-6179-2019-43-5-818-824
Bulatov, K., Matalov, D., Arlazarov, V.V.: MIDV-2019: Challenges of the Modern Mobile-Based Document OCR. ICMV 2019, pp. 114332N1–114332N6. Bellingham, Washington 98227-0010 USA (2020). https://doi.org/10.1117/12.2558438
Bulatov, K.B., Emelyanova, E.V., Tropin, D.V., Skoryukina, N.S., Chernyshova, Y.S., Sheshkus, A.V., Usilin, S.A., Ming, Z., Burie, J.-C., Luqman, M.M., Arlazarov, V.V.: MIDV-2020: a comprehensive benchmark dataset for identity document analysis. Comput. Opt. 46, 252–270 (2022). https://doi.org/10.18287/2412-6179-CO-1006
Samarin, A., Malykh, V., Kalaidin, P.: Verification method using limited image area. Proc. Inst. Syst. Anal. Rus. Acad. Sci. (2020). https://doi.org/10.14357/20790279200102
Petrova, O., Bulatov, K.: Methods of Machine-Readable Zone Recognition Results Post-Processing. ICMV 2018, pp. 110411H1–110411H7. Bellingham, Washington 98227-0010 USA (2019). https://doi.org/10.1117/12.2522792
Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) pp. 779–788 (2016). https://doi.org/10.1109/CVPR.2016.91
Redmon, J., Farhadi, A.: YOLO9000: Better, Faster, Stronger. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) pp. 6517–6525 (2017). https://doi.org/10.1109/CVPR.2017.690
Redmon, J., Farhadi, A.: YOLOv3: An Incremental Improvement. arXiv preprint arXiv:1804.02767 (2018)
Zhou, X., Yao, C., Wen, H., Wang, Y., Zhou, S., He, W., Liang, J.: EAST: an efficient and accurate scene text detector. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) pp. 2642–2651 (2017). https://doi.org/10.1109/cvpr.2017.283
Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) pp. 9357–9366 (2019). https://doi.org/10.1109/CVPR.2019.00959
Ye, J., Chen, Z., Liu, J., Du, B.: TextFuseNet: scene text detection with richer fused features. In: Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, IJCAI-20 pp. 516–522 (2020). https://doi.org/10.24963/ijcai.2020/72
Qiao, Z., Zhou, Y., Yang, D., Zhou, Y., Wang, W.: SEED: semantics enhanced encoder-decoder framework for scene text recognition. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) pp. 13525–13534 (2020). https://doi.org/10.1109/CVPR42600.2020.01354
Zhang, X., Zhu, B., Yao, X., Sun, Q., Li, R., Yu, B.: Context-based contrastive learning for scene text recognition. In: Proceedings of the AAAI Conference on Artificial Intelligence 36, 3353–3361 (2022). https://doi.org/10.1609/aaai.v36i3.20245
Fang, S., Xie, H., Wang, Y., Mao, Z., Zhang, Y.: Read like humans: autonomous, bidirectional and iterative language modeling for scene text recognition. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2021)
He, T., Tian, Z., Huang, W., Shen, C., Qiao, Y., Sun, C.: An end-to-end textspotter with explicit alignment and attention. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition pp. 5020–5029 (2018). https://doi.org/10.1109/CVPR.2018.00527
Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: an end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Trans. Pattern Anal. Mach. Intell. 43, 532–548 (2021). https://doi.org/10.1109/TPAMI.2019.2937086
Xing, L., Tian, Z., Huang, W., Scott, M.R.: Convolutional character networks. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV) (2019). https://doi.org/10.1109/ICCV.2019.00922
Wang, W., Xie, E., Li, X., Liu, X., Liang, D., Yang, Z., Lu, T., Shen, C.: PAN++: Towards efficient and accurate end-to-end spotting of arbitrarily-shaped text. IEEE Trans. Pattern Anal. Mach. Intell. 44, 5349–5367 (2022). https://doi.org/10.1109/TPAMI.2021.3077555
Liu, Y., James, H., Gupta, O., Raviv, D.: MRZ code extraction from visa and passport documents using convolutional neural networks. Int. J. Doc. Anal. Recognit. 25, 29–39 (2022). https://doi.org/10.1007/s10032-021-00384-2
Chen, L.-C., Papandreou, G., Kokkinos, I., Murphy, K., Yuille, A.L.: DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE Trans. Pattern Anal. Mach. Intell. 40, 834–848 (2018). https://doi.org/10.1109/tpami.2017.2699184
Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., Chen, L.-C.: MobileNetV2: inverted residuals and linear bottlenecks. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (2018). https://doi.org/10.1109/CVPR.2018.00474
Hartl, A., Arth, C., Schmalstieg, D.: Real-time detection and recognition of machine-readable zones with mobile devices. In: VISAPP 2015 - 10th International Conference on Computer Vision Theory and Applications; VISIGRAPP, Proceedings 3, 79–87 (2015). https://doi.org/10.5220/0005294700790087
Kolmakov, S.I., Skoryukina, N.S., Arlazarov, V.V.: Machine-readable zones detection in images captured by mobile devices’ cameras. Pattern Recognit. Image Anal. 30, 489–495 (2020). https://doi.org/10.1134/S105466182003013X
Savelyev, B.I., Skoryukina, N.S., Arlazarov, V.V.: A method for machine-readable zones location based on a combination the Hough transform and feature points. Bull. South Ural State Univ. Ser. Math. Model. Program. Comput. Softw. 15, 100–110 (2022). https://doi.org/10.14529/mmp220208
Ilyuhin, S., Sheshkus, A., Arlazarov, V., Nikolaev, D.: Hough encoder for machine readable zone localization. Pattern Recognit. Image Anal. (2022). https://doi.org/10.1134/S1054661822040150
Andriyanov, N.A., Dementiev, V.E., Tashlinskii, A.: Detection of objects in the images: from likelihood relationships towards scalable and efficient neural networks. Comput. Opt. (2022). https://doi.org/10.18287/2412-6179-CO-922
Arlazarov, V.V., Voysyat, J.S., Matalov, D.P., Nikolaev, D.P., Usilin, S.A.: Evolution of the Viola–Jones object detection method: a survey. Bull. South Ural State Univ. Ser. Math. Model. Program. Comput. Softw. 14, 5–23 (2021). https://doi.org/10.14529/mmp210401
Lee, H., Kwak, N.: Character recognition for the machine reader zone of electronic identity cards. In: IEEE International Conference on Image Processing (ICIP) pp. 387–391 (2015). https://doi.org/10.1109/ICIP.2015.7350826
Gayer, A.V., Chernyshova, Y.S., Sheshkus, A.V.: Effective real-time augmentation of training dataset for the neural networks learning. ICMV 2018, pp. 110411I1–110411I7. Bellingham, Washington 98227-0010 USA (2019). https://doi.org/10.1117/12.2522969
Smith, R.: An overview of the Tesseract OCR engine. In: Ninth International Conference on Document Analysis and Recognition (ICDAR 2007) pp. 629–633 (2007). https://doi.org/10.1109/ICDAR.2007.4376991
Du, Y., Li, C., Guo, R., Cui, C., Liu, W., Zhou, J., Lu, B., Yang, Y., Liu, Q., Hu, X., Yu, D., Ma, Y.: PP-OCRv2: Bag of Tricks for Ultra Lightweight OCR System. arXiv preprint arXiv:2109.03144 (2021)
Tretyakov, K.: PassportEye: extraction of machine-readable zone information from passports, visas and ID cards via OCR (2016). https://github.com/konstantint/PassportEye, Accessed 20 Oct 2022
Kostro, D., Zasso, M.: MRZ-Detection (2020). https://github.com/image-js/mrz-detection, Accessed 20 Oct 2022
Doubango Telecom: UltimateMRZ (2020). https://github.com/DoubangoTelecom/ultimateMRZ-SDK, Accessed 20 Oct 2022
Acknowledgements
We would like to thank Elena Limonova for profiling the YOLO-MRZ run time on mobile devices. We also thank Natalya Skoryukina for providing the source code of the method from [28], which allowed us to measure its quality on the MIDV datasets.
Funding
No funds, grants or other supports were received.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Gayer, A., Ershova, D. & Arlazarov, V.V. An accurate approach to real-time machine-readable zone detection with mobile devices. IJDAR 26, 321–334 (2023). https://doi.org/10.1007/s10032-023-00435-w