An accurate approach to real-time machine-readable zone detection with mobile devices

  • Special Issue Paper
International Journal on Document Analysis and Recognition (IJDAR)

Abstract

In this article we consider the problem of machine-readable zone (MRZ) detection in document images on mobile devices. MRZ recognition is widely used for fast and reliable automatic extraction of personal data from passports, IDs and visas. However, the low computing power and limited battery life of most mobile devices place strict limits on the complexity of the models that can be used. We present a state-of-the-art MRZ detection approach based on YOLO-MRZ, an extremely fast, compact and accurate deep learning model. We treat the MRZ as a graphical object and apply an object detection approach to find it. The proposed YOLO-MRZ is 83 times faster than Tiny YOLO v3, weighs only 1 MB and is well suited for embedded systems and mobile devices: it achieves 62 FPS on the Apple iPhone SE (2020). We address the problem of small-scale MRZ detection with a two-stage approach in which the YOLO-MRZ model is run twice: if the detected MRZ bounding box is too small or does not meet geometric criteria, we construct a ROI image around it and run the same detector inside the ROI. To assess quality, we tested the approach on four public datasets: SyntheticMRZ, MIDV-500, MIDV-2019 and MIDV-2020. Our approach outperforms all other solutions by a wide margin.
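The two-stage scheme from the abstract can be sketched as follows. YOLO-MRZ itself is not publicly released, so the `detect` callback, the relative-width threshold `min_rel_width` and the ROI expansion factor `margin` are hypothetical placeholders standing in for the paper's detector and geometric criteria; this is a minimal sketch of the control flow only, not the authors' implementation.

```python
def two_stage_mrz_detect(image_size, detect, min_rel_width=0.25, margin=1.0):
    """Run a detector once on the full frame; if the MRZ box looks too
    small, build a ROI around it and rerun the same detector inside it.

    `image_size` is (W, H); `detect(region)` takes a region
    (x0, y0, x1, y1) in image coordinates and returns an (x, y, w, h)
    box in that region's coordinates, or None.
    """
    W, H = image_size
    box = detect((0, 0, W, H))
    if box is None:
        return None
    x, y, w, h = box
    # Accept the first-pass box if it is large enough relative to the
    # frame (a stand-in for the paper's geometric criteria).
    if w >= min_rel_width * W:
        return box
    # Expand the small box by `margin` times its size on each side,
    # clipped to the image bounds, and rerun the detector in that ROI.
    x0 = max(0, int(x - margin * w))
    y0 = max(0, int(y - margin * h))
    x1 = min(W, int(x + w + margin * w))
    y1 = min(H, int(y + h + margin * h))
    refined = detect((x0, y0, x1, y1))
    if refined is None:
        return box  # fall back to the first-pass result
    rx, ry, rw, rh = refined
    # Map the ROI-local box back into full-image coordinates.
    return (x0 + rx, y0 + ry, rw, rh)
```

Because the second pass sees the MRZ at a larger relative scale inside the ROI, a fixed-input-size detector can localize small zones it would miss in the full frame.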



Data Availability

Lists of files with training/test partitions of images in datasets are available upon request.

Code Availability

Not available.

References

  1. ICAO Doc 9303 (Eighth Edition) Part 3: Specifications Common to all MRTDs, Machine Readable Travel Documents. International Civil Aviation Organization (2021)

  2. Kwon, Y.-B., Kim, J.-H.: Recognition Based Verification for the Machine Readable Travel Documents. International Workshop on Graphics Recognition (GREC 2007). Curitiba, Brazil (2007)

  3. Hassan, A.B., Fadlalla, Y.A.: A survey on techniques of detecting identity documents forgery. In: Sudan Conference on Computer Science and Information Technology (SCCSIT) pp. 1–5 (2017). https://doi.org/10.1109/SCCSIT.2017.8293052

  4. Arlazarov, V.V., Zhukovskiy, A.E., Krivtsov, V.E., Nikolaev, D.P., Polevoy, D.V.: Analysis of using stationary and mobile small-size digital video cameras for document recognition. J. Inf. Technol. Comput. Syst. (ITiVS) pp. 71–81 (2014) (in Russian)

  5. Arlazarov, V.V., Bulatov, K., Chernov, T., Arlazarov, V.L.: MIDV-500: a dataset for identity document analysis and recognition on mobile devices in video stream. Comput. Opt. 43, 818–824 (2019). https://doi.org/10.18287/2412-6179-2019-43-5-818-824

  6. Bulatov, K., Matalov, D., Arlazarov, V.V.: MIDV-2019: Challenges of the Modern Mobile-Based Document OCR. ICMV 2019, pp. 114332N1–114332N6. Bellingham, Washington 98227-0010 USA (2020). https://doi.org/10.1117/12.2558438

  7. Bulatov, K.B., Emelyanova, E.V., Tropin, D.V., Skoryukina, N.S., Chernyshova, Y.S., Sheshkus, A.V., Usilin, S.A., Ming, Z., Burie, J.-C., Luqman, M.M., Arlazarov, V.V.: MIDV-2020: a comprehensive benchmark dataset for identity document analysis. Comput. Opt. 46, 252–270 (2022). https://doi.org/10.18287/2412-6179-CO-1006

  8. Samarin, A., Malykh, V., Kalaidin, P.: Verification method using limited image area. Proc. Inst. Syst. Anal. Rus. Acad. Sci. (2020). https://doi.org/10.14357/20790279200102

  9. Petrova, O., Bulatov, K.: Methods of Machine-Readable Zone Recognition Results Post-Processing. ICMV 2018, pp. 110411H1–110411H7. Bellingham, Washington 98227-0010 USA (2019). https://doi.org/10.1117/12.2522792

  10. Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) pp. 779–788 (2016). https://doi.org/10.1109/CVPR.2016.91

  11. Redmon, J., Farhadi, A.: YOLO9000: Better, Faster, Stronger. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) pp. 6517–6525 (2017). https://doi.org/10.1109/CVPR.2017.690

  12. Redmon, J., Farhadi, A.: YOLOv3: An Incremental Improvement. arXiv:1804.02767 (2018)

  13. Zhou, X., Yao, C., Wen, H., Wang, Y., Zhou, S., He, W., Liang, J.: EAST: an efficient and accurate scene text detector. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) pp. 2642–2651 (2017). https://doi.org/10.1109/cvpr.2017.283

  14. Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) pp. 9357–9366 (2019). https://doi.org/10.1109/CVPR.2019.00959

  15. Ye, J., Chen, Z., Liu, J., Du, B.: TextFuseNet: scene text detection with richer fused features. In: Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, IJCAI-20 pp. 516–522 (2020). https://doi.org/10.24963/ijcai.2020/72

  16. Qiao, Z., Zhou, Y., Yang, D., Zhou, Y., Wang, W.: SEED: semantics enhanced encoder-decoder framework for scene text recognition. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) pp. 13525–13534 (2020). https://doi.org/10.1109/CVPR42600.2020.01354

  17. Zhang, X., Zhu, B., Yao, X., Sun, Q., Li, R., Yu, B.: Context-based contrastive learning for scene text recognition. In: Proceedings of the AAAI Conference on Artificial Intelligence 36, 3353–3361 (2022). https://doi.org/10.1609/aaai.v36i3.20245

  18. Fang, S., Xie, H., Wang, Y., Mao, Z., Zhang, Y.: Read like humans: autonomous, bidirectional and iterative language modeling for scene text recognition. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2021). https://doi.org/10.24963/ijcai.2020/72

  19. He, T., Tian, Z., Huang, W., Shen, C., Qiao, Y., Sun, C.: An end-to-end textspotter with explicit alignment and attention. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition pp. 5020–5029 (2018). https://doi.org/10.1109/CVPR.2018.00527

  20. Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: an end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Trans. Pattern Anal. Mach. Intell. 43, 532–548 (2021). https://doi.org/10.1109/TPAMI.2019.2937086

  21. Xing, L., Tian, Z., Huang, W., Scott, M.R.: Convolutional character networks. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV) (2019). https://doi.org/10.1109/ICCV.2019.00922

  22. Wang, W., Xie, E., Li, X., Liu, X., Liang, D., Yang, Z., Lu, T., Shen, C.: PAN++: Towards efficient and accurate end-to-end spotting of arbitrarily-shaped text. IEEE Trans. Pattern Anal. Mach. Intell. 44, 5349–5367 (2022). https://doi.org/10.1109/TPAMI.2021.3077555

  23. Liu, Y., James, H., Gupta, O., Raviv, D.: MRZ code extraction from visa and passport documents using convolutional neural networks. Int. J. Doc. Anal. Recognit. 25, 29–39 (2022). https://doi.org/10.1007/s10032-021-00384-2

  24. Chen, L.-C., Papandreou, G., Kokkinos, I., Murphy, K., Yuille, A.L.: DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE Trans. Pattern Anal. Mach. Intell. 40, 834–848 (2018). https://doi.org/10.1109/tpami.2017.2699184

  25. Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., Chen, L.-C.: MobileNetV2: inverted residuals and linear bottlenecks. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (2018). https://doi.org/10.1109/CVPR.2018.00474

  26. Hartl, A., Arth, C., Schmalstieg, D.: Real-time detection and recognition of machine-readable zones with mobile devices. In: VISAPP 2015 - 10th International Conference on Computer Vision Theory and Applications; VISIGRAPP, Proceedings 3, 79–87 (2015). https://doi.org/10.5220/0005294700790087

  27. Kolmakov, S.I., Skoryukina, N.S., Arlazarov, V.V.: Machine-readable zones detection in images captured by mobile devices’ cameras. Pattern Recognit. Image Anal. 30, 489–495 (2020). https://doi.org/10.1134/S105466182003013X

  28. Savelyev, B.I., Skoryukina, N.S., Arlazarov, V.V.: A method for machine-readable zones location based on a combination the Hough transform and feature points. Bull. South Ural State Univ. Ser. Math. Model. Program. Comput. Softw. 15, 100–110 (2022). https://doi.org/10.14529/mmp220208

  29. Ilyuhin, S., Sheshkus, A., Arlazarov, V., Nikolaev, D.: Hough encoder for machine readable zone localization. Pattern Recognit. Image Anal. (2022). https://doi.org/10.1134/S1054661822040150

  30. Andriyanov, N.A., Dementiev, V.E., Tashlinskii, A.: Detection of objects in the images: from likelihood relationships towards scalable and efficient neural networks. Comput. Opt. (2022). https://doi.org/10.18287/2412-6179-CO-922

  31. Arlazarov, V.V., Voysyat, J.S., Matalov, D.P., Nikolaev, D.P., Usilin, S.A.: Evolution of the Viola–Jones object detection method: a survey. Bull. South Ural State Univ. Ser. Math. Model. Program. Comput. Softw. 14, 5–23 (2021). https://doi.org/10.14529/mmp210401

  32. Lee, H., Kwak, N.: Character recognition for the machine reader zone of electronic identity cards. In: IEEE International Conference on Image Processing (ICIP) pp. 387–391 (2015). https://doi.org/10.1109/ICIP.2015.7350826

  33. Gayer, A.V., Chernyshova, Y.S., Sheshkus, A.V.: Effective real-time augmentation of training dataset for the neural networks learning. ICMV 2018, pp. 110411I1–110411I7. Bellingham, Washington 98227-0010 USA (2019). https://doi.org/10.1117/12.2522969

  34. Smith, R.: An overview of the tesseract OCR engine. In: Ninth International Conference on Document Analysis and Recognition (ICDAR 2007) pp. 629–633 (2007). https://doi.org/10.1109/ICDAR.2007.4376991

  35. Du, Y., Li, C., Guo, R., Cui, C., Liu, W., Zhou, J., Lu, B., Yang, Y., Liu, Q., Hu, X., Yu, D., Ma, Y.: PP-OCRv2: Bag of Tricks for Ultra Lightweight OCR System. arXiv:2109.03144 (2021)

  36. Tretyakov, K.: PassportEye: extraction of machine-readable zone information from passports, visas and id-cards via OCR (2016). https://github.com/konstantint/PassportEye, Accessed 20 Oct 2022

  37. Kostro, D., Zasso, M.: MRZ-Detection (2020). https://github.com/image-js/mrz-detection, Accessed 20 Oct 2022

  38. doubango.org: UltimateMRZ (2020). https://github.com/DoubangoTelecom/ultimateMRZ-SDK, Accessed 20 Oct 2022

Acknowledgements

We would like to thank Elena Limonova for profiling the YOLO-MRZ run time on mobile devices. We also thank Natalya Skoryukina for providing the source code of method [28], which we used to measure its quality on the MIDV datasets.

Funding

No funds, grants or other supports were received.

Author information

Corresponding author

Correspondence to Alexander Gayer.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Gayer, A., Ershova, D. & Arlazarov, V.V. An accurate approach to real-time machine-readable zone detection with mobile devices. IJDAR 26, 321–334 (2023). https://doi.org/10.1007/s10032-023-00435-w
