Abstract
In this article, we consider the problem of machine-readable zone (MRZ) detection in document images on mobile devices. MRZ recognition is actively used for fast and reliable automatic extraction of personal data from passports, IDs and visas. However, due to the low computing power and limited battery life of most mobile devices, the requirements on model complexity become significantly stricter. We present a state-of-the-art MRZ detection approach based on YOLO-MRZ, an extremely fast, compact and accurate deep learning model. We treat the MRZ as a graphical object and apply an object detection approach to find it. The proposed YOLO-MRZ is 83 times faster than Tiny YOLO v3, weighs only 1 MB and is well suited for embedded systems and mobile devices: it achieves 62 FPS on the Apple iPhone SE (2020). We address the small-scale MRZ detection problem with a two-stage approach in which the YOLO-MRZ model is run twice: if the detected MRZ bounding box is too small or does not meet geometric criteria, we construct a region-of-interest (ROI) image based on it and run the same detector on the ROI. To assess the quality, we tested the approach on four public datasets: SyntheticMRZ, MIDV-500, MIDV-2019 and MIDV-2020. Our approach outperforms all other solutions by a wide margin.
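The two-stage loop described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `detect_mrz` callback stands in for the YOLO-MRZ model, and the thresholds (`MIN_AREA_FRACTION`, `MIN_ASPECT_RATIO`) and ROI margin are assumed values chosen for demonstration only.

```python
from typing import Callable, Optional, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height) in pixels

# Assumed (illustrative) thresholds, not taken from the paper.
MIN_AREA_FRACTION = 0.02  # box considered "too small" below this frame fraction
MIN_ASPECT_RATIO = 4.0    # MRZ strips are wide, low rectangles


def box_is_plausible(box: Box, frame_w: int, frame_h: int) -> bool:
    """Geometric checks a first-stage detection must pass."""
    x, y, w, h = box
    area_ok = (w * h) / (frame_w * frame_h) >= MIN_AREA_FRACTION
    aspect_ok = w / max(h, 1) >= MIN_ASPECT_RATIO
    return area_ok and aspect_ok


def expand_to_roi(box: Box, frame_w: int, frame_h: int, margin: float = 1.5) -> Box:
    """Build an enlarged ROI around a small first-stage box, clipped to the frame."""
    x, y, w, h = box
    cx, cy = x + w / 2, y + h / 2
    rw, rh = w * (1 + margin), h * (1 + margin)
    rx = max(0, int(cx - rw / 2))
    ry = max(0, int(cy - rh / 2))
    return rx, ry, min(int(rw), frame_w - rx), min(int(rh), frame_h - ry)


def two_stage_detect(
    frame_w: int,
    frame_h: int,
    detect_mrz: Callable[[int, int, int, int], Optional[Box]],
) -> Optional[Box]:
    """Stage 1: detect on the full frame. Stage 2: if the box fails the
    geometric checks, re-run the same detector on an ROI built around it."""
    box = detect_mrz(0, 0, frame_w, frame_h)
    if box is None:
        return None
    if box_is_plausible(box, frame_w, frame_h):
        return box
    roi = expand_to_roi(box, frame_w, frame_h)
    refined = detect_mrz(*roi)
    return refined if refined is not None else box
```

The key design point the sketch captures is that no second model is needed: the same detector is simply re-applied to a cropped, effectively upscaled view, which is what makes small-scale MRZs recoverable without increasing model size.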
Data Availability
Lists of files with training/test partitions of images in datasets are available upon request.
Code Availability
Not available.
References
ICAO Doc 9303 (Eighth Edition) Part 3: Specifications Common to all MRTDs, Machine Readable Travel Documents. International Civil Aviation Organization (2021)
Kwon, Y.-B., Kim, J.-H.: Recognition Based Verification for the Machine Readable Travel Documents. International Workshop on Graphics Recognition (GREC 2007). Curitiba, Brazil (2007)
Hassan, A.B., Fadlalla, Y.A.: A survey on techniques of detecting identity documents forgery. In: Sudan Conference on Computer Science and Information Technology (SCCSIT) pp. 1–5 (2017). https://doi.org/10.1109/SCCSIT.2017.8293052
Arlazarov, V.V., Zhukovskiy, A.E., Krivtsov, V.E., Nikolaev, D.P., Polevoy, D.V.: Analiz osobennostey ispolzovaniya statsionarnykh i mobilnykh malorazmernykh tsifrovykh video kamer dlya raspoznavaniya dokumentov [Analysis of using stationary and mobile small-size digital video cameras for document recognition]. J. Inf. Technol. Comput. Syst. (ITiVS), pp. 71–81 (2014). In Russian
Arlazarov, V.V., Bulatov, K., Chernov, T., Arlazarov, V.L.: MIDV-500: a dataset for identity document analysis and recognition on mobile devices in video stream. Comput. Opt. 43, 818–824 (2019). https://doi.org/10.18287/2412-6179-2019-43-5-818-824
Bulatov, K., Matalov, D., Arlazarov, V.V.: MIDV-2019: Challenges of the Modern Mobile-Based Document OCR. ICMV 2019, pp. 114332N1–114332N6. Bellingham, Washington 98227-0010 USA (2020). https://doi.org/10.1117/12.2558438
Bulatov, K.B., Emelyanova, E.V., Tropin, D.V., Skoryukina, N.S., Chernyshova, Y.S., Sheshkus, A.V., Usilin, S.A., Ming, Z., Burie, J.-C., Luqman, M.M., Arlazarov, V.V.: MIDV-2020: a comprehensive benchmark dataset for identity document analysis. Comput. Opt. 46, 252–270 (2022). https://doi.org/10.18287/2412-6179-CO-1006
Samarin, A., Malykh, V., Kalaidin, P.: Verification method using limited image area. Proc. Inst. Syst. Anal. Rus. Acad. Sci. (2020). https://doi.org/10.14357/20790279200102
Petrova, O., Bulatov, K.: Methods of Machine-Readable Zone Recognition Results Post-Processing. ICMV 2018, pp. 110411H1–110411H7. Bellingham, Washington 98227-0010 USA (2019). https://doi.org/10.1117/12.2522792
Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) pp. 779–788 (2016). https://doi.org/10.1109/CVPR.2016.91
Redmon, J., Farhadi, A.: YOLO9000: Better, Faster, Stronger. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) pp. 6517–6525 (2017). https://doi.org/10.1109/CVPR.2017.690
Redmon, J., Farhadi, A.: YOLOv3: An Incremental Improvement. arXiv preprint arXiv:1804.02767 (2018)
Zhou, X., Yao, C., Wen, H., Wang, Y., Zhou, S., He, W., Liang, J.: EAST: an efficient and accurate scene text detector. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) pp. 2642–2651 (2017). https://doi.org/10.1109/cvpr.2017.283
Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) pp. 9357–9366 (2019). https://doi.org/10.1109/CVPR.2019.00959
Ye, J., Chen, Z., Liu, J., Du, B.: TextFuseNet: scene text detection with richer fused features. In: Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, IJCAI-20 pp. 516–522 (2020). https://doi.org/10.24963/ijcai.2020/72
Qiao, Z., Zhou, Y., Yang, D., Zhou, Y., Wang, W.: SEED: semantics enhanced encoder-decoder framework for scene text recognition. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) pp. 13525–13534 (2020). https://doi.org/10.1109/CVPR42600.2020.01354
Zhang, X., Zhu, B., Yao, X., Sun, Q., Li, R., Yu, B.: Context-based contrastive learning for scene text recognition. In: Proceedings of the AAAI Conference on Artificial Intelligence 36, 3353–3361 (2022). https://doi.org/10.1609/aaai.v36i3.20245
Fang, S., Xie, H., Wang, Y., Mao, Z., Zhang, Y.: Read like humans: autonomous, bidirectional and iterative language modeling for scene text recognition. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2021)
He, T., Tian, Z., Huang, W., Shen, C., Qiao, Y., Sun, C.: An end-to-end textspotter with explicit alignment and attention. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition pp. 5020–5029 (2018). https://doi.org/10.1109/CVPR.2018.00527
Liao, M., Lyu, P., He, M., Yao, C., Wu, W., Bai, X.: Mask textspotter: an end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Trans. Pattern Anal. Mach. Intell. 43, 532–548 (2021). https://doi.org/10.1109/TPAMI.2019.2937086
Xing, L., Tian, Z., Huang, W., Scott, M.R.: Convolutional character networks. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV) (2019). https://doi.org/10.1109/ICCV.2019.00922
Wang, W., Xie, E., Li, X., Liu, X., Liang, D., Yang, Z., Lu, T., Shen, C.: PAN++: Towards efficient and accurate end-to-end spotting of arbitrarily-shaped text. IEEE Trans. Pattern Anal. Mach. Intell. 44, 5349–5367 (2022). https://doi.org/10.1109/TPAMI.2021.3077555
Liu, Y., James, H., Gupta, O., Raviv, D.: MRZ code extraction from visa and passport documents using convolutional neural networks. Int. J. Doc. Anal. Recognit. 25, 29–39 (2022). https://doi.org/10.1007/s10032-021-00384-2
Chen, L.-C., Papandreou, G., Kokkinos, I., Murphy, K., Yuille, A.L.: DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE Trans. Pattern Anal. Mach. Intell. 40, 834–848 (2018). https://doi.org/10.1109/tpami.2017.2699184
Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., Chen, L.-C.: MobileNetV2: inverted residuals and linear bottlenecks. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (2018). https://doi.org/10.1109/CVPR.2018.00474
Hartl, A., Arth, C., Schmalstieg, D.: Real-time detection and recognition of machine-readable zones with mobile devices. In: VISAPP 2015 - 10th International Conference on Computer Vision Theory and Applications; VISIGRAPP, Proceedings 3, 79–87 (2015). https://doi.org/10.5220/0005294700790087
Kolmakov, S.I., Skoryukina, N.S., Arlazarov, V.V.: Machine-readable zones detection in images captured by mobile devices’ cameras. Pattern Recognit. Image Anal. 30, 489–495 (2020). https://doi.org/10.1134/S105466182003013X
Savelyev, B.I., Skoryukina, N.S., Arlazarov, V.V.: A method for machine-readable zones location based on a combination the Hough transform and feature points. Bull. South Ural State Univ. Ser. Math. Model. Program. Comput. Softw. 15, 100–110 (2022). https://doi.org/10.14529/mmp220208
Ilyuhin, S., Sheshkus, A., Arlazarov, V., Nikolaev, D.: Hough encoder for machine readable zone localization. Pattern Recognit. Image Anal. (2022). https://doi.org/10.1134/S1054661822040150
Andriyanov, N.A., Dementiev, V.E., Tashlinskii, A.: Detection of objects in the images: from likelihood relationships towards scalable and efficient neural networks. Comput. Opt. (2022). https://doi.org/10.18287/2412-6179-CO-922
Arlazarov, V.V., Voysyat, J.S., Matalov, D.P., Nikolaev, D.P., Usilin, S.A.: Evolution of the Viola–Jones object detection method: a survey. Bull. South Ural State Univ. Ser. Math. Model. Program. Comput. Softw. 14, 5–23 (2021). https://doi.org/10.14529/mmp210401
Lee, H., Kwak, N.: Character recognition for the machine reader zone of electronic identity cards. In: IEEE International Conference on Image Processing (ICIP) pp. 387–391 (2015). https://doi.org/10.1109/ICIP.2015.7350826
Gayer, A.V., Chernyshova, Y.S., Sheshkus, A.V.: Effective real-time augmentation of training dataset for the neural networks learning. ICMV 2018, pp. 110411I1–110411I7. Bellingham, Washington 98227-0010 USA (2019). https://doi.org/10.1117/12.2522969
Smith, R.: An overview of the Tesseract OCR engine. In: Ninth International Conference on Document Analysis and Recognition (ICDAR 2007) pp. 629–633 (2007). https://doi.org/10.1109/ICDAR.2007.4376991
Du, Y., Li, C., Guo, R., Cui, C., Liu, W., Zhou, J., Lu, B., Yang, Y., Liu, Q., Hu, X., Yu, D., Ma, Y.: PP-OCRv2: Bag of Tricks for Ultra Lightweight OCR System. arXiv preprint arXiv:2109.03144 (2021)
Tretyakov, K.: PassportEye: extraction of machine-readable zone information from passports, visas and ID cards via OCR (2016). https://github.com/konstantint/PassportEye, Accessed 20 Oct 2022
Kostro, D., Zasso, M.: MRZ-Detection (2020). https://github.com/image-js/mrz-detection, Accessed 20 Oct 2022
Doubango Telecom: UltimateMRZ (2020). https://github.com/DoubangoTelecom/ultimateMRZ-SDK, Accessed 20 Oct 2022
Acknowledgements
We would like to thank Elena Limonova for profiling the YOLO-MRZ run time on mobile devices. We also thank Natalya Skoryukina for providing the source code of the method from [28], which allowed us to measure its quality on the MIDV datasets.
Funding
No funds, grants or other supports were received.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Gayer, A., Ershova, D. & Arlazarov, V.V. An accurate approach to real-time machine-readable zone detection with mobile devices. IJDAR 26, 321–334 (2023). https://doi.org/10.1007/s10032-023-00435-w