
Understanding movie poster: transfer-deep learning approach for graphic-rich text recognition

Original article · The Visual Computer

Abstract

Graphic-rich text is common in posters. A movie poster carries information such as the movie title, tag lines, and the names of the actors, director, and production house. The graphic-rich text of a movie title conveys not only sentiment but also genre. Understanding a poster therefore requires graphic-rich text recognition, which in turn requires text localization so that foreground text can be cleanly segmented from background graphics. In this paper, we propose a transfer learning-based approach to graphic-rich text localization, tuned by introducing reverse augmentation and a rotated/inclined rectangle drawing technique. A convolutional neural network-based model is then applied to identify the corresponding scripts. In experiments on a newly developed dataset (available upon request) of 1154 movie poster images containing multiple scripts, we achieved an average accuracy of 99.30%. Our results outperform previously developed tools that rely on handcrafted features.
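The rotated/inclined rectangle drawing technique itself is detailed only in the full article; as an illustration of the underlying geometry, the sketch below (function and parameter names are our own, assuming a box given by its centre, axis-aligned width/height, and a counter-clockwise rotation angle) computes the four corners of an inclined bounding box, which any graphics library can then draw as a polygon (e.g. OpenCV's `cv2.polylines`).

```python
import math

def rotated_rect_corners(cx, cy, w, h, angle_deg):
    """Return the four corners of a w-by-h rectangle centred at (cx, cy),
    rotated counter-clockwise by angle_deg, in drawing order."""
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    corners = []
    # Offsets of the four corners before rotation, in drawing order.
    for dx, dy in ((-w / 2, -h / 2), (w / 2, -h / 2),
                   (w / 2, h / 2), (-w / 2, h / 2)):
        # Standard 2-D rotation of each offset about the box centre.
        corners.append((cx + dx * c - dy * s, cy + dx * s + dy * c))
    return corners
```

With angle 0 this reduces to an ordinary axis-aligned box; a non-zero angle yields the inclined quadrilateral that a tight text bounding box on a tilted title would require.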


Figures 1–18 appear in the full article.

Notes

  1. https://www.imdb.com.

  2. https://urc.ucdavis.edu/sites/g/files/dgvnsk3561/files/inline-files/General%20Poster%20Design%20Principles%20-%20Handout.pdf.


Author information

Corresponding author

Correspondence to Kaushik Roy.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest and no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper. No human participants or animals were involved in this study.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Ghosh, M., Roy, S.S., Mukherjee, H. et al. Understanding movie poster: transfer-deep learning approach for graphic-rich text recognition. Vis Comput 38, 1645–1664 (2022). https://doi.org/10.1007/s00371-021-02094-6
