
Understanding movie poster: transfer-deep learning approach for graphic-rich text recognition

Original article · The Visual Computer

Abstract

Graphic-rich text is common in posters. A movie poster carries information such as the movie title, tag lines, and the names of the actors, director, and production house. The graphic-rich text of a movie title conveys not only sentiment but also genre. Understanding a poster therefore requires graphic-rich text recognition, which in turn requires text localization so that foreground text can be cleanly segmented from background graphics. In this paper, we propose a transfer learning-based approach to graphic-rich text localization, tuned by introducing reverse augmentation and a rotated/inclined rectangle drawing technique. A convolutional neural network-based model is then applied to identify the corresponding scripts. In experiments on a newly developed dataset (available upon request) of 1154 movie poster images containing multiple scripts, we achieved an average accuracy of 99.30%. Our results outperform previously developed tools that rely on handcrafted features.
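The rotated/inclined rectangle drawing technique itself is detailed only in the full article; as an illustration of the underlying geometry, the sketch below (function and parameter names are our own, assuming a box given by its centre, axis-aligned width/height, and a counter-clockwise rotation angle) computes the four corners of an inclined bounding box, which any graphics library can then draw as a polygon (e.g. OpenCV's `cv2.polylines`).

```python
import math

def rotated_rect_corners(cx, cy, w, h, angle_deg):
    """Return the four corners of a w-by-h rectangle centred at (cx, cy),
    rotated counter-clockwise by angle_deg, in drawing order."""
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    corners = []
    # Offsets of the four corners before rotation, in drawing order.
    for dx, dy in ((-w / 2, -h / 2), (w / 2, -h / 2),
                   (w / 2, h / 2), (-w / 2, h / 2)):
        # Standard 2-D rotation of each offset about the box centre.
        corners.append((cx + dx * c - dy * s, cy + dx * s + dy * c))
    return corners
```

With angle 0 this reduces to an ordinary axis-aligned box; a non-zero angle yields the inclined quadrilateral that a tight text bounding box on a tilted title would require.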


Figures 1–18 appear in the full article.

Notes

  1. https://www.imdb.com.

  2. https://urc.ucdavis.edu/sites/g/files/dgvnsk3561/files/inline-files/General%20Poster%20Design%20Principles%20-%20Handout.pdf.


Author information

Corresponding author

Correspondence to Kaushik Roy.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest and no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper. No human participants or animals were involved in this study.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Ghosh, M., Roy, S.S., Mukherjee, H. et al. Understanding movie poster: transfer-deep learning approach for graphic-rich text recognition. Vis Comput 38, 1645–1664 (2022). https://doi.org/10.1007/s00371-021-02094-6
