Abstract
This paper presents a new technique for detecting text inside a complex graphical background, extracting it, and enhancing it so that it can be easily recognized by optical character recognition (OCR). The technique uses a deep neural network for feature extraction and for classifying each frame as containing text or not. An error handling and correction (EHC) technique is used to resolve classification errors. A multiple frame integration (MFI) algorithm is introduced to extract the graphical text from its background. Text enhancement is done by adjusting the contrast, reducing noise, and increasing the pixel resolution. A standalone Component-Off-The-Shelf (COTS) software package is used to recognize the text characters and to qualify the system performance. The proposed solution is generalized to multilingual text. A new dataset of videos in different languages was collected for this purpose and is intended to serve as a benchmark. A new HMVGG16 convolutional neural network (CNN), used to classify frames as text-containing or non-text-containing, achieves an accuracy of 98%. The weighted average caption extraction accuracy of the introduced system is 96.15%. The average recognition accuracy for correctly detected characters (CDC) using the Abbyy SDK OCR engine is 97.75%.
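The abstract does not say how EHC or MFI work internally, so the following is only a minimal sketch of the general idea, under stated assumptions. It assumes that classification errors can be reduced by a majority vote over neighbouring per-frame labels, and that a static bright caption can be separated from a changing background by taking the pixel-wise minimum over consecutive frames. The function names `smooth_labels` and `integrate_frames`, the window size, and the choice of the minimum operator are all hypothetical and are not taken from the paper.

```python
from typing import List, Sequence

Frame = List[List[int]]  # grayscale image as rows of 0..255 intensities


def smooth_labels(labels: Sequence[int], window: int = 3) -> List[int]:
    """Assumed stand-in for EHC: majority vote over a centered window.

    labels holds one 0/1 decision per frame (1 = frame contains text).
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for i in range(len(labels)):
        neighborhood = labels[max(0, i - half):i + half + 1]
        ones = sum(neighborhood)
        zeros = len(neighborhood) - ones
        if ones > zeros:
            smoothed.append(1)
        elif zeros > ones:
            smoothed.append(0)
        else:
            # Ties can only occur at the clipped edges; keep the original label.
            smoothed.append(labels[i])
    return smoothed


def integrate_frames(frames: Sequence[Frame]) -> Frame:
    """Assumed stand-in for MFI: pixel-wise minimum over equally sized frames.

    A bright caption that stays in place keeps its intensity, while a
    background pixel that is dark in at least one frame becomes dark.
    """
    if not frames:
        raise ValueError("at least one frame is required")
    height, width = len(frames[0]), len(frames[0][0])
    return [
        [min(frame[r][c] for frame in frames) for c in range(width)]
        for r in range(height)
    ]


if __name__ == "__main__":
    print(smooth_labels([1, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0]))
    print(integrate_frames([[[255, 40], [10, 255]], [[255, 200], [90, 250]]]))
```

In this sketch the isolated 0 at the third frame and the isolated 1 at the ninth frame are treated as classifier errors and flipped to match their neighbours. A real system would more likely operate on image arrays and might use averaging or another operator in place of the minimum.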
Acknowledgements
I would like to thank God for his help. Special thanks to the RDI team, Dr. Sven Dickinson, and the members of my faculty department for supporting me with their experience and with the data set used in my research.
Cite this article
Elshahaby, H., Rashwan, M. An end to end system for subtitle text extraction from movie videos. J Ambient Intell Human Comput 13, 1853–1865 (2022). https://doi.org/10.1007/s12652-021-02951-1
DOI: https://doi.org/10.1007/s12652-021-02951-1