Abstract
This paper addresses Korean-English bilingual videotext recognition for news headline generation. Because videotext carries semantic content information, it can be used effectively for understanding videos. Despite this usefulness, applying text recognition technologies to practical video applications remains a challenging task because of their computational complexity and the recognition accuracy required. In this paper, we propose a novel Korean-English bilingual videotext recognition method that reduces the computational complexity while achieving comparable recognition accuracy. To recognize both Korean and English characters effectively, the proposed method employs an elaborate split-merge strategy in which the split segments are merged into characters using the recognition scores. Moreover, it avoids unnecessary computation by using geometric features such as squareness and internal gap, and its computational overhead is thus remarkably reduced. As a result, the proposed method is successfully employed in generating news headlines. The effectiveness and efficiency of the proposed method are verified by extensive experiments on a challenging database containing 51,290 text images (176,884 characters).
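The merge step of a split-merge strategy can be illustrated with a minimal sketch. The abstract does not give the actual algorithm, so everything below is an assumption made for illustration: the dynamic-programming search, the length-weighted score, the definitions of squareness (shorter side over longer side) and internal gap (horizontal space between adjacent segments), the threshold values, and the names `merge_segments` and `score_fn`. It only shows how recognition scores could decide which segments form one character, and how cheap geometric tests could skip recognizer calls.

```python
from typing import Callable, List, Tuple

Segment = Tuple[int, int]  # (left, right) x-extent of one split segment, in pixels


def squareness(width: int, height: int) -> float:
    """Shorter side divided by longer side; 1.0 is a perfect square (assumed definition)."""
    return min(width, height) / max(width, height)


def merge_segments(
    segments: List[Segment],
    line_height: int,
    score_fn: Callable[[int, int], float],
    max_merge: int = 4,           # hypothetical limit on segments per character
    min_squareness: float = 0.3,  # hypothetical threshold
    max_gap: int = 6,             # hypothetical threshold, in pixels
) -> List[Tuple[int, int]]:
    """Group consecutive segments into characters by dynamic programming.

    score_fn(start, end) is the recognition score of segments[start:end]
    treated as one character. Returns (first, last) index pairs, one per character.
    """
    n = len(segments)
    neg_inf = float("-inf")
    best = [neg_inf] * (n + 1)  # best[i]: best total score covering segments[:i]
    back = [-1] * (n + 1)       # back[i]: start index of the last character
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(max(0, end - max_merge), end):
            if best[start] == neg_inf:
                continue
            group = segments[start:end]
            if len(group) > 1:
                # Geometric pruning: reject merged candidates that are too
                # elongated or contain a wide internal gap, without calling
                # the recognizer. Single segments are always kept, so a
                # complete grouping always exists.
                width = group[-1][1] - group[0][0]
                widest_gap = max(b[0] - a[1] for a, b in zip(group, group[1:]))
                if squareness(width, line_height) < min_squareness:
                    continue
                if widest_gap > max_gap:
                    continue
            # Weight by segment count so merging is not penalized for
            # producing fewer characters (an assumed normalization).
            total = best[start] + (end - start) * score_fn(start, end)
            if total > best[end]:
                best[end] = total
                back[end] = start
    groups = []
    i = n
    while i > 0:
        groups.append((back[i], i - 1))
        i = back[i]
    groups.reverse()
    return groups


# Toy example: two close segments (parts of one Korean character) followed
# by a distant narrow segment (an English letter). Scores are made up.
toy_scores = {(0, 1): 0.4, (1, 2): 0.3, (0, 2): 0.9, (2, 3): 0.8}
calls = []


def toy_score_fn(start: int, end: int) -> float:
    calls.append((start, end))
    return toy_scores.get((start, end), 0.1)


result = merge_segments([(0, 10), (12, 20), (30, 34)], 20, toy_score_fn)
print(result)  # [(0, 1), (2, 2)]
```

In this toy run the two candidates that span the 10-pixel gap are rejected by the geometric test, so the recognizer is called four times instead of six. That is the kind of saving the geometric features are meant to provide, although the actual features and thresholds used in the paper may differ.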
Acknowledgments
Part of the work reported in this paper was conducted while the first author was with Samsung Electronics. The authors are grateful to Prof. Jinhyung Kim and Mr. Kyutae Cho at KAIST for their helpful discussions and to the anonymous reviewers for their useful comments. This work was supported by the National Natural Science Foundation of China (Nos. 61050110144, 60803097, 60972148, 60971128, 60970066, 61072106, 61075041, 61003198, 61001206, and 61077009), the National Research Foundation for the Doctoral Program of Higher Education of China (Nos. 200807010003 and 20100203120005), the National Science and Technology Ministry of China (Nos. 9140A07011810DZ0107 and 9140A07021010DZ0131), the Key Project of Ministry of Education of China (No. 108115), and the Fundamental Research Funds for the Central Universities (Nos. JY10000902001, K50510020001, and JY10000902045).
Cite this article
Jung, C., Jiao, L. Korean-English bilingual videotext recognition for news headline generation based on a split-merge strategy. J Real-Time Image Proc 11, 167–177 (2016). https://doi.org/10.1007/s11554-012-0298-x
DOI: https://doi.org/10.1007/s11554-012-0298-x