Abstract
Recognizing scene information in images or videos, such as locating objects and answering “Where am I?”, has attracted much attention in computer vision research. Many existing scene recognition methods focus on static images and cannot achieve satisfactory results on videos, which contain more complex scene features than images. In this paper, we propose a robust movie scene recognition approach based on panoramic frames and representative feature patches. First, the movie is efficiently segmented into video shots and scenes. Second, we introduce a novel key-frame extraction method that uses panoramic frames, and we apply a local feature extraction process to obtain the representative feature patches (RFPs) in each video shot. Third, a Latent Dirichlet Allocation (LDA) based recognition model is trained to recognize the scene within each individual video scene clip. The correlations between video clips are also taken into account to enhance recognition performance. When the proposed approach is applied to scene recognition in realistic movies, the experimental results show that it achieves satisfactory performance.
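The first stage summarized above, segmenting the movie into shots, can be illustrated with a minimal sketch. The abstract does not say which segmentation algorithm is used, so the code below substitutes a common baseline: a cut is declared wherever the L1 distance between the intensity histograms of consecutive frames exceeds a threshold. The function names `gray_histogram` and `segment_shots`, the flat-list frame representation, the bin count, and the threshold are all assumptions made for illustration and are not taken from the paper.

```python
from typing import List, Sequence, Tuple


def gray_histogram(frame: Sequence[int], bins: int = 8) -> List[float]:
    """Normalized intensity histogram of a frame given as flat 0-255 pixel values."""
    counts = [0] * bins
    for pixel in frame:
        # Map the 0-255 intensity to a bin index, clamping the top edge.
        counts[min(pixel * bins // 256, bins - 1)] += 1
    total = len(frame) or 1  # avoid division by zero for an empty frame
    return [c / total for c in counts]


def segment_shots(
    frames: Sequence[Sequence[int]],
    threshold: float = 0.5,  # hypothetical value, would need tuning on real data
    bins: int = 8,
) -> List[Tuple[int, int]]:
    """Split frames into shots, returned as (start, end) pairs with end exclusive.

    Baseline assumption: a shot boundary lies between two consecutive frames
    whose histograms differ by more than `threshold` in L1 distance.
    """
    if not frames:
        return []
    shots: List[Tuple[int, int]] = []
    start = 0
    previous = gray_histogram(frames[0], bins)
    for index in range(1, len(frames)):
        current = gray_histogram(frames[index], bins)
        distance = sum(abs(a - b) for a, b in zip(previous, current))
        if distance > threshold:
            shots.append((start, index))
            start = index
        previous = current
    shots.append((start, len(frames)))
    return shots


# Toy input: three dark frames followed by two bright frames.
dark = [10] * 16
bright = [200] * 16
toy_shots = segment_shots([dark, dark, dark, bright, bright])
```

On the toy input, the dark and bright frames fall into different histogram bins, so a single cut is detected between the third and fourth frames. A hard threshold of this kind only handles abrupt cuts; gradual transitions such as fades and dissolves would need a different detector.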
Additional information
The research is supported by the National Funds for Distinguished Young Scientists of China under Grant No. 60925010, the Specialized Research Fund for the Doctoral Program of Higher Education of China under Grant No. 20120005130002, the Cosponsored Project of Beijing Committee of Education, the Funds for Creative Research Groups of China under Grant No. 61121001, and the Program for Changjiang Scholars and Innovative Research Team in University of China under Grant No. IRT1049.
Cite this article
Gao, GY., Ma, HD. Movie Scene Recognition Using Panoramic Frame and Representative Feature Patches. J. Comput. Sci. Technol. 29, 155–164 (2014). https://doi.org/10.1007/s11390-014-1418-9
DOI: https://doi.org/10.1007/s11390-014-1418-9