Abstract
In this paper, Autoassociative Neural Network (AANN) models are explored for segmenting and indexing films (movies) using audio features. A two-stage method is proposed for segmenting a film into a sequence of scenes and then indexing them appropriately. In the first stage, the music and speech-plus-music segments of the film are separated, and the music segments are labelled as title or fighting scenes based on their position. In the second stage, the speech-plus-music segments are classified into normal, emotional, comedy and song scenes. In this work, Mel-frequency cepstral coefficients (MFCCs), zero-crossing rate and intensity are used as audio features for segmenting and indexing the films. The proposed segmentation and indexing method is evaluated on manually segmented Hindi films. From the evaluation results, it is observed that title, fighting and song scenes are segmented and indexed without any errors, while most of the errors occur in discriminating comedy scenes from normal scenes. The performance of the proposed AANN models is also compared with that of hidden Markov models, Gaussian mixture models and support vector machines.
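To make the two-stage scheme concrete, a minimal sketch of one possible realisation is given below. It is not the authors' implementation: the use of librosa for feature extraction and scikit-learn's MLPRegressor as the autoencoder, the 15-dimensional frame vector, the network layer sizes, the confidence measure exp(-e) and the rule that the first music segment is the title scene are all illustrative assumptions.

```python
# Minimal sketch (assumed realisation, not the authors' code) of the two-stage
# AANN-based film segmentation and indexing scheme described in the abstract.
import numpy as np
import librosa
from sklearn.neural_network import MLPRegressor

N_MFCC = 13  # assumed number of cepstral coefficients per frame


def frame_features(wav_path):
    """Per-frame MFCC + zero-crossing rate + intensity (RMS energy), normalised."""
    y, sr = librosa.load(wav_path, sr=16000)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=N_MFCC)     # (13, T)
    zcr = librosa.feature.zero_crossing_rate(y)                # (1, T')
    rms = librosa.feature.rms(y=y)                             # (1, T')
    T = min(mfcc.shape[1], zcr.shape[1], rms.shape[1])
    feats = np.vstack([mfcc[:, :T], zcr[:, :T], rms[:, :T]]).T  # (T, 15)
    return (feats - feats.mean(0)) / (feats.std(0) + 1e-8)


def train_aann(frames):
    """AANN = autoencoder trained to reconstruct its own input.
    Layer sizes (38, 5, 38) are assumed values, not taken from the paper."""
    net = MLPRegressor(hidden_layer_sizes=(38, 5, 38), activation='tanh',
                       max_iter=500, random_state=0)
    net.fit(frames, frames)
    return net


def aann_confidence(net, frames):
    """Average frame-level confidence exp(-||x - x_hat||^2) of a trained AANN."""
    err = np.sum((frames - net.predict(frames)) ** 2, axis=1)
    return float(np.mean(np.exp(-err)))


def index_film(segments, music_net, speech_music_net, scene_nets):
    """Stage 1: separate music from speech-plus-music segments.
    Stage 2: assign speech-plus-music segments to a scene category.
    `scene_nets` maps 'normal'/'emotional'/'comedy'/'song' to trained AANNs."""
    labels = []
    for i, frames in enumerate(segments):
        if aann_confidence(music_net, frames) > aann_confidence(speech_music_net, frames):
            # Pure-music segment; position decides title vs fighting
            # (assumed rule: the first such segment is the title scene).
            labels.append('title' if i == 0 else 'fighting')
        else:
            # Pick the scene category whose AANN gives the highest confidence.
            labels.append(max(scene_nets,
                              key=lambda c: aann_confidence(scene_nets[c], frames)))
    return labels
```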


