Temporal video segmentation: detecting the end-of-act in circus performance videos

Iwan, Lukman H.; Thom, James A.

doi:10.1007/s11042-015-3130-3

Published: 09 December 2015

Volume 76, pages 1379–1401 (2017)

Multimedia Tools and Applications

Lukman H. Iwan & James A. Thom

Abstract

Segmenting a circus performance video into acts is challenging because its content resembles other performance videos but differs markedly from movies, TV programs, and home videos. Segmentation is useful because a long circus show usually comprises several shorter segments, the acts. We propose a new method for detecting the end-of-act within circus performance videos. Unlike other temporal video segmentation methods, this method does not rely on shot detection techniques and uses audio and video content analysis separately. First, audio content analysis detects applause in the circus audio stream. Second, image analysis tests whether each detected applause occurs at the end of an act. An end-of-act is detected if the image(s) before and after the applause are different or if there are black frames just after the applause; otherwise, the applause is not treated as an end-of-act. In an experiment on Circus Oz performance videos, the method achieved 92.27% recall and 49.05% precision in detecting end-of-act, providing useful clues that assist human annotators in segmenting circus video into acts.
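The decision rule stated in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how images are compared or how black frames are identified, so the histogram distance, the thresholds, and every function name below are assumptions chosen only for illustration. Frames are assumed to be non-empty, flattened grayscale pixel lists, and applause is assumed to have been located already by the audio analysis.

```python
from typing import List, Sequence

Frame = Sequence[int]  # assumed format: flattened grayscale pixels, 0..255


def is_black(frame: Frame, max_mean: float = 16.0) -> bool:
    """Treat a frame as black when its mean intensity is below max_mean.

    The threshold is an illustrative assumption, not a value from the paper.
    """
    return sum(frame) / len(frame) < max_mean


def histogram(frame: Frame, bins: int = 16) -> List[float]:
    """Return the normalised intensity histogram of a frame."""
    counts = [0] * bins
    for pixel in frame:
        counts[min(pixel * bins // 256, bins - 1)] += 1
    total = len(frame)
    return [c / total for c in counts]


def frames_differ(a: Frame, b: Frame, min_distance: float = 0.5) -> bool:
    """Compare two frames by histogram distance (an assumed similarity test)."""
    ha, hb = histogram(a), histogram(b)
    # Half the L1 distance between normalised histograms lies in [0, 1].
    distance = 0.5 * sum(abs(x - y) for x, y in zip(ha, hb))
    return distance > min_distance


def is_end_of_act(before: Frame, after: Sequence[Frame]) -> bool:
    """Apply the rule from the abstract to one detected applause.

    before: a frame sampled just before the applause.
    after:  frames sampled just after the applause.
    """
    if any(is_black(f) for f in after):
        return True  # black frames just after the applause
    return any(frames_differ(before, f) for f in after)  # scene changed


if __name__ == "__main__":
    bright, dark = [200] * 64, [5] * 64
    print(is_end_of_act(bright, [bright]))  # same image, no black frame
    print(is_end_of_act(bright, [dark]))    # black frame after applause
```

A rule this permissive favours recall over precision, which is consistent with the reported results: most act endings are caught, and a human annotator discards the false alarms.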

Acknowledgments

This research was supported under the Australian Research Council’s Linkage Projects funding scheme (project number LP100200118). We would like to thank our partners on the project: Australian Research Council, RMIT University, La Trobe University, Circus Australia Ltd, Australia Council for the Arts, and Victoria Arts Centre Trust. We thank the anonymous referees for their helpful feedback and suggested improvements to the paper.

Author information

Authors and Affiliations

School of Computer Science and Information Technology, RMIT University, Melbourne, Victoria, Australia
Lukman H. Iwan & James A. Thom

Corresponding author

Correspondence to Lukman H. Iwan.

About this article

Cite this article

Iwan, L.H., Thom, J.A. Temporal video segmentation: detecting the end-of-act in circus performance videos. Multimed Tools Appl 76, 1379–1401 (2017). https://doi.org/10.1007/s11042-015-3130-3

Received: 08 March 2015
Revised: 16 October 2015
Accepted: 30 November 2015
Published: 09 December 2015
Issue Date: January 2017
DOI: https://doi.org/10.1007/s11042-015-3130-3
