Abstract
Segmenting a circus performance video into acts is challenging because its content resembles that of other performance videos but differs markedly from movies, TV programs, and home videos. Segmentation is useful because a long circus show usually consists of several shorter segments, the acts. We propose a new method for detecting the end-of-act in circus performance videos. Unlike other temporal video segmentation methods, this method does not rely on shot detection techniques, and it applies audio and video content analysis separately. First, audio content analysis detects applause in the circus audio stream. Second, image analysis tests whether each detected applause occurs at the end of an act: an end-of-act is detected if the image(s) before and after the applause are different, or if there are black frames just after the applause; otherwise, the applause is not treated as an end-of-act. In an experiment on Circus Oz performance videos, the method achieved 92.27% recall and 49.05% precision, providing useful clues that help human annotators segment circus video into acts.
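The decision rule stated in the abstract can be illustrated with a minimal sketch. It assumes applause has already been located in the audio stream and that frames arrive as flattened grayscale pixel lists. The function names, the histogram-based difference measure, and both thresholds (`max_mean`, `min_distance`) are hypothetical choices made for illustration; the abstract does not specify how the authors compare images or identify black frames.

```python
from typing import List, Sequence

# Assumption: a frame is a flattened grayscale image, pixel values in 0..255.
Frame = Sequence[int]


def is_black_frame(frame: Frame, max_mean: float = 16.0) -> bool:
    """Treat a frame as black when its mean intensity is very low.

    The threshold is an illustrative value, not taken from the paper.
    """
    return sum(frame) / len(frame) <= max_mean


def histogram(frame: Frame, bins: int = 16) -> List[float]:
    """Normalised intensity histogram of a frame."""
    counts = [0] * bins
    for pixel in frame:
        counts[min(pixel * bins // 256, bins - 1)] += 1
    return [count / len(frame) for count in counts]


def frames_differ(a: Frame, b: Frame, min_distance: float = 0.5) -> bool:
    """Compare two frames by the L1 distance of their histograms (range 0..2).

    Histogram distance is a stand-in for whatever image comparison
    the authors actually use.
    """
    distance = sum(abs(x - y) for x, y in zip(histogram(a), histogram(b)))
    return distance >= min_distance


def is_end_of_act(before: Frame, after: Sequence[Frame]) -> bool:
    """Apply the rule from the abstract to one detected applause.

    before: a frame taken just before the applause.
    after:  frames taken just after the applause, in temporal order.
    """
    if not after:
        return False
    # Black frames just after the applause signal an end-of-act.
    if any(is_black_frame(frame) for frame in after):
        return True
    # Otherwise the images before and after the applause must differ.
    return frames_differ(before, after[0])


if __name__ == "__main__":
    bright_scene = [200] * 64
    dim_scene = [60] * 64
    print(is_end_of_act(bright_scene, [bright_scene]))  # same scene continues
    print(is_end_of_act(bright_scene, [dim_scene]))     # scene changed
```

Loose thresholds in a rule like this would favour recall over precision, which suits the assistive use case described above, where the output serves as clues that human annotators review.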
Acknowledgments
This research was supported under the Australian Research Council’s Linkage Projects funding scheme (project number LP100200118). We would like to thank our partners on the project: Australian Research Council, RMIT University, LaTrobe University, Circus Australia Ltd, Australia Council for the Arts, and Victoria Arts Centre Trust. We thank the anonymous referees for their helpful feedback and suggested improvements to the paper.
Cite this article
Iwan, L.H., Thom, J.A. Temporal video segmentation: detecting the end-of-act in circus performance videos. Multimed Tools Appl 76, 1379–1401 (2017). https://doi.org/10.1007/s11042-015-3130-3
DOI: https://doi.org/10.1007/s11042-015-3130-3