Abstract
With the advent of wearable computing, personal imaging, photojournalism, and personal video diaries, the need for automated archiving of the videos these devices capture has become pressing. The principal device used to record human-environment interaction is a wearable camera, usually head-mounted. The videos obtained from such a camera are raw, unedited records of the wearer's visual interaction with the surroundings. The focus of our research is to develop post-processing techniques that automatically abstract videos based on episode detection. An episode is defined as a part of the video captured when the user was interested in an external event and paid attention in order to record it. Our research is based on the assumption that head movements exhibit distinguishable patterns during an episode, and that these patterns can be exploited to differentiate episodes from non-episodes. We present a novel algorithm that exploits head and body behaviour to detect episodes. The algorithm's performance is measured by comparing the ground truth (user-declared episodes) with the detected episodes. Experiments on several hours of head-mounted video captured in varying locations show the high degree of success achieved by the proposed method.
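The evaluation described above — matching detected episodes against user-declared ground truth — can be illustrated with a small sketch. This is not the paper's exact procedure; the interval representation, the overlap-based matching, and the `min_overlap` threshold are assumptions made for illustration only.

```python
def overlap(a, b):
    """Length of the temporal overlap between two (start, end) intervals, in seconds."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))

def evaluate(detected, ground_truth, min_overlap=1.0):
    """Compare detected episode intervals against ground-truth intervals.

    A detection counts as a true positive if it overlaps some ground-truth
    episode by at least `min_overlap` seconds (an illustrative threshold).
    Returns (true_positives, false_positives, false_negatives).
    """
    matched_gt = set()
    tp = fp = 0
    for d in detected:
        hits = [i for i, g in enumerate(ground_truth) if overlap(d, g) >= min_overlap]
        if hits:
            tp += 1
            matched_gt.update(hits)  # ground-truth episodes this detection covers
        else:
            fp += 1
    fn = len(ground_truth) - len(matched_gt)  # ground-truth episodes never matched
    return tp, fp, fn

# Toy example: times in seconds within the video.
gt = [(10, 25), (60, 90)]             # user-declared episodes
det = [(12, 24), (40, 45), (65, 80)]  # detected episodes
print(evaluate(det, gt))              # -> (2, 1, 0)
```

From such counts, precision and recall (or the confusion matrices reported in the appendices) follow directly.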
Appendices
Appendix 1. Confusion matrices for stationary and non-stationary classification
Appendix 2. Confusion matrices for head movement direction classification
Appendix 3. Confusion matrices for head movement direction classification (Note: "not applicable" is inserted because no direction value exists for non-episodes that were detected as non-episodes.)
Cite this article
Chauhan, A., Singh, S. & Grosvenor, D. Episode detection in videos captured using a head-mounted camera. Pattern Anal Applic 7, 176–189 (2004). https://doi.org/10.1007/s10044-004-0215-4