Abstract
In scene investigation, recording a video log with a handheld camera is more convenient and more complete than taking photos and notes. Video analysis and computer vision techniques make it possible to build a spatio-temporal representation of the investigation from such a log. Such a representation gives a better overview than a set of photos and makes the investigation more accessible. We develop methods for building this representation and present an interface for navigating the result. The processing includes (i) segmenting a log into events using novel structure and motion features, which makes the log easier to access in the time dimension, and (ii) mapping video frames to a 3D model of the scene so that the log can be navigated in space. Our results show that, with the proposed features, we recognize more than 70 percent of all frames correctly and, more importantly, find all the events. We then provide a semi-interactive method for mapping those events to a 3D model of the scene, which maps more than 80 percent of the events. The result is a 3D event log that captures the investigation and supports applications such as revisiting the scene, examining the investigation itself, or hypothesis testing.








Acknowledgments
We thank Jurrien Bijhold and the Netherlands Forensic Institute for providing the data and bringing in domain knowledge, and the police investigators for participating in the experiment. This work is supported by the Research Grant from Vietnam’s National Foundation for Science and Technology Development (NAFOSTED), No. 102.02-2011.13.
Cite this article
Dang, T.K., Worring, M. & Bui, T.D. Building 3D event logs for video investigation. Multimed Tools Appl 74, 4617–4639 (2015). https://doi.org/10.1007/s11042-013-1826-9
DOI: https://doi.org/10.1007/s11042-013-1826-9