Keyframe Selection Framework Based on Visual and Excitement Features for Lifelog Image Sequences

  • Published in: International Journal of Social Robotics

Abstract

Keyframe selection is the process of finding a set of representative frames from an image sequence. We aim to automate this process. The main difficulty is that the quality of a keyframe selection is highly subjective: which frames best represent a sequence varies from user to user. To address this, we propose VIEMO, a keyframe selection framework based on visual and excitement features that consists of two integrated modules: event segmentation and keyframe selection. First, a scene-change-detection algorithm segments the image sequence into events. Then, visual features (contrast, color variance, sharpness, noise, and saliency), together with excitement features from a biosensor, are used to select the keyframe that most closely matches the one the user would choose. Two fusion schemes, flat and hierarchical, were also investigated. To evaluate the quality of the selected keyframes, we present an evaluation technique that grades keyframe quality automatically: even when a keyframe does not exactly match the one selected by the user, a degree of acceptance is computed from visual similarity. Experimental results showed that keyframe selection using only visual features yielded an acceptance rate of \(74.16\,\%\), whereas our proposed method achieved a higher acceptance rate of \(83.71\,\%\); the acceptance rate improved by an average of \(9.55\,\%\) across all participants. Our framework therefore offers a potential solution to the subjectivity of keyframe selection in lifelog image sequences.
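To make the fusion step concrete, below is a minimal sketch of flat fusion: per-frame visual scores are combined with a biosensor excitement score by a weighted sum, and the highest-scoring frame in an event becomes its keyframe. The feature extractors (RMS contrast, mean gradient magnitude as a sharpness proxy), the weights, and all function names are illustrative assumptions, not the paper's implementation, which additionally uses color variance, noise, and saliency.

import numpy as np

def contrast_score(gray):
    # RMS contrast: standard deviation of gray levels, scaled to [0, 1]
    # (the maximum std for 8-bit values is 127.5).
    return float(gray.std()) / 128.0

def sharpness_score(gray):
    # Mean gradient magnitude as a simple sharpness proxy, clipped to [0, 1].
    gy, gx = np.gradient(gray.astype(float))
    return float(np.clip(np.hypot(gx, gy).mean() / 64.0, 0.0, 1.0))

def flat_fusion(frames, excitement, weights=(0.5, 0.3, 0.2)):
    # Flat fusion: one weighted sum over all normalized feature scores;
    # the index of the best-scoring frame in the event is the keyframe.
    w_con, w_sharp, w_exc = weights
    scores = [w_con * contrast_score(f)
              + w_sharp * sharpness_score(f)
              + w_exc * e
              for f, e in zip(frames, excitement)]
    return int(np.argmax(scores))

# Usage with synthetic data: ten 120x160 grayscale "frames" and one
# excitement reading (assumed already normalized to [0, 1]) per frame.
rng = np.random.default_rng(0)
frames = [rng.integers(0, 256, (120, 160)).astype(np.uint8) for _ in range(10)]
excitement = rng.random(10)
print("keyframe index:", flat_fusion(frames, excitement))

A hierarchical scheme would instead fuse the visual scores into a single visual score first, then combine that with the excitement score at a second level.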




Acknowledgments

We would like to thank the six student volunteers for data collection, ground-truth labeling, and scoring, and the Computer Vision Lab at Vienna University of Technology (TU Wien), Austria, for providing the Ubiqlog software. Special thanks to Amornched Jinda-apiraksa, Jana Machajdik, and Professor Robert Sablatnig, who provided valuable comments on this work.

Author information

Correspondence to Photchara Ratsamee.


About this article


Cite this article

Ratsamee, P., Mae, Y., Jinda-apiraksa, A. et al. Keyframe Selection Framework Based on Visual and Excitement Features for Lifelog Image Sequences. Int J of Soc Robotics 7, 859–874 (2015). https://doi.org/10.1007/s12369-015-0306-y

