Abstract
This paper investigates key-frame extraction for video clips captured by smart phones. Unlike existing work that exploits image content, this work utilizes the sensors embedded in the smart phone, such as the accelerometer and the touch screen, to infer the user's intention. These intentions are then analyzed to extract meaningful video key-frames. The proposed method is not only fast enough for on-device implementation but also improves key-frame extraction performance. Finally, a prototype is developed on a smart phone, and extensive experimental validation demonstrates the advantages of the proposed method.
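The abstract does not reproduce the algorithm, but the core idea of using motion sensors rather than image content can be illustrated with a minimal sketch. The heuristic below is an assumption for illustration only (not the authors' actual method): a stretch of low accelerometer magnitude suggests the user is deliberately holding the phone steady on a subject, so the most stable frame in that stretch is taken as a key-frame. The function name `select_key_frames` and its `window`/`threshold` parameters are hypothetical.

```python
def select_key_frames(motion, window=5, threshold=0.2):
    """Pick key-frame indices from per-frame camera-motion magnitudes
    (e.g. gravity-compensated accelerometer readings aligned to frames).

    A run of `window` consecutive frames whose motion stays below
    `threshold` is treated as a deliberate steady hold; the most stable
    frame inside that run is selected, and the scan skips past the run
    to avoid near-duplicate picks.
    """
    key_frames = []
    i = 0
    n = len(motion)
    while i + window <= n:
        seg = motion[i:i + window]
        if max(seg) < threshold:
            # index of the most stable frame within the steady segment
            key_frames.append(i + seg.index(min(seg)))
            i += window
        else:
            i += 1
    return key_frames

# Synthetic trace: shaky start, a steady hold, then shake again.
motion = [0.9, 0.8, 0.05, 0.04, 0.03, 0.06, 0.05, 0.7, 0.9, 0.8]
print(select_key_frames(motion))  # frame 4 lies inside the steady hold
```

A threshold-based scan like this runs in linear time over the sensor trace, which is consistent with the abstract's claim that sensor-driven extraction is fast enough for on-device use, in contrast to content-based methods that must decode and analyze every frame.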
Notes
In this paper, the terminology "sensor" does not include the camera.
Acknowledgments
This work was supported in part by the National Key Project for Basic Research of China under Grant 2013CB329403; in part by the Tsinghua Self-innovation Project under Grant 20111081111; and in part by the Tsinghua University Initiative Scientific Research Program under Grant 20131089295.
Cite this article
Liu, H., Liu, Y. & Sun, F. Video key-frame extraction for smart phones. Multimed Tools Appl 75, 2031–2049 (2016). https://doi.org/10.1007/s11042-014-2390-7