Abstract
In this paper, decision variables for the key-frame detection problem in a video are evaluated using statistical tools derived from the theory of design of experiments. The pixel-by-pixel intensity difference of consecutive video frames is used as the factor, or decision variable, for designing an experiment for key-frame detection. The determination of a key-frame is correlated with the different values of the factor. A novel concept of meaningfulness of a video key-frame is also introduced to select the representative key-frame from a set of possible key-frames. The use of design of experiments and the meaningfulness property to summarize a video is tested on a number of videos taken from the MUSCLE-VCD-2007 dataset. The performance of the proposed approach in detecting key-frames is found to be superior to that of competing approaches such as the PME-based method (Liu et al., IEEE Trans Circuits Syst Video Technol 13(10):1006–1013, 2003; Mukherjee et al., IEEE Trans Circuits Syst Video Technol 17(5):612–620, 2007; Panagiotakis et al., IEEE Trans Circuits Syst Video Technol 19(3):447–451, 2009).











References
Adjeroh D, Lee MC, Banda N, Kandaswamy U (2009) Adaptive edge-oriented shot boundary detection. J Image Video Process 2009(5):5:1–5:13
Calic J, Izquierdo E (2002) Efficient key-frame extraction and video analysis. In: Proc. IEEE international conference on information technology: coding and computing, Washington, DC, USA, pp 28–33
Chasanis VT, Likas AC, Galatsanos NP (2009) Scene detection in videos using shot clustering and sequence alignment. IEEE Trans Multimedia 11(1):89–100
Desolneux A, Moisan L, Morel J (2003) A grouping principle and four applications. IEEE Trans Pattern Anal Mach Intell 25(4):508–513
Desolneux A, Moisan L, Morel J (2008) From gestalt theory to image analysis: a probabilistic approach. In: Interdisciplinary applied mathematics, vol 34. Springer, New York
Gao Y, Tang J, Xie X (2009) Key frame vector and its application to shot retrieval. In: Proc. 1st international workshop on interactive multimedia for consumer electronics, Beijing, China, pp 27–34
Hoeffding W (1963) Probability inequalities for sums of bounded random variables. J Am Stat Assoc 58(301):13–30
Law-To J, Joly A, Boujemaa N (2007) Muscle-VCD-2007: a live benchmark for video copy detection. http://www-rocq.inria.fr/imedia/civr-bench/. Accessed May 2010
Lienhart R (2001) Reliable transition detection in videos: a survey and practitioner’s guide. Int J Image Graph 1(3):469–486
Liu TM, Zhang HJ, Qi FH (2003) A novel key-frame extraction algorithm based on perceived motion energy model. IEEE Trans Circuits Syst Video Technol 13(10):1006–1013
Mills M (1992) A magnifier tool for video data. In: Proc. ACM conference on human factors in computing systems, Monterey, California, USA, pp 93–98
Mukherjee DP, Das SK, Saha S (2007) Key-frame estimation in video using randomness measure of feature point pattern. IEEE Trans Circuits Syst Video Technol 17(5):612–620
Ouyang J, Li J, Tang H (2006) Interactive key frame selection model. J Vis Commun Image Represent 17(6):1145–1163
Panagiotakis C, Doulamis A, Tziritas G (2009) Equivalent key frames selection based on iso-content principles. IEEE Trans Circuits Syst Video Technol 19(3):447–451
Park SH (1996) Robust design and analysis for quality engineering. Chapman & Hall, London
Pickering MJ, Rüger SM, Sinclair D (2002) Video retrieval by feature learning in key frames. In: Proc. international conference on image and video retrieval, pp 309–317
Pye D, Hollinghurst NJ, Mills TJ, Wood KR (1998) Audio-visual segmentation for content-based retrieval. In: Proc. international conference on spoken language processing
Rasheed Z, Shah M (2005) Detection and representation of scenes in videos. IEEE Trans Multimedia 7(6):1097–1105
Richard GL (2007) Statistical concepts: a second course. Lawrence Erlbaum Associates, Mahwah
Roy RK (2001) Design of experiments using the Taguchi approach. Wiley, New York
Smeaton AF, Over P, Doherty AR (2010) Video shot boundary detection: seven years of TRECVid activity. Comput Vis Image Underst 114(4):411–418
Song X, Fan G (2005) Joint key-frame extraction and object-based video segmentation. In: Proc. IEEE workshop on motion and video computing, vol 2. Breckenridge, Colorado, pp 126–131
Spyrou E, Tolias G, Mylonas P, Avrithis Y (2009) Concept detection and keyframe extraction using a visual thesaurus. Multimedia Tools Appl 41(3):337–373
Valdes V, Martinez JM (2010) A framework for video abstraction systems analysis and modelling from an operational point of view. Multimedia Tools Appl 49(1):7–35
Wolf W (1996) Key frame selection by motion analysis. In: Proc. IEEE international conference on acoustics, speech and signal processing, vol 2, Washington, DC, USA, pp 1228–1231
Yeung MM, Yeo BL (1997) Video visualization for compact presentation and fast browsing of pictorial content. IEEE Trans Circuits Syst Video Technol 7(5):771–785
Zhuang Y, Rui Y, Huang TS, Mehrotra S (1998) Adaptive key frame extraction using unsupervised clustering. In: Proc. IEEE international conference on image processing, Chicago, USA, pp 866–870
Appendices
Appendix A: Obtaining (13) from [7]
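Hoeffding’s inequality [7]: In our problem, \(m_i\) is the number of frames in the ith unit. We formulate the problem with a sequence of i.i.d. random variables \(\{X_q\}_{q=1,2,3,...,m_i}\) such that \(0 \leq X_q \leq 1\). Let us define \(X_q\) as

\[ X_q = \begin{cases} 1 & \text{if } l_q > \eta, \\ 0 & \text{otherwise,} \end{cases} \tag{19} \]

for a given η, where \(l_q\) is the l-ratio value of the qth frame of the ith unit. We set \(S_{m_i}=\sum_{q=1}^{m_i}X_q\) (i.e., the number of frames of the ith unit having l-ratio greater than η) and \(\nu m_i=E\left[S_{m_i}\right]\). Then for \(\nu m_i < t < m_i\) (since ν is a probability value less than 1), putting \(\sigma=\frac{t}{m_i}\) as in [5], according to Hoeffding’s inequality,

\[ P\left(S_{m_i} \geq t\right) \leq \exp\left(-m_i\left(\sigma\log\frac{\sigma}{\nu}+(1-\sigma)\log\frac{1-\sigma}{1-\nu}\right)\right). \tag{20} \]

In addition, the right-hand term of this inequality satisfies

\[ \exp\left(-m_i\left(\sigma\log\frac{\sigma}{\nu}+(1-\sigma)\log\frac{1-\sigma}{1-\nu}\right)\right) \leq \exp\left(-m_i(\sigma-\nu)^2 H(\nu)\right) \leq \exp\left(-2m_i(\sigma-\nu)^2\right), \tag{21} \]

where

\[ H(\nu) = \begin{cases} \dfrac{1}{1-2\nu}\log\dfrac{1-\nu}{\nu} & \text{if } 0 < \nu < \frac{1}{2}, \\ \dfrac{1}{2\nu(1-\nu)} & \text{if } \frac{1}{2} \leq \nu < 1. \end{cases} \tag{22} \]

This is Hoeffding’s inequality. We then apply it to find the sufficient condition of ϵ-meaningfulness. If \(t\geq \nu m_i+\sqrt{\frac{\log {\lambda} - \log {\epsilon}}{H(\nu)}}\sqrt{m_i}\), then \(m_i(\sigma-\nu)^2 H(\nu) \geq \log\lambda - \log\epsilon\), so using (20) and (21) and putting \(\sigma=\frac{t}{m_i}\) we get

\[ \exp\left(-m_i\left(\sigma\log\frac{\sigma}{\nu}+(1-\sigma)\log\frac{1-\sigma}{1-\nu}\right)\right) \leq \exp\left(-m_i(\sigma-\nu)^2 H(\nu)\right) \leq \frac{\epsilon}{\lambda}. \tag{23} \]

Then using (20) and (23) we get

\[ P\left(S_{m_i} \geq t\right) \leq \frac{\epsilon}{\lambda}. \tag{24} \]

By the definition of meaningfulness, this means that the cut-off η is meaningful (according to (11)).
Since \(H(\nu) \geq 2\) for every ν in (0,1) (according to (22)), (24) gives the sufficient condition of meaningfulness (13).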
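As a numerical sanity check of the chain of bounds used in this appendix, the following minimal Python sketch compares the exact binomial tail \(P(S_{m_i} \geq t)\) with the two Hoeffding bounds and with \(\epsilon/\lambda\). It assumes the \(X_q\) are 0/1 indicators, so that \(S_{m_i}\) is binomial. The function names and the parameter names `lam` and `eps` are hypothetical, and `sufficient_threshold` uses the constant 2 in place of \(H(\nu)\), which is an assumed reading of the sufficient condition rather than a quotation of (13).

```python
import math


def binomial_tail(m, nu, t):
    """Exact P(S_m >= t) when S_m ~ Binomial(m, nu)."""
    return sum(math.comb(m, k) * nu**k * (1 - nu) ** (m - k)
               for k in range(t, m + 1))


def hoeffding_bound(m, nu, t):
    """Relative-entropy form of Hoeffding's bound, valid for nu*m < t < m."""
    sigma = t / m
    exponent = (sigma * math.log(sigma / nu)
                + (1 - sigma) * math.log((1 - sigma) / (1 - nu)))
    return math.exp(-m * exponent)


def simple_bound(m, nu, t):
    """Weaker bound exp(-2 m (sigma - nu)^2), obtained from H(nu) >= 2."""
    sigma = t / m
    return math.exp(-2 * m * (sigma - nu) ** 2)


def sufficient_threshold(m, nu, lam, eps):
    """ASSUMPTION: smallest integer t with
    t >= nu*m + sqrt((log lam - log eps) / 2) * sqrt(m)."""
    gap = math.sqrt((math.log(lam) - math.log(eps)) / 2) * math.sqrt(m)
    return math.ceil(nu * m + gap)


# Illustrative values only; they are not taken from the experiments.
m, nu, lam, eps = 100, 0.3, 10, 1.0
t = sufficient_threshold(m, nu, lam, eps)
print(t, binomial_tail(m, nu, t), hoeffding_bound(m, nu, t),
      simple_bound(m, nu, t), eps / lam)
```

For these illustrative values the four printed quantities after `t` should appear in non-decreasing order, which is the ordering the derivation relies on.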
Appendix B: Algorithm of the proposed approach
(1) Input the video sequence with a frame rate of X fps.
(2) Find the Euclidean distance between the color values of each pair of consecutive frames.
(3) For γ = 1 to R (R is the maximum bound of color values) do
    (a) Give a binary value to each pixel using (2).
    (b) Find the matrix \(\beta_d\) for each frame d.
    (c) Calculate the p-ratio \(p_d\) using (3).
    (d) For κ = 0 to 1 step \(\delta_\kappa\) do
        (i) Find all the frames with \(p_d > \kappa\).
        (ii) If any selected frame \(f_i\) is fewer than X frames apart from \(f_{i+1}\) do
            (A) Find the temporal distance between \(f_{i+1}\) and \(f_{i+2}\), \(f_{i+2}\) and \(f_{i+3}\), and so on, until the temporal distance between \(f_{i+m}\) and \(f_{i+m+1}\) is greater than X.
            (B) Take frame \(f_{i+m}\) as the boundary of the group starting at frame \(f_{i-1}\).
        (iii) End if
        (iv)
    (e) End for κ
(4) End for γ
(5) Find \(F_{max}\) = max(F-ratio) and the corresponding values of γ and κ.
(6) If \(F_{max} < F_{critical}\)
    (a) Consider the set of frames having p-ratio greater than κ as unit boundaries.
    else
    (b) Consider the whole video as a single unit.
(7) End if
(8) Find the l-ratio of each frame by (8).
(9) For each unit i do
    (a) For \(\eta=0 ~ \mathrm{to} ~ 1 ~ \mathrm{step} ~ \frac{1}{\lambda}\) do
    (b) End for η
    (c) For \(\xi=\eta^{\prime} ~ \mathrm{to} ~ 1 ~ \mathrm{step} ~ \frac{1}{\lambda}\) do
    (d) End for ξ
    (e) Find the ξ satisfying (16).
    (f) Select the frames having l-ratio greater than ξ as key-frames.
(10) End for unit
(11) Display all the selected frames as key-frames.
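Steps (2), (3)(a) to (3)(c) and (3)(d)(ii) of the algorithm can be sketched in Python as follows. This is only an illustration under stated assumptions: equations (2) and (3) are not reproduced in this appendix, so the binarization rule (color distance greater than γ) and the p-ratio (fraction of marked pixels) are assumed forms. The function names `p_ratio` and `group_boundaries`, the parameter `min_gap` (standing in for X), and the frame representation (nested lists of RGB tuples) are hypothetical.

```python
import math


def p_ratio(prev_frame, curr_frame, gamma):
    """Fraction of pixels whose colour distance between two frames exceeds gamma.

    ASSUMPTION: this stands in for equations (2) and (3) of the paper.
    Frames are non-empty lists of rows; each row is a list of (r, g, b) tuples.
    """
    marked = 0
    total = 0
    for row_prev, row_curr in zip(prev_frame, curr_frame):
        for px_prev, px_curr in zip(row_prev, row_curr):
            total += 1
            # Step (2): Euclidean distance between the two colour vectors.
            if math.dist(px_prev, px_curr) > gamma:
                marked += 1  # assumed binary value 1 for this pixel
    return marked / total


def group_boundaries(selected, min_gap):
    """Chain selected frame indices that lie fewer than min_gap frames apart
    and return the last frame of each chain as its boundary."""
    boundaries = []
    previous = None
    for frame in sorted(selected):
        if previous is not None and frame - previous < min_gap:
            boundaries[-1] = frame  # the current group extends to this frame
        else:
            boundaries.append(frame)  # a new group starts here
        previous = frame
    return boundaries


# Tiny illustrative input: a 2x2 black frame and one with a single white pixel.
black = [[(0, 0, 0), (0, 0, 0)], [(0, 0, 0), (0, 0, 0)]]
mixed = [[(255, 255, 255), (0, 0, 0)], [(0, 0, 0), (0, 0, 0)]]
print(p_ratio(black, mixed, gamma=10))
print(group_boundaries([3, 5, 6, 40, 41, 90], min_gap=25))
```

The sketch treats a gap exactly equal to `min_gap` as a group break, which is one possible reading of "greater than X" in step (A).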
Cite this article
Mukherjee, S., Mukherjee, D.P. A design-of-experiment based statistical technique for detection of key-frames. Multimed Tools Appl 62, 847–877 (2013). https://doi.org/10.1007/s11042-011-0882-2