A design-of-experiment based statistical technique for detection of key-frames

Multimedia Tools and Applications

Abstract

In this paper, decision variables for the key-frame detection problem in a video are evaluated using statistical tools derived from the theory of design of experiments. The pixel-by-pixel intensity difference of consecutive video frames is used as the factor, or decision variable, in an experiment designed for key-frame detection. The determination of a key-frame is correlated with the different values of the factor. A novel concept of meaningfulness of a video key-frame is also introduced to select the representative key-frame from a set of possible key-frames. The use of design of experiments and the meaningfulness property to summarize a video is tested on a number of videos taken from the MUSCLE-VCD-2007 dataset. The proposed approach is found to detect key-frames better than competing approaches such as the PME-based method (Liu et al., IEEE Trans Circuits Syst Video Technol 13(10):1006–1013, 2003; Mukherjee et al., IEEE Trans Circuits Syst Video Technol 17(5):612–620, 2007; Panagiotakis et al., IEEE Trans Circuits Syst Video Technol 19(3):447–451, 2009).

References

  1. Adjeroh D, Lee MC, Banda N, Kandaswamy U (2009) Adaptive edge-oriented shot boundary detection. J Image Video Process 2009(5):5:1–5:13

  2. Calic J, Izquierdo E (2002) Efficient key-frame extraction and video analysis. In: Proc. IEEE international conference on information technology: coding and computing, Washington, DC, USA, pp 28–33

  3. Chasanis VT, Likas AC, Galatsanos NP (2009) Scene detection in videos using shot clustering and sequence alignment. IEEE Trans Multimedia 11(1):89–100

  4. Desolneux A, Moisan L, Morel J (2003) A grouping principle and four applications. IEEE Trans Pattern Anal Mach Intell 25(4):508–513

  5. Desolneux A, Moisan L, Morel J (2008) From gestalt theory to image analysis: a probabilistic approach. In: Interdisciplinary applied mathematics, vol 34. Springer, New York

  6. Gao Y, Tang J, Xie X (2009) Key frame vector and its application to shot retrieval. In: Proc. 1st international workshop on interactive multimedia for consumer electronics, Beijing, China, pp 27–34

  7. Hoeffding W (1963) Probability inequalities for sums of bounded random variables. J Am Stat Assoc 58(301):13–30

  8. Law-To J, Joly A, Boujemaa N (2007) Muscle-VCD-2007: a live benchmark for video copy detection. http://www-rocq.inria.fr/imedia/civr-bench/. Accessed May 2010

  9. Lienhart R (2001) Reliable transition detection in videos: a survey and practitioner’s guide. Int J Image Graph 1(3):469–486

  10. Liu TM, Zhang HJ, Qi FH (2003) A novel key-frame extraction algorithm based on perceived motion energy model. IEEE Trans Circuits Syst Video Technol 13(10):1006–1013

  11. Mills M (1992) A magnifier tool for video data. In: Proc. ACM conference on human factors in computing systems, Monterey, California, USA, pp 93–98

  12. Mukherjee DP, Das SK, Saha S (2007) Key-frame estimation in video using randomness measure of feature point pattern. IEEE Trans Circuits Syst Video Technol 17(5):612–620

  13. Ouyang J, Li J, Tang H (2006) Interactive key frame selection model. J Vis Commun Image Represent 17(6):1145–1163

  14. Panagiotakis C, Doulamis A, Tziritas G (2009) Equivalent key frames selection based on iso-content principles. IEEE Trans Circuits Syst Video Technol 19(3):447–451

  15. Park SH (1996) Robust design and analysis for quality engineering. Chapman & Hall, London

  16. Pickering MJ, Rüger SM, Sinclair D (2002) Video retrieval by feature learning in key frames. In: Proc. international conference on image and video retrieval, pp 309–317

  17. Pye D, Hollinghurst NJ, Mills TJ, Wood KR (1998) Audio-visual segmentation for content-based retrieval. In: Proc. international conference on spoken language processing

  18. Rasheed Z, Shah M (2005) Detection and representation of scenes in videos. IEEE Trans Multimedia 7(6):1097–1105

  19. Richard GL (2007) Statistical concepts: a second course. Lawrence Erlbaum Associates, Mahwah

  20. Roy RK (2001) Design of experiments using the Taguchi approach. Wiley, New York

  21. Smeaton AF, Over P, Doherty AR (2010) Video shot boundary detection: seven years of TRECVid activity. Comput Vis Image Underst 114(4):411–418

  22. Song X, Fan G (2005) Joint key-frame extraction and object-based video segmentation. In: Proc. IEEE workshop on motion and video computing, vol 2. Breckenridge, Colorado, pp 126–131

  23. Spyrou E, Tolias G, Mylonas P, Avrithis Y (2009) Concept detection and keyframe extraction using a visual thesaurus. Multimedia Tools Appl 41(3):337–373

  24. Valdes V, Martinez JM (2010) A framework for video abstraction systems analysis and modelling from an operational point of view. Multimedia Tools Appl 49(1):7–35

  25. Wolf W (1996) Key frame selection by motion analysis. In: Proc. IEEE international conference on acoustics, speech and signal processing, vol 2, Washington, DC, USA, pp 1228–1231

  26. Yeung MM, Yeo BL (1997) Video visualization for compact presentation and fast browsing of pictorial content. IEEE Trans Circuits Syst Video Technol 7(5):771–785

  27. Zhuang Y, Rui Y, Huang TS, Mehrotra S (1998) Adaptive key frame extraction using unsupervised clustering. In: Proc. IEEE international conference on image processing, Chicago, USA, pp 866–870
Corresponding author

Correspondence to Snehasis Mukherjee.

Appendices

Appendix A: Obtaining (13) from [7]

We start from Hoeffding’s inequality [7]. In our problem, \(m_i\) is the number of frames in the ith unit. We model the problem with a sequence of i.i.d. random variables \(\{X_q\}_{q=1,2,3,...,m_i}\) such that \(0 \leq X_q \leq 1\). Let us define \(X_q\) as

$$ X_q =\left\{ \begin{array}{ll} 1 & \mathrm{when}~~ l_q > \eta\\ 0 & \mathrm{otherwise}, \end{array}\right. $$
(19)

for a given η, where \(l_q\) is the l-ratio value of the qth frame of the ith unit. We set \(S_{m_i}=\sum_{q=1}^{m_i}X_q\) (i.e., the number of frames of the ith unit having l-ratio greater than η) and \(\nu m_i=E\left[S_{m_i}\right]\). Then for \(\nu m_i < t < m_i\) (since ν is a probability value less than 1), putting \(\sigma=\frac{t}{m_i}\) as in [5], according to Hoeffding’s inequality,

$$P_i^{\eta}=P\left(S_{m_i}\geq t\right)\leq e^{-m_i\left(\sigma \log {\frac{\sigma}{\nu}}+(1-\sigma) \log {\frac{1-\sigma}{1-\nu}}\right)}. $$
(20)

In addition, the right-hand side of this inequality satisfies

$$ e^{-m_i\left(\sigma \log {\frac{\sigma}{\nu}}+(1-\sigma) \log {\frac{1-\sigma}{1-\nu}}\right)}\leq e^{-m_i\left(\sigma-\nu\right)^2H(\nu)}\leq e^{-2m_i\left(\sigma-\nu\right)^2}, $$
(21)

where

$$ H(\nu) =\left\{ \begin{array}{ll} \dfrac{1}{1-2\nu}\log {\dfrac{1-\nu}{\nu}} & \mathrm{when}~~ 0<\nu<\dfrac{1}{2}\\\\ \dfrac{1}{2\nu(1-\nu)} & \mathrm{when}~~ \dfrac{1}{2}\leq \nu<1 \end{array}\right. $$
(22)

This is Hoeffding’s inequality. We now apply it to find a sufficient condition for ϵ-meaningfulness. If \(t\geq \nu m_i+\sqrt{\frac{\log {\lambda} - \log {\epsilon}}{H(\nu)}}\sqrt{m_i}\), then putting \(\sigma=\frac{t}{m_i}\) we get

$$ m_i\left(\sigma-\nu\right)^2 \geq \frac{\log {\lambda}-\log {\epsilon}}{H(\nu)}. $$
(23)
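
The step from the condition on t to (23) is elementary algebra. Written out as a worked step, assuming \(\log\lambda \geq \log\epsilon\) so that both sides are non-negative and squaring preserves the inequality:

$$ t-\nu m_i\geq \sqrt{\frac{\log {\lambda}-\log {\epsilon}}{H(\nu)}}\sqrt{m_i} \;\Longrightarrow\; \sigma-\nu=\frac{t-\nu m_i}{m_i}\geq \sqrt{\frac{\log {\lambda}-\log {\epsilon}}{H(\nu)\,m_i}} \;\Longrightarrow\; m_i\left(\sigma-\nu\right)^2\geq \frac{\log {\lambda}-\log {\epsilon}}{H(\nu)}. $$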

Then using (20), (21) and (23) we get

$$ P_i^{\eta}\leq e^{-m_i(\sigma-\nu)^2H(\nu)}\leq e^{-\log {\lambda}+\log{\epsilon}}=\frac{\epsilon}{\lambda}. $$
(24)

By the definition of meaningfulness in (11), this means that the cut-off η is ϵ-meaningful.

Since H(ν) ≥ 2 for every ν in (0,1) (according to (22)), from (24) we get the sufficient condition of meaningfulness as (13).
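
The chain of bounds in (20)–(22) and the sufficient condition on t can be checked numerically. The sketch below is an illustration only, not part of the original method. It assumes that \(S_{m_i}\) is binomial with parameters \(m_i\) and ν, which follows from the i.i.d. 0/1 variables of (19), and it takes the smallest integer t allowed by the condition stated before (23). The function names are hypothetical.

```python
import math


def binomial_tail(m, nu, t):
    """Exact P(S_m >= t) for S_m ~ Binomial(m, nu)."""
    return sum(math.comb(m, k) * nu**k * (1 - nu) ** (m - k)
               for k in range(t, m + 1))


def hoeffding_h(nu):
    """H(nu) as defined piecewise in (22)."""
    if 0 < nu < 0.5:
        return math.log((1 - nu) / nu) / (1 - 2 * nu)
    if 0.5 <= nu < 1:
        return 1 / (2 * nu * (1 - nu))
    raise ValueError("nu must lie strictly between 0 and 1")


def bound_chain(m, nu, t):
    """Return (exact tail, bound (20), middle bound (21), last bound (21))."""
    if not nu * m < t < m:
        raise ValueError("requires nu * m < t < m")
    sigma = t / m
    # Exponent of (20): relative entropy between sigma and nu.
    kl = (sigma * math.log(sigma / nu)
          + (1 - sigma) * math.log((1 - sigma) / (1 - nu)))
    gap = (sigma - nu) ** 2
    return (binomial_tail(m, nu, t),
            math.exp(-m * kl),
            math.exp(-m * gap * hoeffding_h(nu)),
            math.exp(-2 * m * gap))


def sufficient_t(m, nu, lam, eps):
    """Smallest integer t satisfying the condition stated before (23)."""
    root = math.sqrt((math.log(lam) - math.log(eps)) / hoeffding_h(nu))
    return math.ceil(nu * m + root * math.sqrt(m))
```

For example, with a unit of 200 frames, ν = 0.3, λ = 100 and ϵ = 1, `sufficient_t(200, 0.3, 100, 1.0)` gives the threshold t at which the exact tail returned by `bound_chain(200, 0.3, t)` is guaranteed to be at most ϵ/λ, as in (24).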

Appendix B: Algorithm of the proposed approach

  (1) Input the video sequence with speed X fps.

  (2) Find the Euclidean distance of color values of each pair of consecutive frames.

  (3) For γ = 1 to R (R is the maximum bound of color values) do

    (a) Give a binary value to each pixel using (2).

    (b) Find the matrix \(\beta_d\) for each frame d.

    (c) Calculate the p-ratio \(p_d\) using (3).

    (d) For κ = 0 to 1 step \(\delta_\kappa\) do

      (i) Find all the frames with \(p_d > \kappa\).

      (ii) If any selected frame \(f_i\) is less than X frames apart from \(f_{i+1}\) do

        (A) Find the temporal distance between \(f_{i+1}\) and \(f_{i+2}\), \(f_{i+2}\) and \(f_{i+3}\), and so on until the temporal distance between \(f_{i+m}\) and \(f_{i+m+1}\) is greater than X.

        (B) Take the \(f_{i+m}\) frame as the boundary of the group starting at frame \(f_{i-1}\).

      (iii) End if

      (iv) Find F-ratio using (4)–(6).

    (e) End for κ

  (4) End for γ

  (5) Find \(F_{max} = \max\) (F-ratio) and the corresponding values of γ and κ.

  (6) If \(F_{max} < F_{critical}\)

    (a) Consider the set of frames having p-ratio greater than κ as unit boundaries.

    else

    (b) Consider the whole video as a single unit.

  (7) End if

  (8) Find the l-ratio of each frame by (8).

  (9) For each unit i do

    (a) For \(\eta=0 ~ \mathrm{to} ~ 1 ~ \mathrm{step} ~ \frac{1}{\lambda}\) do

      (i) Find probability ν using (9).

      (ii) Find probability \(P_{i}^{\eta}\) using (10).

      (iii) Find NFA using (11).

      (iv) If \(\mathit{NFA}<\epsilon\) (with \(\epsilon = 1\)),

        (A) Set \(\eta^{\prime} = \eta\).

        (B) Exit from the for loop of η.

        else

        (C) Continue the for loop of η.

      (v) End if

    (b) End for η

    (c) For \(\xi=\eta^{\prime} ~ \mathrm{to} ~ 1 ~ \mathrm{step} ~ \frac{1}{\lambda}\) do

      (i) Find \(r_{i}(\eta^{\prime})\) using (14).

      (ii) Find c-value using (15).

    (d) End for ξ

    (e) Find the ξ satisfying (16).

    (f) Select the frames having l-ratio greater than ξ as key-frames.

  (10) End for unit

  (11) Display all the selected frames as key-frames.
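
Steps (3)(d)(i) and (3)(d)(ii) above, thresholding the p-ratio and then merging selected frames that lie less than X frames apart, can be sketched as follows. This is a minimal illustration, not the authors' implementation. The function names `select_frames` and `collapse_close_frames` are hypothetical, the p-ratio values are assumed to have been computed already with (3), and treating a gap of exactly X frames as the start of a new group is an assumption, since the text leaves that case open.

```python
def select_frames(p_ratio, kappa):
    """Step (3)(d)(i): indices of frames whose p-ratio exceeds kappa.

    p_ratio is assumed to be a list holding one precomputed value per frame.
    """
    return [d for d, p in enumerate(p_ratio) if p > kappa]


def collapse_close_frames(selected, fps):
    """Step (3)(d)(ii): merge runs of selected frames closer than fps frames.

    selected must be sorted in increasing order. Within a run of frames
    that each lie less than fps frames from the next one, only the last
    frame is kept, matching the rule that f_{i+m} is the group boundary.
    """
    boundaries = []
    for idx, frame in enumerate(selected):
        is_last = idx == len(selected) - 1
        # Assumption: a gap of exactly fps frames already ends the run.
        if is_last or selected[idx + 1] - frame >= fps:
            boundaries.append(frame)
    return boundaries


# Usage with made-up frame indices and X = 25 fps:
# collapse_close_frames([10, 15, 22, 100, 105, 300], 25) returns [22, 105, 300]
```

Keeping only the last frame of each run means that a burst of large frame differences, such as a gradual transition, contributes a single unit boundary rather than one per frame.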

Mukherjee, S., Mukherjee, D.P. A design-of-experiment based statistical technique for detection of key-frames. Multimed Tools Appl 62, 847–877 (2013). https://doi.org/10.1007/s11042-011-0882-2
