DOI: 10.1145/2072298.2072051
short-paper

Extracting key frames from consumer videos using bi-layer group sparsity

Published: 28 November 2011

Abstract

Extracting key frames from unconstrained consumer videos remains much more challenging than extracting them from well-edited videos with predefined structure (e.g., news or sports videos), because consumer videos have extremely diverse content (no pre-imposed structure) and uncontrolled quality (e.g., poor lighting or camera shake). To exploit the spatio-temporal correlation present in the video for key frame extraction, we propose a bi-layer group sparse representation in which the input video frames are first segmented into homogeneous patches and group sparsity is imposed at two levels simultaneously: (i) patch-to-frame, and (ii) frame-to-sequence. The grouped sparse coefficients are then combined with frame quality scores to generate key frames. Extensive experiments are performed on videos from actual end users. The results obtained by the proposed approach compare favorably with those of existing methods, supporting its effectiveness.
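
To make the idea of group sparsity more concrete, the following is a minimal sketch, not the authors' algorithm. The abstract does not describe the optimization or the quality measure, so every name here (`group_soft_threshold`, `frame_scores`, `pick_key_frames`, the `lam` threshold, and the toy numbers) is an assumption made for illustration. The sketch applies block soft-thresholding, the standard proximal step for a group-Lasso penalty, to one group of patch coefficients per frame. This covers a single grouping level only, not the simultaneous bi-layer formulation. It then weights the surviving group energy by a per-frame quality score.

```python
import math


def group_soft_threshold(group, lam):
    """Block soft-thresholding: proximal operator of lam * ||group||_2.

    The whole group is zeroed when its Euclidean norm is at most lam;
    otherwise every coefficient is shrunk by the same factor.
    """
    norm = math.sqrt(sum(v * v for v in group))
    if norm <= lam:
        return [0.0] * len(group)
    scale = 1.0 - lam / norm
    return [scale * v for v in group]


def frame_scores(patch_coeffs, quality, lam):
    """Score each frame (hypothetical scoring rule, not from the paper).

    patch_coeffs: one list of patch-level coefficients per frame (one group per frame).
    quality:      one quality score per frame, assumed to lie in [0, 1].
    """
    scores = []
    for coeffs, q in zip(patch_coeffs, quality):
        shrunk = group_soft_threshold(coeffs, lam)
        energy = math.sqrt(sum(v * v for v in shrunk))
        scores.append(energy * q)
    return scores


def pick_key_frames(scores, k):
    """Return the indices (in temporal order) of up to k frames with the
    highest nonzero scores."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(i for i in ranked[:k] if scores[i] > 0.0)


if __name__ == "__main__":
    # Toy data: four frames, two patch coefficients each (made-up values).
    coeffs = [[0.1, 0.05], [3.0, 4.0], [0.0, 0.2], [0.6, 0.8]]
    quality = [0.9, 0.5, 0.8, 1.0]
    print(pick_key_frames(frame_scores(coeffs, quality, lam=0.5), k=2))  # prints [1, 3]
```

In this toy run, frames 0 and 2 are removed as whole groups because their coefficient norms fall below the threshold. This all-or-nothing behavior is what distinguishes a group penalty from an element-wise one.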

    Published In

    MM '11: Proceedings of the 19th ACM international conference on Multimedia
    November 2011
    944 pages
    ISBN:9781450306164
    DOI:10.1145/2072298
    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. consumer video
    2. group sparsity
    3. key frame extraction

    Qualifiers

    • Short-paper

    Conference

    MM '11: ACM Multimedia Conference
    November 28 - December 1, 2011
    Scottsdale, Arizona, USA

    Acceptance Rates

    Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

    Cited By

    • (2021) Picture Preview Generation for Interactive Educational Resources. Complexity, 2021. DOI: 10.1155/2021/5536867. Online publication date: 1-Jan-2021
    • (2020) Preview Generation for Mathematical Interactive Educational Resources in Netpad. 2020 8th International Conference on Digital Home (ICDH), pp. 221-226. DOI: 10.1109/ICDH51081.2020.00045. Online publication date: Sep-2020
    • (2019) Sentence Specified Dynamic Video Thumbnail Generation. Proceedings of the 27th ACM International Conference on Multimedia, pp. 2332-2340. DOI: 10.1145/3343031.3350985. Online publication date: 15-Oct-2019
    • (2017) Keyframe Extraction for Human Motion Capture Data Based on Joint Kernel Sparse Representation. IEEE Transactions on Industrial Electronics, 64:2, pp. 1589-1599. DOI: 10.1109/TIE.2016.2610946. Online publication date: Feb-2017
    • (2016) Scalable gastroscopic video summarization via similar-inhibition dictionary selection. Artificial Intelligence in Medicine, 66:C, pp. 1-13. DOI: 10.1016/j.artmed.2015.08.006. Online publication date: 1-Jan-2016
    • (2015) RPCA-KFE: Key Frame Extraction for Video Using Robust Principal Component Analysis. IEEE Transactions on Image Processing, 24:11, pp. 3742-3753. DOI: 10.1109/TIP.2015.2445572. Online publication date: 1-Nov-2015
    • (2014) Heterogeneity Image Patch Index and Its Application to Consumer Video Summarization. IEEE Transactions on Image Processing, 23:6, pp. 2704-2718. DOI: 10.1109/TIP.2014.2320814. Online publication date: 1-Jun-2014
    • (2013) Video Key Frame Extraction for Semantic Retrieval. Information Computing and Applications, pp. 531-540. DOI: 10.1007/978-3-642-53932-9_52. Online publication date: 2013
