Multi-modal and multi-scale photo collection summarization

Shen, Xu; Tian, Xinmei

doi:10.1007/s11042-015-2658-6

Published: 24 May 2015

Volume 75, pages 2527–2541, (2016)

Multimedia Tools and Applications


Abstract

With the proliferation of digital cameras and mobile devices, people are taking many more photos than ever before. However, these photos can be redundant in content and varied in quality, so there is a growing need for tools to manage photo collections. One efficient approach is photo collection summarization, which segments a photo collection into events and then selects a set of representative, high-quality photos (key photos) from those events. Existing photo collection summarization methods mainly represent photos with low-level features only, such as color and texture, while ignoring many other useful features, for example high-level semantic features and location. Moreover, they often return fixed summarization results, which provide little flexibility. In this paper, we propose a multi-modal and multi-scale photo collection summarization method that leverages multi-modal features, including time, location and high-level semantic features. We first use a Gaussian mixture model to segment the photo collection into events. With images represented by those multi-modal features, our event segmentation algorithm performs better because the multi-modal features better capture the inhomogeneous structure of events. Next, we propose a novel key photo ranking and selection algorithm to select representative, high-quality photos from the events for summarization. Our key photo ranking algorithm takes the importance of both events and photos into consideration. Furthermore, our photo summarization method allows users to control the scale of event segmentation and the number of key photos selected. We evaluate our method with extensive experiments on four photo collections. Experimental results demonstrate that our method achieves better performance than previous photo collection summarization methods.
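The two stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it fits a one-dimensional Gaussian mixture to capture timestamps only (the paper also uses location and semantic features), and the ranking formula (event share multiplied by a per-photo quality score) is an assumption made purely for illustration, since the abstract does not state one. All function names, the initialization scheme and the `quality` scores are hypothetical. The parameters `k` and `n_keys` stand in for the user-controlled segmentation scale and number of key photos.

```python
import math


def responsibilities(x, weights, means, variances):
    """Posterior probability of each mixture component for one value x."""
    logp = [
        math.log(w + 1e-300) - 0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
        for w, m, v in zip(weights, means, variances)
    ]
    top = max(logp)  # subtract the max for numerical stability
    ps = [math.exp(l - top) for l in logp]
    total = sum(ps)
    return [p / total for p in ps]


def fit_gmm_1d(xs, k, iters=50, min_var=1e-3):
    """Fit a 1-D Gaussian mixture with EM; returns (weights, means, variances)."""
    n = len(xs)
    srt = sorted(xs)
    # Assumed deterministic init: means at evenly spaced quantiles,
    # a shared variance derived from the data range.
    means = [srt[int((i + 0.5) * n / k)] for i in range(k)]
    spread = (srt[-1] - srt[0]) / (2 * k)
    variances = [max(spread ** 2, min_var)] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: soft-assign every photo to every event.
        resp = [responsibilities(x, weights, means, variances) for x in xs]
        # M-step: re-estimate each component from its soft members.
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var, min_var)
    return weights, means, variances


def segment_events(timestamps, k):
    """Label each photo with its most likely event; k sets the scale."""
    params = fit_gmm_1d(timestamps, k)
    labels = []
    for t in timestamps:
        r = responsibilities(t, *params)
        labels.append(max(range(k), key=lambda j: r[j]))
    return labels


def rank_key_photos(labels, quality, n_keys):
    """Hypothetical ranking: score = (event share of photos) * (photo quality)."""
    n = len(labels)
    share = {e: labels.count(e) / n for e in set(labels)}
    order = sorted(range(n), key=lambda i: share[labels[i]] * quality[i], reverse=True)
    return order[:n_keys]


# Capture times in hours for nine photos taken in three bursts.
timestamps = [0.0, 0.2, 0.5, 0.9, 48.0, 48.3, 48.7, 96.0, 96.1]
quality = [0.2, 0.9, 0.5, 0.4, 0.8, 0.3, 0.6, 0.7, 0.1]  # made-up scores
labels = segment_events(timestamps, k=3)
print(labels, rank_key_photos(labels, quality, n_keys=2))
```

Changing `k` re-segments the same collection at a coarser or finer scale, and changing `n_keys` lengthens or shortens the summary, which mirrors the user control the abstract describes. A production version would more likely use a multivariate mixture over all feature modalities.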

Acknowledgments

This work is supported by the NSFC under contracts No. 61201413 and No. 61390514, the Specialized Research Fund for the Doctoral Program of Higher Education under No. WJ2100060003, and the Fundamental Research Funds for the Central Universities under No. WK2100060011 and No. WK2100100021.

Author information

Authors and Affiliations

University of Science and Technology of China, Hefei, China
Xu Shen & Xinmei Tian

Corresponding author

Correspondence to Xinmei Tian.

About this article

Cite this article

Shen, X., Tian, X. Multi-modal and multi-scale photo collection summarization. Multimed Tools Appl 75, 2527–2541 (2016). https://doi.org/10.1007/s11042-015-2658-6

Received: 01 October 2014
Revised: 22 April 2015
Accepted: 23 April 2015
Published: 24 May 2015
Issue Date: March 2016
DOI: https://doi.org/10.1007/s11042-015-2658-6
