Multi-modal and multi-scale photo collection summarization

Multimedia Tools and Applications

Abstract

With the proliferation of digital cameras and mobile devices, people are taking far more photos than ever before. However, these photos can be redundant in content and uneven in quality, so there is a growing need for tools to manage photo collections. One efficient approach is photo collection summarization, which segments a photo collection into events and then selects a set of representative, high-quality photos (key photos) from those events. Existing photo collection summarization methods mainly represent photos with low-level features such as color and texture, while ignoring other useful features such as high-level semantic features and location. Moreover, they often return fixed summarization results, which offer little flexibility. In this paper, we propose a multi-modal and multi-scale photo collection summarization method that leverages multi-modal features, including time, location and high-level semantic features. We first use a Gaussian mixture model to segment the photo collection into events. With images represented by these multi-modal features, our event segmentation algorithm performs better, since the multi-modal features better capture the inhomogeneous structure of events. Next, we propose a novel key photo ranking and selection algorithm that selects representative, high-quality photos from the events for summarization. Our key photo ranking algorithm takes the importance of both events and photos into consideration. Furthermore, our photo summarization method allows users to control the scale of event segmentation and the number of key photos selected. We evaluate our method with extensive experiments on four photo collections. Experimental results demonstrate that our method achieves better performance than previous photo collection summarization methods.
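The two stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and the abstract does not give the model details. The sketch assumes each photo is a (time in hours, latitude, longitude) vector, fits a diagonal-covariance Gaussian mixture with plain EM, and treats the number of components `k` as the user-controlled segmentation scale. The helper names `fit_gmm` and `select_key_photos`, the quality scores, and the round-robin selection rule (larger events first, best-quality photo first within each event) are all hypothetical stand-ins for the paper's ranking algorithm.

```python
import math

def fit_gmm(points, k, iters=50, var_floor=1e-6):
    """Fit a diagonal-covariance GMM with EM; return one event label per photo.
    Assumes len(points) >= k and iters >= 1."""
    n, d = len(points), len(points[0])
    # Deterministic initialisation (an assumption of this sketch): sort photos
    # by their first feature (time) and cut the sequence into k chunks.
    order = sorted(range(n), key=lambda i: points[i][0])
    chunks = [order[j * n // k:(j + 1) * n // k] for j in range(k)]
    means = [[sum(points[i][c] for i in ch) / len(ch) for c in range(d)]
             for ch in chunks]
    col_mean = [sum(p[c] for p in points) / n for c in range(d)]
    global_var = [max(var_floor,
                      sum((p[c] - col_mean[c]) ** 2 for p in points) / n)
                  for c in range(d)]
    variances = [list(global_var) for _ in range(k)]
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibilities, computed in log space for stability.
        resp = []
        for x in points:
            logp = []
            for j in range(k):
                lp = math.log(weights[j])
                for c in range(d):
                    v = variances[j][c]
                    lp -= 0.5 * (math.log(2 * math.pi * v)
                                 + (x[c] - means[j][c]) ** 2 / v)
                logp.append(lp)
            top = max(logp)
            exps = [math.exp(lp - top) for lp in logp]
            total = sum(exps)
            resp.append([e / total for e in exps])
        # M-step: re-estimate weight, mean and variance of each component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # empty component: keep its previous parameters
            weights[j] = nj / n
            for c in range(d):
                mu = sum(r[j] * x[c] for r, x in zip(resp, points)) / nj
                var = sum(r[j] * (x[c] - mu) ** 2
                          for r, x in zip(resp, points)) / nj
                means[j][c] = mu
                variances[j][c] = max(var, var_floor)
    return [max(range(k), key=lambda j: r[j]) for r in resp]

def select_key_photos(labels, quality, n_keys):
    """Hypothetical selection rule: visit events from largest to smallest and
    take the best remaining photo of each, until n_keys photos are picked."""
    events = {}
    for i, lab in enumerate(labels):
        events.setdefault(lab, []).append(i)
    queues = [sorted(members, key=lambda i: -quality[i])
              for members in sorted(events.values(), key=len, reverse=True)]
    picked = []
    while len(picked) < n_keys and any(queues):
        for q in queues:
            if q and len(picked) < n_keys:
                picked.append(q.pop(0))
    return picked

# Toy collection (made-up values): three photos from one outing, four from another.
photos = [(0.0, 31.8, 117.2), (0.2, 31.8, 117.2), (0.5, 31.8, 117.2),
          (30.0, 39.9, 116.4), (30.3, 39.9, 116.4), (30.4, 39.9, 116.4),
          (31.0, 39.9, 116.4)]
quality = [0.2, 0.9, 0.5, 0.4, 0.3, 0.8, 0.6]  # assumed per-photo quality scores

labels = fit_gmm(photos, k=2)                  # k sets the segmentation scale
keys = select_key_photos(labels, quality, n_keys=3)
```

Raising `k` yields finer events and raising `n_keys` yields a longer summary, which mirrors the two user controls the abstract mentions. A production version would more likely rely on an established mixture-model library and include semantic features alongside time and location.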



Acknowledgments

This work is supported by the NSFC under contract Nos. 61201413 and 61390514, the Specialized Research Fund for the Doctoral Program of Higher Education (No. WJ2100060003), and the Fundamental Research Funds for the Central Universities (Nos. WK2100060011 and WK2100100021).

Author information

Corresponding author

Correspondence to Xinmei Tian.


Cite this article

Shen, X., Tian, X. Multi-modal and multi-scale photo collection summarization. Multimed Tools Appl 75, 2527–2541 (2016). https://doi.org/10.1007/s11042-015-2658-6


  • DOI: https://doi.org/10.1007/s11042-015-2658-6
