A novel unsupervised approach for multilevel image clustering from unordered image collection

  • Research Article
Frontiers of Computer Science

Abstract

This paper proposes a novel unsupervised approach for automatically constructing multilevel image clusters from an unordered image collection. The whole input collection is represented as an imaging sample space (ISS) consisting of globally indexed image features extracted by a new, efficient multi-view image feature matching method. By drawing an analogy between capturing an image and observing the ISS, each image is represented as a binary sequence in which each bit indicates the visibility of the corresponding feature. Using information-theory-inspired measures of image popularity and dissimilarity, we show that image content and inter-image distance can be described quantitatively; guided by these measures, the input collection is organized into multilevel clusters automatically. The effectiveness and efficiency of the approach are demonstrated on three real image collections, with promising results in both qualitative and quantitative evaluations.
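The binary-sequence representation can be illustrated with a small Python sketch. The abstract does not give the paper's actual popularity or dissimilarity formulas, so the code below uses stand-ins that are purely assumptions: popularity is taken as the mean fraction of images that share an image's visible features, dissimilarity is the Jaccard distance between bit sequences, and the multilevel clusters are connected components at several distance thresholds. All function names and the toy data are hypothetical.

```python
from itertools import combinations


def visibility_bits(image_features, num_features):
    """One bit tuple per image; bit j is 1 when global feature j is visible."""
    return [tuple(1 if j in feats else 0 for j in range(num_features))
            for feats in image_features]


def popularity(bits):
    """Stand-in measure (assumption): mean share of images seeing each visible feature."""
    n = len(bits)
    counts = [sum(row[j] for row in bits) for j in range(len(bits[0]))]
    scores = []
    for row in bits:
        visible = [counts[j] / n for j, bit in enumerate(row) if bit]
        scores.append(sum(visible) / len(visible) if visible else 0.0)
    return scores


def jaccard_distance(a, b):
    """Stand-in dissimilarity (assumption): 1 - |shared bits| / |bits set in either|."""
    shared = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return 1.0 - shared / either if either else 0.0


def clusters_at(dist, threshold):
    """Connected components of the graph linking images with distance <= threshold."""
    parent = list(range(len(dist)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(dist)), 2):
        if dist[i][j] <= threshold:
            parent[find(i)] = find(j)
    groups = {}
    for i in range(len(dist)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Toy collection: each image is the set of global feature indices it observes.
images = [{0, 1, 2, 3}, {1, 2, 3, 4}, {2, 3, 4}, {7, 8, 9}, {8, 9}]
bits = visibility_bits(images, num_features=10)
scores = popularity(bits)
dist = [[jaccard_distance(a, b) for b in bits] for a in bits]

# Looser thresholds merge the finer clusters, giving nested levels.
levels = {t: clusters_at(dist, t) for t in (0.3, 0.5, 1.0)}
```

Because every pair linked at a tight threshold stays linked at a looser one, the clusters at each level nest inside those of the next, which is one simple way to obtain a multilevel organization. It should not be read as the procedure used in the paper.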

Author information

Corresponding author

Correspondence to Lai Kang.

Additional information

Lai Kang received the MS degree in systems engineering from the National University of Defense Technology, Changsha, China. He is currently pursuing the PhD degree at the same university. From September 2010 to August 2012, he was a visiting PhD student with the Computer Graphics Laboratory, Department of Computing Science, University of Alberta, Edmonton, Canada. His current research interests include computer vision with an emphasis on 3D reconstruction from images and global optimization.

Lingda Wu received the PhD degree in management science and engineering from the National University of Defense Technology, Changsha, China. She is currently a professor at the Key Laboratory, the Academy of Equipment Command and Technology, Beijing, China. Her research focuses on multimedia information systems and virtual reality technology.

Yee-Hong Yang received the PhD degree from the University of Pittsburgh, Pittsburgh, PA. His research interests cover a wide spectrum of topics in computer vision and computer graphics. Topics in computer graphics include animation, environment matting, hardware-accelerated graphics, motion editing, physics-based modelling, texture analysis and synthesis, and static and dynamic image-based rendering. Topics in computer vision include edge detection, face detection and recognition, light source estimation, motion estimation, segmentation, 2D and 3D shape analysis, and real-time multiview stereo. Dr. Yang is a senior member of the IEEE and has published over 100 technical papers in international journals and conference proceedings. He has served as a reviewer for numerous international journals and as a committee member for many conferences and review panels.

About this article

Cite this article

Kang, L., Wu, L. & Yang, YH. A novel unsupervised approach for multilevel image clustering from unordered image collection. Front. Comput. Sci. 7, 69–82 (2013). https://doi.org/10.1007/s11704-013-1266-8

  • DOI: https://doi.org/10.1007/s11704-013-1266-8
