Abstract
This paper proposes a novel unsupervised approach for automatically constructing multilevel image clusters from unordered images. The whole input image collection is represented as an imaging sample space (ISS) consisting of globally indexed image features, which are extracted by a new, efficient multi-view image feature matching method. By drawing an analogy between capturing an image and observing the ISS, each image is represented as a binary sequence in which each bit indicates the visibility of the corresponding feature. Using information-theory-inspired measures of image popularity and dissimilarity, we show that image content and the distance between images can be described quantitatively; guided by these measures, the input image collection is automatically organized into multilevel clusters. The effectiveness and efficiency of the proposed approach are demonstrated on three real image collections, with promising results in both qualitative and quantitative evaluation.
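The binary visibility representation described above can be illustrated with a minimal sketch. The abstract does not give the paper's popularity or dissimilarity formulas, so the two measures below are stand-ins chosen only for illustration (summed self-information of the visible features, and Jaccard distance between bit sequences); all function names and the toy data are hypothetical.

```python
from math import log2


def visibility_bits(visible_features, num_features):
    """Bit j is 1 if globally indexed feature j is visible in the image."""
    return [1 if j in visible_features else 0 for j in range(num_features)]


def feature_probabilities(bit_rows):
    """Fraction of images in which each feature is visible."""
    n = len(bit_rows)
    return [sum(column) / n for column in zip(*bit_rows)]


def information_content(bits, probs):
    """Stand-in measure (assumption): summed self-information of visible features."""
    return sum(-log2(p) for b, p in zip(bits, probs) if b)


def dissimilarity(a, b):
    """Stand-in measure (assumption): Jaccard distance between two bit sequences."""
    union = sum(1 for x, y in zip(a, b) if x or y)
    shared = sum(1 for x, y in zip(a, b) if x and y)
    return 1.0 - shared / union if union else 0.0


# Toy collection: three images over five globally indexed features.
observations = [{0, 1, 2}, {1, 2, 3}, {4}]
rows = [visibility_bits(obs, 5) for obs in observations]
probs = feature_probabilities(rows)

for i, bits in enumerate(rows):
    print(i, bits, round(information_content(bits, probs), 3))
print(dissimilarity(rows[0], rows[1]), dissimilarity(rows[0], rows[2]))
```

In this toy collection the first two images share two of their four combined features, while the third shares none with either, so a clustering step driven by such a distance would group the first two together.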
Author information
Additional information
Lai Kang received the MS degree in systems engineering from the National University of Defense Technology, Changsha, China. He is currently pursuing the PhD degree at the same university. From September 2010 to August 2012, he was a visiting PhD student with the Computer Graphics Laboratory, Department of Computing Science, University of Alberta, Edmonton, Canada. His current research interests include computer vision with an emphasis on 3D reconstruction from images and global optimization.
Lingda Wu received the PhD degree in management science and engineering from the National University of Defense Technology, Changsha, China. She is currently a professor at the Key Laboratory, the Academy of Equipment Command and Technology, Beijing, China. Her research focuses on multimedia information systems and virtual reality technology.
Yee-Hong Yang received the PhD degree from the University of Pittsburgh, Pittsburgh, PA. His research interests cover a wide spectrum of topics in computer vision and computer graphics. Topics in computer graphics include animation, environment matting, hardware accelerated graphics, motion editing, physics-based modelling, texture analysis and synthesis, and static and dynamic image-based rendering. Topics in computer vision include edge detection, face detection and recognition, light source estimation, motion estimation, segmentation, 2D and 3D shape analysis, and real-time multiview stereo. Dr. Yang is a senior member of the IEEE and has published over 100 technical papers in international journals and conference proceedings. He has served as a reviewer for numerous international journals and as a committee member for many conferences and review panels.
About this article
Cite this article
Kang, L., Wu, L. & Yang, YH. A novel unsupervised approach for multilevel image clustering from unordered image collection. Front. Comput. Sci. 7, 69–82 (2013). https://doi.org/10.1007/s11704-013-1266-8
DOI: https://doi.org/10.1007/s11704-013-1266-8