Abstract
In recent years, the spread of acquisition devices such as digital cameras, advances in the storage and transmission of multimedia documents, and the rise of tablet computers have facilitated the creation of many large image databases as well as interaction with users. This increases the need for efficient and robust methods for finding information in these huge masses of data, including feature extraction methods and feature space structuring methods. Feature extraction methods aim to extract, for each image, one or more visual signatures representing its content. Feature space structuring methods organize the indexed images in order to facilitate, accelerate and improve the results of subsequent retrieval. Clustering is one kind of feature space structuring method. There are different types of clustering, such as hierarchical clustering, density-based clustering and grid-based clustering. In an interactive context where the user may modify the automatic clustering results, incrementality and hierarchical structuring are properties of growing interest for clustering algorithms. In this article, we propose an experimental comparison of different clustering methods for structuring large image databases, using a rigorous experimental protocol. We use image databases of increasing size (Wang, PascalVoc2006, Caltech101, Corel30k) to study the scalability of the different approaches.















Notes
The medoid is defined as the cluster object whose average distance to the other objects in the cluster is minimal.
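The definition above can be illustrated with a minimal sketch. It assumes objects are numeric feature vectors compared with the Euclidean distance; the function name `medoid` and the sample data are hypothetical and are not taken from the article, which may use other signatures and distance measures.

```python
import math


def medoid(points):
    """Return the object with the minimal average distance to the other objects.

    Assumption: objects are equal-length numeric tuples and the distance is
    Euclidean; any other dissimilarity measure could be substituted.
    """
    if not points:
        raise ValueError("cluster must contain at least one object")
    if len(points) == 1:
        return points[0]

    def average_distance(i):
        # Average distance from object i to every other object in the cluster.
        total = sum(
            math.dist(points[i], other)
            for j, other in enumerate(points)
            if j != i
        )
        return total / (len(points) - 1)

    # Ties are broken in favour of the first object encountered.
    best_index = min(range(len(points)), key=average_distance)
    return points[best_index]


# Hypothetical 2-D cluster with one outlier.
cluster = [(0.0, 0.0), (2.0, 0.0), (1.0, 1.0), (10.0, 10.0)]
print(medoid(cluster))
```

Unlike a centroid, the medoid is always an actual object of the cluster, so it stays meaningful when only pairwise distances between objects are available. This brute-force version computes all pairwise distances, which is quadratic in the cluster size.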
Acknowledgment
Grateful acknowledgment is made for financial support by the Poitou-Charentes Region (France).
Cite this article
Lai, H.P., Visani, M., Boucher, A. et al. An experimental comparison of clustering methods for content-based indexing of large image databases. Pattern Anal Applic 15, 345–366 (2012). https://doi.org/10.1007/s10044-011-0261-7
DOI: https://doi.org/10.1007/s10044-011-0261-7