tutorial

Systematic Review of Clustering High-Dimensional and Large Datasets

Authors:

Rinkle Rani

ACM Transactions on Knowledge Discovery from Data (TKDD), Volume 12, Issue 2

Article No.: 16, Pages 1 - 68

https://doi.org/10.1145/3132088

Published: 23 January 2018

Abstract

Technological advancement has enabled us to store and process huge amounts of data in relatively short spans of time. The nature of data is changing rapidly; in particular, data are increasingly multi- and high-dimensional. There is an immediate need to expand our focus to include the analysis of high-dimensional and large datasets. Data analysis is becoming a mammoth task because of the continual growth in data volume and the complexity introduced by data heterogeneity. In this dynamic computing environment, existing techniques must either be modified or discarded to handle new data with many dimensions. Data clustering is a tool used in many disciplines, including data mining, to extract meaningful knowledge from seemingly unstructured data. The aim of this article is to explain the clustering problem and the various approaches that address it. It discusses the clustering process from both a microview (treatment of the data) and a macroview (the overall clustering process). Different distance and similarity measures, which form the cornerstone of effective data clustering, are also identified. Further, the article gives an in-depth analysis of data-mining-oriented clustering approaches that deal with large-scale datasets, and compares these approaches comprehensively to differentiate them clearly. It also surveys the problem of high-dimensional data and the existing approaches to it, which makes the survey more relevant. It also explores the latest trends in cluster analysis and real-life applications of clustering. The survey is intended to be exhaustive, attempting to cover all aspects of clustering in the field of data mining.
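The abstract singles out distance and similarity measures as the cornerstone of clustering and flags high dimensionality as a core difficulty. The following is a minimal Python sketch, not taken from the article, of two common measures (Euclidean distance and cosine similarity) plus a small experiment on uniformly random points. The helper names and the `relative_contrast` ratio, (max - min) / min of the distances from a query point to a sample, are illustrative assumptions. As the dimension grows the ratio typically shrinks, meaning the nearest and farthest points become almost equally far away, which is one reason purely distance-based clustering tends to degrade on high-dimensional data.

```python
import math
import random


def euclidean(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Cosine similarity: dot product divided by the product of the norms.
    Assumes neither vector is all zeros (otherwise the norm is 0)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def relative_contrast(dim, n_points=200, seed=0):
    """Hypothetical helper: (max - min) / min of Euclidean distances from one
    random query to n_points uniform random points in the unit cube [0, 1]^dim."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    points = [[rng.random() for _ in range(dim)] for _ in range(n_points)]
    dists = [euclidean(query, p) for p in points]
    return (max(dists) - min(dists)) / min(dists)


if __name__ == "__main__":
    print(euclidean([0, 0], [3, 4]))          # 5.0
    print(cosine_similarity([1, 0], [0, 1]))  # 0.0 (orthogonal vectors)
    # The contrast between nearest and farthest neighbours shrinks with dimension.
    for dim in (2, 10, 100, 1000):
        print(dim, round(relative_contrast(dim), 3))
```

Exact contrast values depend on the seed and sample size; only the downward trend as `dim` increases is the point of the illustration.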


[153]

Bellman Richard. 1961. Adaptive Control Processes: A Guided Tour. Princeton University Press.

[154]

Andrew Rosenberg and Julia Hirschberg. 2007. V-Measure: A conditional entropy-based external cluster evaluation measure. In Proceedings of EMNLP-CoNLL, Vol. 7. 410--420.

[155]

Satu Elisa Schaeffer. 2007. Graph clustering. Computer Science Review 1, 1 (2007), 27--64.

Digital Library

[156]

John Scott. 2012. Social Network Analysis. Sage.

[157]

M. Omair Shafiq and Eric Torunski. 2016. A parallel K-medoids algorithm for clustering based on MapReduce. In Proceedings of 15th IEEE International Conference on Machine Learning and Applications (ICMLA’16). IEEE, 502--507.

[158]

B. A. Shboul and Sung-Hyon Myaeng. 2009. Initializing k-means using genetic algorithms. (2009).

[159]

Gholamhosein Sheikholeslami, Surojit Chatterjee, and Aidong Zhang. 1998. Wavecluster: A multi-resolution clustering approach for very large spatial databases. In Proceedings of VLDB, Vol. 98. 428--439.

Digital Library

[160]

Peter H. A. Sneath. 1957. The application of computers to taxonomy. Microbiology 17, 1 (1957), 201--226.

[161]

Thorvald Sørensen. 1948. A method of establishing groups of equal amplitude in plant sociology based on similarity of species and its application to analyses of the vegetation on Danish commons. Biologiske skrifter 5 (1948), 1--34.

[162]

Michael Steinbach, George Karypis, Vipin Kumar, and others. 2000. A comparison of document clustering techniques. In Proceedings of KDD Workshop on Text Mining, Vol. 400. Boston, 525--526.

[163]

Mark Steyvers and Tom Griffiths. 2007. Probabilistic topic models. Handbook of Latent Semantic Analysis 427, 7 (2007), 424--440.

[164]

Eric A. Stone and Julien F. Ayroles. 2009. Modulated modularity clustering as an exploratory tool for functional genomic inference. PLoS Genetics 5, 5 (2009), e1000479.

[165]

Mu-Chun Su and Chien-Hsing Chou. 2001. A modified version of the K-means algorithm with a distance based on cluster symmetry. IEEE Transactions on Pattern Analysis 8 Machine Intelligence 6 (2001), 674--680.

Digital Library

[166]

Yizhou Sun and Jiawei Han. 2013. Meta-path-based search and mining in heterogeneous information networks. Tsinghua Science and Technology 18, 4 (2013), 329--338.

[167]

Yizhou Sun and Jiawei Han. 2013. Mining heterogeneous information networks: A structural analysis approach. ACM SIGKDD Explorations Newsletter 14, 2 (2013), 20--28.

Digital Library

[168]

Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, and Tianyi Wu. 2011. Pathsim: Meta path-based top-k similarity search in heterogeneous information networks. In Proceedings of the VLDB Endowment. 11.

Digital Library

[169]

Yizhou Sun, Jiawei Han, Peixiang Zhao, Zhijun Yin, Hong Cheng, and Tianyi Wu. 2009. Rankclus: Integrating clustering with ranking for heterogeneous information network analysis. In Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology. ACM, 565--576.

Digital Library

[170]

Yizhou Sun, Brandon Norick, Jiawei Han, Xifeng Yan, Philip S. Yu, and Xiao Yu. 2013. Pathselclus: Integrating meta-path selection with user-guided object clustering in heterogeneous information networks. ACM Transactions on Knowledge Discovery from Data (TKDD) 7, 3 (2013), 11.

Digital Library

[171]

Yizhou Sun, Yintao Yu, and Jiawei Han. 2009. Ranking-based clustering of heterogeneous information networks with star network schema. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 797--806.

Digital Library

[172]

Rashish Tandon and Suvrit Sra. 2010. Sparse nonnegative matrix approximation: New formulations and algorithms. Rapport Technique 193 (2010), 38--42.

[173]

Zhuo Tang, Kunkun Liu, Jinbo Xiao, Li Yang, and Zheng Xiao. 2017. A parallel k-means clustering algorithm based on redundance elimination and extreme points optimization employing MapReduce. Concurrency and Computation: Practice and Experience 29, 20 (2017), 1--18.

[174]

Joshua B. Tenenbaum, Vin De Silva, and John C. Langford. 2000. A global geometric framework for nonlinear dimensionality reduction. Science 290, 5500 (2000), 2319--2323.

[175]

Anthony K. H. Tung, Xin Xu, and Beng Chin Ooi. 2005. Curler: Finding and visualizing nonlinear correlation clusters. In Proceedings of the 2005 ACM SIGMOD International Conference on Management of Data. ACM, 467--478.

Digital Library

[176]

Lei Wang, Jianfeng Zhan, Chunjie Luo, Yuqing Zhu, Qiang Yang, Yongqiang He, Wanling Gao, Zhen Jia, Yingjie Shi, Shujie Zhang, and others. 2014. Bigdatabench: A big data benchmark suite from internet services. In Proceedings of the IEEE 20th International Symposium on High Performance Computer Architecture (HPCA’14). IEEE, 488--499.

[177]

Wei Wang, Jiong Yang, Richard Muntz, and others. 1997. STING: A statistical information grid approach to spatial data mining. In Proceedings of VLDB, Vol. 97. 186--195.

Digital Library

[178]

William J. Welch. 1982. Algorithmic complexity: Three NP-hard problems in computational statistics. Journal of Statistical Computation and Simulation 15, 1 (1982), 17--25.

[179]

Douglas Brent West and others. 2001. Introduction to Graph Theory, Vol. 2. Prentice Hall Upper Saddle River.

[180]

Tom White. 2012. Hadoop: The Definitive Guide. O’Reilly Media, Inc.

Digital Library

[181]

Rui Xu and Donald Wunsch. 2005. Survey of clustering algorithms. IEEE Transactions on Neural Networks 16, 3 (2005), 645--678.

Digital Library

[182]

Rui Xu, Donald Wunsch, and others. 2005. Survey of clustering algorithms. IEEE Transactions on Neural Networks 16, 3 (2005), 645--678.

Digital Library

[183]

Hung-chih Yang, Ali Dasdan, Ruey-Lung Hsiao, and D. Stott Parker. 2007. Map-reduce-merge: Simplified relational data processing on large clusters. In Proceedings of the 2007 ACM SIGMOD International Conference on Management of Data. ACM, 1029--1040.

Digital Library

[184]

Forrest W. Young. 2013. Multidimensional Scaling: History, Theory, and Applications. Psychology Press.

[185]

Jane Yang Yu and Peter Han Joo Chong. 2005. A survey of clustering schemes for mobile ad hoc networks. IEEE Communications Surveys 8 Tutorials 7, 1 (2005), 32--48.

Digital Library

[186]

Btissam Zerhari, Ayoub Ait Lahcen, and Salma Mouline. 2015. Big data clustering: Algorithms and challenges. In Proceedings of the International Conference on Big Data, Cloud and Applications (BDCA’15).

[187]

Tian Zhang, Raghu Ramakrishnan, and Miron Livny. 1996. BIRCH: An efficient data clustering method for very large databases. In Proceedings of ACM Sigmod Record, Vol. 25. ACM, 103--114.

Digital Library

[188]

Weizhong Zhao, Huifang Ma, and Qing He. 2009. Parallel k-means clustering based on MapReduce. In Proceedings of IEEE International Conference on Cloud Computing. Springer, 674--679.

Digital Library

[189]

Ding Zhou, Sergey A. Orshanskiy, Hongyuan Zha, and C. Lee Giles. 2007. Co-ranking authors and documents in a heterogeneous network. In Proceedings of the 7th IEEE International Conference on Data Mining (ICDM’07). IEEE, 739--744.

Digital Library

[190]

Yang Zhou, Hong Cheng, and Jeffrey Xu Yu. 2009. Graph clustering based on structural/attribute similarities. Proceedings of the VLDB Endowment 2, 1 (2009), 718--729.

Digital Library

[191]

Xinhua Zhuang, Yan Huang, Kannappan Palaniappan, and Yunxin Zhao. 1996. Gaussian mixture density modeling, decomposition, and applications. IEEE Transactions on Image Processing 5, 9 (1996), 1293--1302.

Digital Library

[192]

Arthur Zimek. 2009. Correlation clustering. ACM SIGKDD Explorations Newsletter 11, 1 (2009), 53--54.

Digital Library


Index Terms

Computing methodologies → Machine learning → Learning paradigms → Unsupervised learning → Cluster analysis


Information

Published In

ACM Transactions on Knowledge Discovery from Data, Volume 12, Issue 2

Survey Papers and Regular Papers

April 2018

376 pages

ISSN: 1556-4681

EISSN: 1556-472X

DOI: 10.1145/3178544

Editors: Charu Aggarwal (IBM T. J. Watson Research, USA) and Xindong Wu (University of Louisiana at Lafayette, USA)

Copyright © 2018 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 23 January 2018

Accepted: 01 August 2017

Revised: 01 June 2017

Received: 01 September 2016

Published in TKDD Volume 12, Issue 2


Qualifiers

Tutorial
Research
Refereed

Bibliometrics & Citations

Article Metrics

Total Citations: 51

Total Downloads: 1,458

Downloads (last 12 months): 157

Downloads (last 6 weeks): 17

Reflects downloads up to 30 Jan 2025.

Cited By

Zhang J, Zhou J, Hua J, Niu N, Liu C (2025). Mining user privacy concern topics from app reviews. Journal of Systems and Software 222, 112355. Online publication date: Apr-2025.
https://doi.org/10.1016/j.jss.2025.112355

Pang J, Huang Q (2025). Towards scalable topic detection on web via simulating Lévy walks nature of topics in similarity space. Information Sciences 690, 121544. Online publication date: Feb-2025.
https://doi.org/10.1016/j.ins.2024.121544

Pang J, Hu A, Huang Q (2025). Bundle fragments into a whole: Mining more complete clusters via submodular selection of interesting webpages for web topic detection. Expert Systems with Applications 260, 125125. Online publication date: Jan-2025.
https://doi.org/10.1016/j.eswa.2024.125125

Kumar A, Ajani O, Das S, Mallipeddi R (2025). Entropy-weighted medoid shift: An automated clustering algorithm for high-dimensional data. Applied Soft Computing 169, 112347. Online publication date: Jan-2025.
https://doi.org/10.1016/j.asoc.2024.112347

Battaglia E, Peiretti F, Pensa R (2024). Co-clustering: A Survey of the Main Methods, Recent Trends, and Open Problems. ACM Computing Surveys 57:2, 1-33. Online publication date: 4-Oct-2024.
https://dl.acm.org/doi/10.1145/3698875

Mersha M, Gemeda yigezu M, Kalita J (2024). Semantic-Driven Topic Modeling Using Transformer-Based Embeddings and Clustering Algorithms. Procedia Computer Science 244, 121-132. Online publication date: 2024.
https://doi.org/10.1016/j.procs.2024.10.185

Herrmann M, Kazempour D, Scheipl F, Kröger P (2024). Enhancing cluster analysis via topological manifold learning. Data Mining and Knowledge Discovery 38:3, 840-887. Online publication date: 1-May-2024.
https://dl.acm.org/doi/10.1007/s10618-023-00980-2

Yu X, Xiong T, Jiang W, Zhou J (2023). Comparative Assessment of the Efficacy of the Five Kinds of Models in Landslide Susceptibility Map for Factor Screening: A Case Study at Zigui-Badong in the Three Gorges Reservoir Area, China. Sustainability 15:1, 800. Online publication date: 1-Jan-2023.
https://doi.org/10.3390/su15010800

Laila Ab Ghani N, Abdul Aziz I, Jadid AbdulKadir S (2023). Subspace Clustering in High-Dimensional Data Streams: A Systematic Literature Review. Computers, Materials & Continua 75:2, 4649-4668. Online publication date: 2023.
https://doi.org/10.32604/cmc.2023.035987

Yuan J, Liang Z, Wang R, Li Y, Wang Z, Gao J (2023). A novel self-learning framework for fault identification of wind turbine drive bearings. Proceedings of the Institution of Mechanical Engineers, Part I: Journal of Systems and Control Engineering 237:7, 1296-1312. Online publication date: 5-Feb-2023.
https://doi.org/10.1177/09596518231153231
