Abstract
Infinite ensemble clustering (IEC) combines ensemble clustering with representation learning by fusing infinitely many basic partitions, and it shows appealing performance in the unsupervised setting. However, it must solve a linear equation system whose time complexity is O(d³), where d is the concatenated dimension of the many clustering results. Inspired by the cognitive characteristic of human memory, which attends to pivot features in a more compressed data space, we propose an accelerated version of IEC (AIEC) that extracts pivot features and learns multiple mappings to reconstruct them, so that the linear equation system can be solved with time complexity O(dr²) (r ≪ d). Experimental results on standard image and text datasets show that AIEC greatly reduces the running time of IEC while achieving comparable clustering performance.
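The complexity reduction in the abstract can be illustrated with a minimal sketch. The names below (`solve_full`, `solve_pivot`) and the use of uniform random sampling as the pivot-selection step are illustrative assumptions, not the paper's actual selection rule: instead of solving a ridge-regularized d × d system (O(d³)), we pick r pivot rows and solve an r × r system that maps the pivot features back to all d features.

```python
import numpy as np

def solve_full(X, lam=1e-3):
    # Baseline: learn W such that W X ≈ X by solving a d x d
    # regularized linear system -- O(d^3) in the feature dimension d.
    d = X.shape[0]
    S = X @ X.T
    return np.linalg.solve(S + lam * np.eye(d), S)

def solve_pivot(X, r, lam=1e-3, seed=0):
    # Pivot-feature sketch: select r << d pivot rows (here by random
    # sampling, a stand-in for the paper's pivot extraction), then learn
    # an r x d mapping M that reconstructs all features from the pivots.
    # The system solved is only r x r, so the dominant cost is forming
    # the products, roughly O(d r^2) rather than O(d^3).
    rng = np.random.default_rng(seed)
    d = X.shape[0]
    pivots = rng.choice(d, size=r, replace=False)
    Xp = X[pivots]                      # r x n pivot representation
    A = Xp @ Xp.T + lam * np.eye(r)     # r x r Gram matrix
    B = Xp @ X.T                        # r x d cross terms
    M = np.linalg.solve(A, B)           # maps pivot features to all features
    return pivots, M

# Usage: reconstruct a 200-dimensional representation from 20 pivots.
d, n, r = 200, 500, 20
X = np.random.default_rng(1).standard_normal((d, n))
pivots, M = solve_pivot(X, r)
X_hat = M.T @ X[pivots]                 # d x n reconstruction
```

Note that the pivot rows themselves are reconstructed almost exactly (the optimal sub-mapping is near-identity), while the quality of the remaining rows depends on how informative the chosen pivots are, which is precisely what a good pivot-extraction step must ensure.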










Funding
This work was partially supported by the Fundamental Research Funds for the Henan Provincial Colleges and Universities in the Henan University of Technology (2016RCJH06), the National Key Research & Development Program (2016YFD0400104-5), the National Basic Research Program of China (2012CB316301), and the National Natural Science Foundation of China (61103138 and 61473236).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Ethical approval
This article does not contain any studies with human participants performed by any of the authors.
Informed Consent
Informed consent was obtained from all individual participants included in the study.
Cite this article
Jin, XB., Xie, GS., Huang, K. et al. Accelerating Infinite Ensemble of Clustering by Pivot Features. Cogn Comput 10, 1042–1050 (2018). https://doi.org/10.1007/s12559-018-9583-8