
Accelerating Infinite Ensemble of Clustering by Pivot Features

Published in: Cognitive Computation

Abstract

Infinite ensemble clustering (IEC) combines ensemble clustering with representation learning by fusing infinitely many basic partitions, and shows appealing performance in the unsupervised setting. However, it must solve a linear equation system whose time complexity grows as O(d³), where d is the concatenated dimension of the many basic clustering results. Inspired by the cognitive characteristic of human memory, which attends to pivot features in a more compressed data space, we propose an accelerated version of IEC (AIEC) that extracts pivot features and learns multiple mappings to reconstruct them, so that the linear equation system can be solved in O(dr²) time (r ≪ d). Experimental results on standard image and text datasets show that AIEC greatly reduces the running time of IEC while achieving comparable clustering performance.
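The O(dr²) saving comes from replacing a d×d normal-equation solve with an r×r one applied to d right-hand sides. A minimal NumPy sketch of that idea on toy data (the ridge parameter, the reconstruction-from-pivots formulation, and pivot selection by column variance are illustrative assumptions, not the paper's exact method):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the concatenated basic-partition matrix:
# n samples, d features stacked from many clustering results.
n, d, r = 500, 200, 20
X = rng.standard_normal((n, d))
lam = 1e-3  # ridge regularizer (assumed)

# IEC-style closed form: a d x d linear system, O(d^3) to solve.
W_full = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ X)  # d x d

# AIEC-style: keep r pivot features and learn a mapping that
# reconstructs all d features from the pivots.
pivot = np.argsort(X.var(axis=0))[-r:]
P = X[:, pivot]                        # n x r pivot block

# The normal equations now involve only an r x r Gram matrix; solving
# it against d right-hand sides costs O(r^3 + d*r^2) ~ O(d*r^2), r << d.
G = P.T @ P + lam * np.eye(r)          # r x r
W = np.linalg.solve(G, P.T @ X)        # r x d mapping
X_rec = P @ W                          # reconstruct all d features
```

With d = 200 and r = 20 the small system is 10× narrower, so the cubic solve shrinks by roughly d/r in this sketch; the pivot columns themselves are reconstructed almost exactly, while the quality on the remaining columns depends on how informative the pivots are.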


[Figures 1–10: thumbnails omitted]


Notes

  1. https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/multiclass.html

  2. https://archive.ics.uci.edu/ml/datasets/letter+recognition

  3. http://yann.lecun.com/exdb/mnist/

  4. http://www.cad.zju.edu.cn/home/dengcai/Data/ORL/ORL_32x32.mat

  5. http://www.pascal-network.org/challenges/VOC/voc2007/workshop/index.html

  6. https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/multiclass.html#news20



Funding

This work was partially supported by the Fundamental Research Funds for the Henan Provincial Colleges and Universities at the Henan University of Technology (2016RCJH06), the National Key Research & Development Program (2016YFD0400104-5), the National Basic Research Program of China (2012CB316301), and the National Natural Science Foundation of China (61103138 and 61473236).

Author information


Corresponding author

Correspondence to Xiao-Bo Jin.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants performed by any of the authors.

Informed Consent

Informed consent was obtained from all individual participants included in the study.


About this article


Cite this article

Jin, XB., Xie, GS., Huang, K. et al. Accelerating Infinite Ensemble of Clustering by Pivot Features. Cogn Comput 10, 1042–1050 (2018). https://doi.org/10.1007/s12559-018-9583-8

