Abstract
Clustering is a crucial step in scientific data analysis and in engineering systems, and designing an efficient cluster analysis method remains a key challenge. In this paper, we introduce a general-purpose exemplar-based clustering method called MEGA, which performs a novel message-passing strategy based on variational expectation–maximization and generalized arc-consistency techniques. Unlike existing message-passing clustering methods, MEGA formulates the message-passing schema as the E- and M-steps of variational expectation–maximization on a reparameterized factor graph. It also exploits an adaptive variant of the generalized arc-consistency technique to perform a variational mean-field approximation in the E-step, minimizing a Kullback–Leibler divergence on the model evidence. Unlike density-based clustering methods, MEGA is not sensitive to initial parameters, and in contrast to partition-based clustering methods, it does not require pre-specifying the number of clusters. We focus on a binary-variable factor graph to model the clustering problem, but MEGA is applicable to other graphical models in general. Our experiments on real-world problems demonstrate the efficiency of MEGA over prominent existing clustering algorithms such as affinity propagation, agglomerative clustering, DBSCAN, K-means, and EM.
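The mean-field E-step mentioned above can be illustrated with a generic coordinate-ascent sketch for binary variables. This is a textbook mean-field update under assumed pairwise log-potentials `W` and unary log-potentials `b`, not the MEGA update itself; the function name and parameters are ours for illustration.

```python
import numpy as np

def mean_field_binary(W, b, iters=100, tol=1e-6):
    """Coordinate-ascent mean-field for binary variables x_i in {0, 1}.

    W: symmetric (n, n) matrix of pairwise log-potentials (zero diagonal)
    b: length-n vector of unary log-potentials
    Returns q, where q[i] approximates the marginal p(x_i = 1).
    """
    n = len(b)
    q = np.full(n, 0.5)  # uniform initialization of q_i(x_i = 1)
    for _ in range(iters):
        q_old = q.copy()
        for i in range(n):
            # Expected log-potential of x_i = 1 under the other variables'
            # current variational marginals (excluding the self-term).
            m = b[i] + W[i] @ q - W[i, i] * q[i]
            q[i] = 1.0 / (1.0 + np.exp(-m))  # sigmoid normalization
        if np.max(np.abs(q - q_old)) < tol:  # mean-field fixed point reached
            break
    return q
```

Each sweep decreases the KL divergence between the factorized distribution and the true posterior, which is the role the E-step plays in the variational scheme described above.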
Notes
We say that a factor node has a deterministic dependency if at least one of its tuples has zero probability.
Note that the local factors have been summed since we use a log-domain formulation of the objective function.
Note that, by Jensen’s inequality, each update step that minimizes the Kullback–Leibler divergence also maximizes the lower bound on the model evidence (cf. Beal and Ghahramani 2003).
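The identity behind this note can be stated explicitly. Writing $\mathbf{z}$ for the latent assignments and $q$ for the variational distribution (generic symbols, not necessarily the paper's notation), the log evidence decomposes as

$$
\log p(\mathbf{x}) \;=\; \underbrace{\mathbb{E}_{q(\mathbf{z})}\!\left[\log \frac{p(\mathbf{x},\mathbf{z})}{q(\mathbf{z})}\right]}_{\mathcal{L}(q)} \;+\; \mathrm{KL}\!\left(q(\mathbf{z})\,\big\|\,p(\mathbf{z}\mid\mathbf{x})\right).
$$

Since $\log p(\mathbf{x})$ does not depend on $q$, any update that decreases the KL term necessarily increases the lower bound $\mathcal{L}(q)$ by the same amount.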
Publicly available: http://archive.ics.uci.edu/ml/datasets/diabetes+130-us+hospitals+for+years+1999-2008.
Publicly available at: http://archive.ics.uci.edu/ml/datasets/human+activity+recognition+using+smartphones.
Publicly available: http://archive.ics.uci.edu/ml/datasets/Wall-Following+Robot+Navigation+Data.
Publicly available: http://konect.uni-koblenz.de/networks/ucidata-zachary.
Note that the Jaccard distance satisfies all conditions of the distance measure, including the triangle inequality.
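For concreteness, the Jaccard distance between two finite sets is $d(A,B) = 1 - |A \cap B| / |A \cup B|$; a minimal sketch (the helper name and the empty-set convention are ours):

```python
def jaccard_distance(a, b):
    """Jaccard distance: 1 - |A ∩ B| / |A ∪ B|."""
    a, b = set(a), set(b)
    if not a and not b:  # convention: two empty sets are identical
        return 0.0
    return 1.0 - len(a & b) / len(a | b)

# e.g. jaccard_distance({1, 2, 3}, {2, 3, 4}) gives 1 - 2/4 = 0.5,
# and d(A, C) <= d(A, B) + d(B, C) holds for any sets A, B, C.
```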
Publicly available at: http://scikit-learn.org/stable/modules/clustering.html.
Publicly available: https://github.com/bnpy/bnpy.
References
Ahmadi B, Kersting K, Mladenov M, Natarajan S (2013) Exploiting symmetries for scaling loopy belief propagation and relational training. Mach Learn 92(1):91–132
Anguita D, Ghio A, Oneto L, Parra X, Reyes-Ortiz JL (2013) A public domain dataset for human activity recognition using smartphones. In: 21th European symposium on artificial neural networks, computational intelligence and machine learning, ESANN
Arthur D, Vassilvitskii S (2007) k-means++: the advantages of careful seeding. In: Proceedings of the eighteenth annual ACM-SIAM symposium on discrete algorithms. Society for Industrial and Applied Mathematics, pp 1027–1035
Beal MJ, Ghahramani Z (2003) The variational Bayesian EM algorithm for incomplete data: with application to scoring graphical model structures. Bayesian Stat 7:453–464
Berkhin P (2006) A survey of clustering data mining techniques. In: Grouping multidimensional data. Springer, Berlin, pp 25–71
Cannistraci CV, Ravasi T, Montevecchi FM, Ideker T, Alessio M (2010) Nonlinear dimension reduction and clustering by minimum curvilinearity unfold neuropathic pain and tissue embryological classes. Bioinformatics 26(18):i531–i539
Cheeseman PC, Stutz JC (1996) Bayesian classification (autoclass): theory and results. In: Advances in knowledge discovery and data mining, CA, USA, pp 153–180
Comaniciu D, Meer P (2002) Mean shift: a robust approach toward feature space analysis. IEEE Trans Pattern Anal Mach Intell 24(5):603–619
Dalli A (2003) Adaptation of the f-measure to cluster based lexicon quality evaluation. In: Proceedings of the EACL 2003 Workshop on Evaluation Initiatives in Natural Language Processing: are evaluation methods, metrics and resources reusable? Association for Computational Linguistics, pp 51–56
Danon L, Diaz-Guilera A, Duch J, Arenas A (2005) Comparing community structure identification. J Stat Mech Theory Exp. https://doi.org/10.1088/1742-5468/2005/09/P09008
Day WH, Edelsbrunner H (1984) Efficient algorithms for agglomerative hierarchical clustering methods. J Classif 1(1):7–24
Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc Ser B (Methodological) 39:1–38
Elidan G, McGraw I, Koller D (2006) Residual belief propagation: informed scheduling for asynchronous message passing. In: Proceedings of the twenty-second conference annual conference on uncertainty in artificial intelligence (UAI-06). AUAI Press, Arlington, Virginia, pp 165–173
Ester M, Kriegel HP, Sander J, Xu X et al (1996) A density-based algorithm for discovering clusters in large spatial databases with noise. In: The second international conference on knowledge discovery and data mining, vol 96, pp 226–231
Fraley C, Raftery AE (1998) How many clusters? Which clustering method? Answers via model-based cluster analysis. Comput J 41(8):578–588
Frey BJ, Dueck D (2007) Clustering by passing messages between data points. Science 315(5814):972–976
Fujiwara Y, Irie G, Kitahara T et al (2011) Fast algorithm for affinity propagation. In: IJCAI proceedings-international joint conference on artificial intelligence, vol 22(3), p 2238
Givoni IE (2012) Beyond affinity propagation: message passing algorithms for clustering. PhD thesis, University of Toronto
Givoni I, Frey B (2009a) Semi-supervised affinity propagation with instance-level constraints. In: Artificial intelligence and statistics, pp 161–168
Givoni IE, Frey BJ (2009b) A binary variable model for affinity propagation. Neural Comput 21(6):1589–1600
Givoni IE, Chung C, Frey BJ (2011) Hierarchical affinity propagation. In: Proceedings of the twenty-seventh conference on uncertainty in artificial intelligence. AUAI Press, Cambridge, pp 238–246
Halkidi M, Batistakis Y, Vazirgiannis M (2001) On clustering validation techniques. J Intell Inf Syst 17(2–3):107–145
Hastie T, Tibshirani R (1996) Discriminant analysis by Gaussian mixtures. J R Stat Soc Ser B (Methodological) 58:155–176
Heskes T (2004) On the uniqueness of loopy belief propagation fixed points. Neural Comput 16(11):2379–2413
Horsch MC, Havens WS (2000) Probabilistic arc consistency: a connection between constraint reasoning and probabilistic reasoning. In: Proceedings of the sixteenth conference on uncertainty in artificial intelligence, San Francisco, CA, USA. Morgan Kaufmann Publishers Inc, pp 282–290
Ibrahim MH, Pal C, Pesant G (2017) Improving probabilistic inference in graphical models with determinism and cycles. Mach Learn 106(1):1–54
Jamshidian M, Jennrich RI (1997) Acceleration of the EM algorithm by using quasi-Newton methods. J R Stat Soc Ser B (Stat Methodol) 59(3):569–587
Jiang B, Pei J, Tao Y, Lin X (2013) Clustering uncertain data based on probability distribution similarity. IEEE Trans Knowl Data Eng 25(4):751–763
Jiang Y, Liao Y, Yu G (2016) Affinity propagation clustering using path based similarity. Algorithms 9(3):46
Koller D, Friedman N (2009) Probabilistic graphical models: principles and techniques. MIT Press, Cambridge
Lam D, Wunsch DC (2014) Clustering. In: Academic Press library in signal processing, vol 1. Elsevier, Amsterdam, pp 1115–1149
Lashkari D, Golland P (2008) Convex clustering with exemplar-based models. In: Advances in neural information processing systems, pp 825–832
Leone M, Weigt M (2007) Clustering by soft-constraint affinity propagation: applications to gene-expression data. Bioinformatics 23(20):2708–2715
Lichman M (2013) UCI machine learning repository. http://archive.ics.uci.edu/ml
Mai ST, Assent I, Jacobsen J, Dieu MS (2018) Anytime parallel density-based clustering. Data Min Knowl Disc, pp 1–56
McLachlan G, Krishnan T (2007) The EM algorithm and extensions, vol 382. Wiley, New York
Mooij JM, Kappen HJ (2005) Sufficient conditions for convergence of loopy belief propagation. In: Proceedings of the twenty-first conference on uncertainty in artificial intelligence, UAI’05, pp. 396–403. AUAI Press, Arlington, Virginia, USA. http://dl.acm.org/citation.cfm?id=3020336.3020386
Murphy K, Weiss Y, Jordan M (1999) Loopy belief propagation for approximate inference: an empirical study. In: Proceedings of the fifteenth conference annual conference on uncertainty in artificial intelligence (UAI-99), Stockholm, Sweden. Morgan Kaufmann, pp 467–476
Neal RM, Hinton GE (1999) A view of the EM algorithm that justifies incremental, sparse, and other variants. In: Learning in graphical models. MIT Press, Cambridge, pp 355–368
Ng AY, Jordan MI, Weiss Y (2002) On spectral clustering: analysis and an algorithm. In: Advances in neural information processing systems. MIT Press, Cambridge, pp 849–856
Nguyen DT, Chen L, Chan CK (2012) Clustering with multiviewpoint-based similarity measure. IEEE Trans Knowl Data Eng 24(6):988–1001
Pearl J (1988) Probabilistic reasoning in intelligent systems: networks of plausible inference. Morgan Kaufmann, Burlington
Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V et al (2011) Scikit-learn: machine learning in Python. J Mach Learn Res 12:2825–2830
Petersen KB, Winther O, Hansen LK (2005) On the slow convergence of EM and VBEM in low-noise linear models. Neural Comput 17(9):1921–1926
Potetz B (2007) Efficient belief propagation for vision using linear constraint nodes. In: Proceeding of IEEE conference on computer vision and pattern recognition (CVPR’07), IEEE computer society, Minneapolis, MN, USA, pp 1–8
Rasmussen CE (2000) The infinite Gaussian mixture model. In: Advances in neural information processing systems, pp. 554–560
Rawashdeh A, Ralescu AL (2015) Similarity measure for social networks—A brief survey. In: Proceedings of the 26th modern AI and cognitive science conference 2015, Greensboro, NC, USA, 25–26 April 2015, pp 153–159
Roosta T, Wainwright MJ, Sastry SS (2008) Convergence analysis of reweighted sum-product algorithms. IEEE Trans Signal Process 56(9):4293–4305
Rossi F, Van Beek P, Walsh T (2006) Handbook of constraint programming. Elsevier, Amsterdam
Ruiz C, Spiliopoulou M, Menasalvas E (2010) Density-based semi-supervised clustering. Data Min Knowl Disc 21(3):345–370
Sander J, Ester M, Kriegel HP, Xu X (1998) Density-based clustering in spatial databases: the algorithm GDBSCAN and its applications. Data Min Knowl Disc 2(2):169–194
Saxena A, Prasad M, Gupta A, Bharill N, Patel OP, Tiwari A, Er MJ, Ding W, Lin CT (2017) A review of clustering techniques and developments. Neurocomputing 267:664–681
Shang F, Jiao L, Shi J, Wang F, Gong M (2012) Fast affinity propagation clustering: a multilevel approach. Pattern Recogn 45(1):474–486
Singla P, Nath A, Domingos P (2010) Approximate lifted belief propagation. In: Proceedings of the twenty-fourth AAAI conference on artificial intelligence, Atlanta, Georgia, USA, 11–15 July 2010. AAAI Press, pp 92–97
Strack B, DeShazo JP, Gennings C, Olmo JL, Ventura S, Cios KJ, Clore JN (2014) Impact of HbA1c measurement on hospital readmission rates: analysis of 70,000 clinical database patient records. BioMed Res Int 2014
Sun L, Guo C (2014) Incremental affinity propagation clustering based on message passing. IEEE Trans Knowl Data Eng 26(11):2731–2744
Tarlow D, Zemel RS, Frey BJ (2008) Flexible priors for exemplar-based clustering. In: Proceedings of the Twenty-Fourth Conference on Uncertainty in Artificial Intelligence. AUAI Press, pp 537–545
Teh YW, Jordan MI, Beal MJ, Blei DM (2005) Sharing clusters among related groups: hierarchical Dirichlet processes. In: Saul LK, Weiss Y, Bottou L (eds) Advances in neural information processing systems, vol 17. MIT Press, Cambridge, pp 1385–1392
Wang CD, Lai JH, Suen CY, Zhu JY (2013) Multi-exemplar affinity propagation. IEEE Trans Pattern Anal Mach Intell 35(9):2223–2237
Weiss Y (1997) Belief propagation and revision in networks with loops. Technical Report
Winn JM, Bishop CM (2005) Variational message passing. J Mach Learn Res 6:661–694
Wu CJ (1983) On the convergence properties of the EM algorithm. Ann Stat 11:95–103
Xu X, Ester M, Kriegel HP, Sander J (1998) A distribution-based clustering algorithm for mining in large spatial databases. In: 14th international conference on data engineering, 1998. Proceedings IEEE, pp 324–331
Yang Y, Chu X, Liang F, Huang TS (2012) Pairwise exemplar clustering. In: Twenty-sixth AAAI conference on artificial intelligence
Yedidia J, Freeman W, Weiss Y (2005) Constructing free-energy approximations and generalized belief propagation algorithms. IEEE Trans Inf Theory 51(7):2282–2312
Yu J, Jia C (2009) Convergence analysis of affinity propagation. In: International conference on knowledge science, engineering and management. Springer, Berlin, pp 54–65
Zachary WW (1977) An information flow model for conflict and fission in small groups. J Anthropol Res 33(4):452–473
Zhang X, Furtlehner C, Germain-Renaud C, Sebag M (2014) Data stream clustering with affinity propagation. IEEE Trans Knowl Data Eng 26(7):1644–1656
Zopf M, Mencía EL, Fürnkranz J (2016) Sequential clustering and contextual importance measures for incremental update summarization. In: Proceedings of COLING 2016, the 26th international conference on computational linguistics: technical papers, pp 1071–1082
Acknowledgements
We acknowledge the Natural Sciences and Engineering Research Council of Canada (NSERC) for the financial support of this work.
Additional information
Responsible editor: Fei Wang
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Ibrahim, M.H., Missaoui, R. An exemplar-based clustering using efficient variational message passing. Data Min Knowl Disc 35, 248–289 (2021). https://doi.org/10.1007/s10618-020-00720-w