An exemplar-based clustering using efficient variational message passing

Abstract

Clustering is a crucial step in scientific data analysis and in engineering systems, and designing an efficient cluster analysis method remains a key challenge. In this paper, we introduce a general-purpose exemplar-based clustering method called MEGA, which performs a novel message-passing strategy based on variational expectation–maximization and generalized arc-consistency techniques. Unlike existing message-passing clustering methods, MEGA formulates the message-passing scheme as the E- and M-steps of variational expectation–maximization on a reparameterized factor graph. It also exploits an adaptive variant of the generalized arc-consistency technique to perform a variational mean-field approximation in the E-step, minimizing a Kullback–Leibler divergence and thereby tightening the lower bound on the model evidence. Unlike density-based clustering methods, MEGA is not sensitive to initial parameters. In contrast to partition-based clustering methods, it does not require pre-specifying the number of clusters. We focus on a binary-variable factor graph to model the clustering problem, but MEGA is applicable to other graphical models in general. Our experiments on real-world problems demonstrate the efficiency of MEGA over prominent existing clustering algorithms such as affinity propagation, agglomerative clustering, DBSCAN, K-means, and EM.
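
To make the exemplar-based, message-passing setting concrete, the following is a minimal NumPy sketch of classic affinity propagation (Frey and Dueck 2007), one of the baselines compared against above. It is not the MEGA algorithm; the function name, damping factor, iteration budget, and median-preference choice are illustrative assumptions.

```python
# Minimal NumPy-only sketch of classic affinity propagation (Frey and Dueck 2007),
# shown only to illustrate the exemplar-based, message-passing setting that MEGA targets.
# This is NOT the MEGA algorithm; damping, iteration budget and the median preference
# are illustrative assumptions.
import numpy as np


def affinity_propagation(S, damping=0.7, iters=200):
    """Cluster by exchanging responsibility/availability messages over a similarity matrix S."""
    n = S.shape[0]
    R = np.zeros((n, n))  # responsibilities r(i,k): how well-suited k is as exemplar for i
    A = np.zeros((n, n))  # availabilities a(i,k): how appropriate it is for i to choose k

    for _ in range(iters):
        # Responsibility update: r(i,k) = s(i,k) - max_{k' != k} [a(i,k') + s(i,k')]
        AS = A + S
        idx = np.argmax(AS, axis=1)
        first_max = AS[np.arange(n), idx]
        AS[np.arange(n), idx] = -np.inf
        second_max = np.max(AS, axis=1)
        R_new = S - first_max[:, None]
        R_new[np.arange(n), idx] = S[np.arange(n), idx] - second_max
        R = damping * R + (1 - damping) * R_new

        # Availability update: a(i,k) = min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k)))
        Rp = np.maximum(R, 0)
        np.fill_diagonal(Rp, R.diagonal())        # keep r(k,k) itself in the column sums
        A_new = Rp.sum(axis=0)[None, :] - Rp
        diag = A_new.diagonal().copy()            # a(k,k) = sum_{i' != k} max(0, r(i',k))
        A_new = np.minimum(A_new, 0)
        np.fill_diagonal(A_new, diag)
        A = damping * A + (1 - damping) * A_new

    exemplars = np.where(R.diagonal() + A.diagonal() > 0)[0]
    if exemplars.size == 0:                       # degenerate run: fall back to the best candidate
        exemplars = np.array([int(np.argmax(R.diagonal() + A.diagonal()))])
    labels = exemplars[np.argmax(S[:, exemplars], axis=1)]
    labels[exemplars] = exemplars                 # exemplars label themselves
    return exemplars, labels


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0.0, 0.3, (20, 2)), rng.normal(3.0, 0.3, (20, 2))])
    S = -np.square(X[:, None, :] - X[None, :, :]).sum(-1)  # negative squared Euclidean similarity
    np.fill_diagonal(S, np.median(S))                       # preference = median similarity
    exemplars, labels = affinity_propagation(S)
    print("exemplars:", exemplars, "-> number of clusters:", exemplars.size)
```

On these two well-separated blobs, such a run typically recovers two exemplars; MEGA addresses the same exemplar-selection problem, but with a different message-passing schedule derived from variational expectation–maximization rather than these max-product-style updates.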

Notes

  1. The constraints in Eqs. (1) and (2) are assigned the same weights (\(-\infty \) when violated, and 0 when satisfied). Thus, without loss of accuracy, we can recast them as factors that allow \(\{+,-\}\) values without recourse to infinity.

  2. We say that a factor node has deterministic dependency if at least one of its tuples has zero probability.

  3. Note that the local factors are summed because we use a log-domain formulation of the objective function.

  4. Note that, based on Jensen’s inequality, each update step that minimizes the Kullback–Leibler divergence also maximizes the lower bound on the model evidence (cf. Beal and Ghahramani 2003); the decomposition is written out after these notes.

  5. Publicly available: http://archive.ics.uci.edu/ml/datasets/diabetes+130-us+hospitals+for+years+1999-2008.

  6. Publicly available at: http://archive.ics.uci.edu/ml/datasets/human+activity+recognition+using+smartphones.

  7. Publicly available: http://archive.ics.uci.edu/ml/datasets/Wall-Following+Robot+Navigation+Data.

  8. Publicly available: http://konect.uni-koblenz.de/networks/ucidata-zachary.

  9. Note that the Jaccard distance satisfies all the conditions of a distance metric, including the triangle inequality (its definition is recalled after these notes).

  10. Publicly available: http://scikit-learn.org/stable/modules/clustering.html.

  11. Publicly available: https://github.com/bnpy/bnpy.
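
To unpack Note 4, here is the standard variational decomposition of the log evidence in generic notation (\(q\) is the variational distribution and \(\mathbf{z}\) the latent assignment variables; these symbols are placeholders rather than the paper’s exact notation):

\[ \log p(\mathbf{x}) \;=\; \mathcal{L}(q) \;+\; \mathrm{KL}\big(q(\mathbf{z})\,\Vert\,p(\mathbf{z}\mid\mathbf{x})\big), \qquad \mathcal{L}(q) \;=\; \mathbb{E}_{q(\mathbf{z})}\!\left[\log \frac{p(\mathbf{x},\mathbf{z})}{q(\mathbf{z})}\right]. \]

Because the Kullback–Leibler term is non-negative, Jensen’s inequality gives \(\log p(\mathbf{x}) \ge \mathcal{L}(q)\); and because \(\log p(\mathbf{x})\) does not depend on \(q\), any E-step update that decreases the divergence necessarily increases the lower bound \(\mathcal{L}(q)\).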
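
Similarly, for Note 9, the Jaccard distance between two sets \(A\) and \(B\) is, by the standard definition,

\[ d_J(A,B) \;=\; 1 - \frac{|A \cap B|}{|A \cup B|}, \]

which lies in \([0,1]\), vanishes exactly when \(A = B\), is symmetric, and satisfies the triangle inequality \(d_J(A,C) \le d_J(A,B) + d_J(B,C)\), making it a proper metric.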

References

  • Ahmadi B, Kersting K, Mladenov M, Natarajan S (2013) Exploiting symmetries for scaling loopy belief propagation and relational training. Mach Learn 92(1):91–132

  • Anguita D, Ghio A, Oneto L, Parra X, Reyes-Ortiz JL (2013) A public domain dataset for human activity recognition using smartphones. In: 21st European symposium on artificial neural networks, computational intelligence and machine learning, ESANN

  • Arthur D, Vassilvitskii S (2007) k-means++: the advantages of careful seeding. In: Proceedings of the eighteenth annual ACM-SIAM symposium on discrete algorithms. Society for Industrial and Applied Mathematics, pp 1027–1035

  • Beal MJ, Ghahramani Z (2003) The variational Bayesian EM algorithm for incomplete data: with application to scoring graphical model structures. Bayesian Stat 7:453–464

  • Berkhin P (2006) A survey of clustering data mining techniques. In: Grouping multidimensional data. Springer, Berlin, pp 25–71

  • Cannistraci CV, Ravasi T, Montevecchi FM, Ideker T, Alessio M (2010) Nonlinear dimension reduction and clustering by minimum curvilinearity unfold neuropathic pain and tissue embryological classes. Bioinformatics 26(18):i531–i539

  • Cheeseman PC, Stutz JC (1996) Bayesian classification (AutoClass): theory and results. In: Advances in knowledge discovery and data mining, CA, USA, pp 153–180

  • Comaniciu D, Meer P (2002) Mean shift: a robust approach toward feature space analysis. IEEE Trans Pattern Anal Mach Intell 24(5):603–619

  • Dalli A (2003) Adaptation of the F-measure to cluster based lexicon quality evaluation. In: Proceedings of the EACL 2003 workshop on evaluation initiatives in natural language processing: are evaluation methods, metrics and resources reusable? Association for Computational Linguistics, pp 51–56

  • Danon L, Diaz-Guilera A, Duch J, Arenas A (2005) Comparing community structure identification. J Stat Mech Theory Exp. https://doi.org/10.1088/1742-5468/2005/09/P09008

  • Day WH, Edelsbrunner H (1984) Efficient algorithms for agglomerative hierarchical clustering methods. J Classif 1(1):7–24

  • Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc Ser B (Methodological) 39:1–38

  • Elidan G, McGraw I, Koller D (2006) Residual belief propagation: informed scheduling for asynchronous message passing. In: Proceedings of the twenty-second conference on uncertainty in artificial intelligence (UAI-06). AUAI Press, Arlington, Virginia, pp 165–173

  • Ester M, Kriegel HP, Sander J, Xu X et al (1996) A density-based algorithm for discovering clusters in large spatial databases with noise. In: The second international conference on knowledge discovery and data mining, vol 96, pp 226–231

  • Fraley C, Raftery AE (1998) How many clusters? Which clustering method? Answers via model-based cluster analysis. Comput J 41(8):578–588

  • Frey BJ, Dueck D (2007) Clustering by passing messages between data points. Science 315(5814):972–976

  • Fujiwara Y, Irie G, Kitahara T et al (2011) Fast algorithm for affinity propagation. In: IJCAI proceedings-international joint conference on artificial intelligence, vol 22, no 3, p 2238

  • Givoni IE (2012) Beyond affinity propagation: message passing algorithms for clustering. Citeseer

  • Givoni I, Frey B (2009a) Semi-supervised affinity propagation with instance-level constraints. In: Artificial intelligence and statistics, pp 161–168

  • Givoni IE, Frey BJ (2009b) A binary variable model for affinity propagation. Neural Comput 21(6):1589–1600

  • Givoni IE, Chung C, Frey BJ (2011) Hierarchical affinity propagation. In: Proceedings of the twenty-seventh conference on uncertainty in artificial intelligence. AUAI Press, Cambridge, pp 238–246

  • Halkidi M, Batistakis Y, Vazirgiannis M (2001) On clustering validation techniques. J Intell Inf Syst 17(2–3):107–145

  • Hastie T, Tibshirani R (1996) Discriminant analysis by Gaussian mixtures. J R Stat Soc Ser B (Methodological) 58:155–176

  • Heskes T (2004) On the uniqueness of loopy belief propagation fixed points. Neural Comput 16(11):2379–2413

  • Horsch MC, Havens WS (2000) Probabilistic arc consistency: a connection between constraint reasoning and probabilistic reasoning. In: Proceedings of the sixteenth conference on uncertainty in artificial intelligence, San Francisco, CA, USA. Morgan Kaufmann Publishers Inc, pp 282–290

  • Ibrahim MH, Pal C, Pesant G (2017) Improving probabilistic inference in graphical models with determinism and cycles. Mach Learn 106(1):1–54

  • Jamshidian M, Jennrich RI (1997) Acceleration of the EM algorithm by using quasi-Newton methods. J R Stat Soc Ser B (Stat Methodol) 59(3):569–587

  • Jiang B, Pei J, Tao Y, Lin X (2013) Clustering uncertain data based on probability distribution similarity. IEEE Trans Knowl Data Eng 25(4):751–763

  • Jiang Y, Liao Y, Yu G (2016) Affinity propagation clustering using path based similarity. Algorithms 9(3):46

  • Koller D, Friedman N (2009) Probabilistic graphical models: principles and techniques. MIT Press, Cambridge

  • Lam D, Wunsch DC (2014) Clustering. In: Academic Press library in signal processing, vol 1. Elsevier, Amsterdam, pp 1115–1149

  • Lashkari D, Golland P (2008) Convex clustering with exemplar-based models. In: Advances in neural information processing systems, pp 825–832

  • Leone M, Weigt M (2007) Clustering by soft-constraint affinity propagation: applications to gene-expression data. Bioinformatics 23(20):2708–2715

  • Lichman M (2013) UCI machine learning repository. http://archive.ics.uci.edu/ml

  • Mai ST, Assent I, Jacobsen J, Dieu MS (2018) Anytime parallel density-based clustering. Data Min Knowl Disc, pp 1–56

  • McLachlan G, Krishnan T (2007) The EM algorithm and extensions, vol 382. Wiley, New York

  • Mooij JM, Kappen HJ (2005) Sufficient conditions for convergence of loopy belief propagation. In: Proceedings of the twenty-first conference on uncertainty in artificial intelligence, UAI’05. AUAI Press, Arlington, Virginia, USA, pp 396–403. http://dl.acm.org/citation.cfm?id=3020336.3020386

  • Murphy K, Weiss Y, Jordan M (1999) Loopy belief propagation for approximate inference: an empirical study. In: Proceedings of the fifteenth conference on uncertainty in artificial intelligence (UAI-99), Stockholm, Sweden. Morgan Kaufmann, pp 467–476

  • Neal RM, Hinton GE (1999) A view of the EM algorithm that justifies incremental, sparse, and other variants. In: Learning in graphical models. MIT Press, Cambridge, pp 355–368

  • Ng AY, Jordan MI, Weiss Y (2002) On spectral clustering: analysis and an algorithm. In: Advances in neural information processing systems. MIT Press, Cambridge, pp 849–856

  • Nguyen DT, Chen L, Chan CK (2012) Clustering with multiviewpoint-based similarity measure. IEEE Trans Knowl Data Eng 24(6):988–1001

  • Pearl J (1988) Probabilistic reasoning in intelligent systems: networks of plausible inference. Morgan Kaufmann, Burlington

  • Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V et al (2011) Scikit-learn: machine learning in Python. J Mach Learn Res 12:2825–2830

  • Petersen KB, Winther O, Hansen LK (2005) On the slow convergence of EM and VBEM in low-noise linear models. Neural Comput 17(9):1921–1926

  • Potetz B (2007) Efficient belief propagation for vision using linear constraint nodes. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR’07), IEEE Computer Society, Minneapolis, MN, USA, pp 1–8

  • Rasmussen CE (2000) The infinite Gaussian mixture model. In: Advances in neural information processing systems, pp 554–560

  • Rawashdeh A, Ralescu AL (2015) Similarity measure for social networks—A brief survey. In: Proceedings of the 26th modern AI and cognitive science conference 2015, Greensboro, NC, USA, 25–26 April 2015, pp 153–159

  • Roosta T, Wainwright MJ, Sastry SS (2008) Convergence analysis of reweighted sum-product algorithms. IEEE Trans Signal Process 56(9):4293–4305

  • Rossi F, Van Beek P, Walsh T (2006) Handbook of constraint programming. Elsevier, Amsterdam

  • Ruiz C, Spiliopoulou M, Menasalvas E (2010) Density-based semi-supervised clustering. Data Min Knowl Disc 21(3):345–370

  • Sander J, Ester M, Kriegel HP, Xu X (1998) Density-based clustering in spatial databases: the algorithm GDBSCAN and its applications. Data Min Knowl Disc 2(2):169–194

  • Saxena A, Prasad M, Gupta A, Bharill N, Patel OP, Tiwari A, Er MJ, Ding W, Lin CT (2017) A review of clustering techniques and developments. Neurocomputing 267:664–681

  • Shang F, Jiao L, Shi J, Wang F, Gong M (2012) Fast affinity propagation clustering: a multilevel approach. Pattern Recogn 45(1):474–486

  • Singla P, Nath A, Domingos P (2010) Approximate lifted belief propagation. In: Proceedings of the twenty-fourth AAAI conference on artificial intelligence, Atlanta, Georgia, USA, 11–15 July 2010. AAAI Press, pp 92–97

  • Strack B, DeShazo JP, Gennings C, Olmo JL, Ventura S, Cios KJ, Clore JN (2014) Impact of HbA1c measurement on hospital readmission rates: analysis of 70,000 clinical database patient records. BioMed Res Int 2014

  • Sun L, Guo C (2014) Incremental affinity propagation clustering based on message passing. IEEE Trans Knowl Data Eng 26(11):2731–2744

  • Tarlow D, Zemel RS, Frey BJ (2008) Flexible priors for exemplar-based clustering. In: Proceedings of the twenty-fourth conference on uncertainty in artificial intelligence. AUAI Press, pp 537–545

  • Teh YW, Jordan MI, Beal MJ, Blei DM (2005) Sharing clusters among related groups: hierarchical Dirichlet processes. In: Saul LK, Weiss Y, Bottou L (eds) Advances in neural information processing systems, vol 17. MIT Press, Cambridge, pp 1385–1392

  • Wang CD, Lai JH, Suen CY, Zhu JY (2013) Multi-exemplar affinity propagation. IEEE Trans Pattern Anal Mach Intell 35(9):2223–2237

  • Weiss Y (1997) Belief propagation and revision in networks with loops. Technical report

  • Winn JM, Bishop CM (2005) Variational message passing. J Mach Learn Res 6:661–694

  • Wu CJ (1983) On the convergence properties of the EM algorithm. Ann Stat 11:95–103

  • Xu X, Ester M, Kriegel HP, Sander J (1998) A distribution-based clustering algorithm for mining in large spatial databases. In: Proceedings of the 14th international conference on data engineering. IEEE, pp 324–331

  • Yang Y, Chu X, Liang F, Huang TS (2012) Pairwise exemplar clustering. In: Twenty-sixth AAAI conference on artificial intelligence

  • Yedidia J, Freeman W, Weiss Y (2005) Constructing free-energy approximations and generalized belief propagation algorithms. IEEE Trans Inf Theory 51(7):2282–2312

  • Yu J, Jia C (2009) Convergence analysis of affinity propagation. In: International conference on knowledge science, engineering and management. Springer, Berlin, pp 54–65

  • Zachary WW (1977) An information flow model for conflict and fission in small groups. J Anthropol Res 33(4):452–473

  • Zhang X, Furtlehner C, Germain-Renaud C, Sebag M (2014) Data stream clustering with affinity propagation. IEEE Trans Knowl Data Eng 26(7):1644–1656

  • Zopf M, Mencía EL, Fürnkranz J (2016) Sequential clustering and contextual importance measures for incremental update summarization. In: Proceedings of COLING 2016, the 26th international conference on computational linguistics: technical papers, pp 1071–1082

Acknowledgements

We acknowledge the Natural Sciences and Engineering Research Council of Canada (NSERC) for the financial support of this work.

Author information

Corresponding author

Correspondence to Mohamed Hamza Ibrahim.

Additional information

Responsible editor: Fei Wang

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Ibrahim, M.H., Missaoui, R. An exemplar-based clustering using efficient variational message passing. Data Min Knowl Disc 35, 248–289 (2021). https://doi.org/10.1007/s10618-020-00720-w
