Abstract
In a deterministic graph, an edge between two vertices denotes a certain link. In a probabilistic graph, by contrast, an edge only indicates that a link may exist, with an associated probability. Such probabilistic data arise from uncertainty introduced during data collection or preprocessing, or from the inherent nature of a problem whose outcomes are uncertain. These graphs are common in real-world applications such as protein–protein interaction networks and link identification in social media. Clustering probabilistic graphs is challenging because traditional measures (such as distances and paths) themselves become probabilistic. Determining a valid clustering, or converting the data to a deterministic form, is therefore an important research problem. We propose a new clustering algorithm for probabilistic graphs based on ant colony optimization (ACO). The algorithm uses multiple versions of the probabilistic graph and employs a modified ACO to optimize the objective function. In addition, heuristics are proposed to guide the algorithm toward better accuracy and faster convergence. The proposed approach is tested on two real-world probabilistic graphs and five synthetic datasets using multiple cluster validity indices. The results show that ACO with heuristic guidance produces solutions that are comparable to or better than those of traditional approaches.
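One plausible reading of "multiple versions of the probabilistic graph" is that deterministic instances are sampled from it, keeping each edge independently with its stated probability. The abstract does not spell out how the versions are generated, so the following Python sketch is an illustration under that assumption rather than the authors' procedure; the names `sample_worlds`, `edge_frequency`, and the toy `prob_edges` graph are hypothetical.

```python
import random


def sample_worlds(prob_edges, n_worlds, seed=0):
    """Draw deterministic graphs from a probabilistic one.

    prob_edges maps an undirected edge (u, v) to its existence probability.
    Assumption: edges are independent, so each sampled graph keeps an edge
    with exactly that probability.
    """
    rng = random.Random(seed)
    worlds = []
    for _ in range(n_worlds):
        # rng.random() lies in [0, 1), so p = 0 never keeps an edge
        # and p = 1 always keeps it.
        world = {edge for edge, p in prob_edges.items() if rng.random() < p}
        worlds.append(world)
    return worlds


def edge_frequency(worlds, edge):
    """Fraction of sampled graphs in which the edge is present."""
    return sum(edge in world for world in worlds) / len(worlds)


# Toy probabilistic graph (hypothetical data, for illustration only).
prob_edges = {
    ("a", "b"): 0.9,
    ("b", "c"): 0.5,
    ("a", "c"): 0.0,
    ("c", "d"): 1.0,
}
worlds = sample_worlds(prob_edges, n_worlds=200)
```

A clustering method could then evaluate a candidate partition on every sampled graph and aggregate the scores, which is one way to obtain a deterministic objective from uncertain edges.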
Acknowledgements
This work was done as part of an MS thesis by Ifra Arif Butt. The author wishes to acknowledge the Ghulam Ishaq Khan Institute of Engineering Sciences and Technology for providing a funded scholarship for her MS studies.
Funding
None.
Author information
Contributions
Syed Fawad Hussain proposed the main idea of this work and wrote most of the manuscript, including the related work, the proposed work, and the discussion of the results. Ifra Arif coded the methods (both the proposed method and those used in the comparative analysis) and generated the results section. She is also responsible for the initial draft and parts of the related work. Muhammad Hanif was involved in discussions during the work and contributed to writing the Introduction as well as the artwork. Sajid Anwar provided valuable input and critical analysis regarding the results section. He also contributed to the manuscript, including parts of the results and discussion section, as well as its overall revision and improvement.
Ethics declarations
Conflict of interest
All authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Hussain, S.F., Butt, I.A., Hanif, M. et al. Clustering uncertain graphs using ant colony optimization (ACO). Neural Comput & Applic 34, 11721–11738 (2022). https://doi.org/10.1007/s00521-022-07063-1
DOI: https://doi.org/10.1007/s00521-022-07063-1