Abstract
Relational databases still play an important role in many companies around the world. For this reason, the use of data mining methods to discover knowledge in large relational databases has become an interesting research issue. In unsupervised data mining, for instance, conventional clustering algorithms cannot handle the particularities of relational databases efficiently. Some clustering algorithms for relational datasets have been proposed in the literature. However, most of these methods either apply complex and/or specific procedures to handle the relational nature of the data or do not capture that relational nature efficiently. To contribute to this topic, in this paper we present two simple and generic approaches for handling relational data in clustering algorithms. The first treats the relational data through a hierarchical structure, while the second applies a weight structure based on relationship and attribute information. With these two approaches, we aim to handle relational datasets in a simple and efficient way, improving the efficiency of corporations that work with relational data in the unsupervised data mining context. To evaluate the effectiveness of the presented approaches, we conduct a comparative analysis against some existing approaches and a baseline approach. For all analyzed approaches, we use two well-known types of clustering algorithms (agglomerative hierarchical and K-means), and we assess the results with two internal and one external cluster validity measures.
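As an illustration of what an external cluster validity measure computes, the following is a minimal sketch of the Rand index, which scores the agreement between a clustering and a reference partition. The abstract does not name the validity measures used in the paper, so the choice of the Rand index here is an assumption, and the function name `rand_index` and the toy label lists are hypothetical.

```python
from itertools import combinations


def rand_index(labels_true, labels_pred):
    """Fraction of object pairs on which two partitions agree.

    A pair counts as an agreement when both partitions place the two
    objects in the same cluster, or both place them in different clusters.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")
    pairs = list(combinations(range(len(labels_true)), 2))
    if not pairs:
        raise ValueError("need at least two objects")
    agree = 0
    for i, j in pairs:
        same_true = labels_true[i] == labels_true[j]
        same_pred = labels_pred[i] == labels_pred[j]
        if same_true == same_pred:
            agree += 1
    return agree / len(pairs)


# Toy data (hypothetical): the clustering matches the reference partition
# exactly, only the label names differ, so the index is 1.0.
reference = [0, 0, 1, 1]
clustering = [1, 1, 0, 0]
print(rand_index(reference, clustering))  # 1.0
```

Because the index is computed over pairs of objects, it does not depend on how clusters are named, which makes it usable for comparing the output of different clustering algorithms against the same reference labels.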
Acknowledgements
We thank Dr. Alex Freitas for valuable discussions in an early phase of this research.
Cite this article
Xavier-Junior, J.C., Canuto, A.M.P. & Gonçalves, L.M.G. Two approaches for clustering algorithms with relational-based data. Knowl Inf Syst 62, 1229–1253 (2020). https://doi.org/10.1007/s10115-019-01384-9
DOI: https://doi.org/10.1007/s10115-019-01384-9