Two approaches for clustering algorithms with relational-based data

Xavier-Junior, João C.; Canuto, Anne M. P.; Gonçalves, Luiz M. G.

doi:10.1007/s10115-019-01384-9

Regular Paper
Published: 23 July 2019

Volume 62, pages 1229–1253, (2020)
Knowledge and Information Systems

João C. Xavier-Junior¹,
Anne M. P. Canuto² &
Luiz M. G. Gonçalves³

Abstract

It is well known that relational databases still play an important role for many companies around the world. For this reason, the use of data mining methods to discover knowledge in large relational databases has become an interesting research issue. In the context of unsupervised data mining, for instance, the conventional clustering algorithms cannot handle the particularities of relational databases in an efficient way. Some clustering algorithms for relational datasets have been proposed in the literature. However, most of these methods apply complex and/or specific procedures to handle the relational nature of the data, or they do not capture that relational nature in an efficient way. Aiming to contribute to this important topic, in this paper, we will present two simple and generic approaches to handle relational-based data for clustering algorithms. One of them treats the relational data through the use of a hierarchical structure, while the second approach applies a weight structure based on relationship and attribute information. In presenting these two approaches, we aim to tackle relational-based datasets in a simple and efficient way, improving the efficiency of corporations that handle relational-based data in the unsupervised data mining context. In order to evaluate the effectiveness of the presented approaches, a comparative analysis will be conducted, comparing the proposed approaches with some existing approaches and with a baseline approach. In all analyzed approaches, we will use two well-known types of clustering algorithms (agglomerative hierarchical and K-means). In order to perform this analysis, we will use two internal and one external cluster validity measures.
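To make the setting concrete, the following is a minimal sketch of what a flattening baseline for relational data might look like. It is not the authors' method: the abstract does not describe the baseline, the two proposed approaches, or the validity measures in detail. Everything here is an assumption made for illustration, including the toy `customers` and `orders` tables, the join-and-aggregate step in `flatten`, the fixed initial centroids, and the choice of the Rand index as the external measure.

```python
from itertools import combinations

# Hypothetical toy data (not from the paper): a parent table and a
# one-to-many child table of (customer_id, amount) rows.
customers = {1: {"age": 25}, 2: {"age": 27}, 3: {"age": 60}, 4: {"age": 62}}
orders = [(1, 10.0), (1, 12.0), (2, 11.0), (3, 95.0), (4, 90.0), (4, 100.0)]


def flatten(customers, orders):
    """Assumed baseline: aggregate each customer's orders into one flat vector."""
    points = []
    for cid in sorted(customers):
        amounts = [amount for owner, amount in orders if owner == cid]
        mean_amount = sum(amounts) / len(amounts) if amounts else 0.0
        points.append((float(customers[cid]["age"]), mean_amount))
    return points


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, initial_centroids, iterations=10):
    """Plain Lloyd-style K-means; fixed initial centroids keep it deterministic."""
    centroids = list(initial_centroids)
    labels = []
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: sq_dist(p, centroids[k]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                centroids[k] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels


def rand_index(labels_a, labels_b):
    """Fraction of point pairs on which two partitions agree (external measure)."""
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


points = flatten(customers, orders)
labels = kmeans(points, [points[0], points[2]])
print(labels, rand_index(labels, [0, 0, 1, 1]))
```

Aggregating child rows into a single value per parent discards how many related rows exist and how they are distributed. That loss is one reason a flat join may fail to capture the relational nature of the data, which is the limitation the abstract attributes to conventional clustering algorithms.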


Acknowledgements

We thank Dr. Alex Freitas for valuable discussions in an early phase of this research.

Author information

Authors and Affiliations

Digital Metropolis Institute, Federal University of RN, Natal, RN, Brazil
João C. Xavier-Junior
Informatics and Applied Mathematics Department, Federal University of RN, Natal, RN, Brazil
Anne M. P. Canuto
Computing and Automation Engineering Department, Federal University of RN, Natal, RN, Brazil
Luiz M. G. Gonçalves

Corresponding author

Correspondence to João C. Xavier-Junior.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Xavier-Junior, J.C., Canuto, A.M.P. & Gonçalves, L.M.G. Two approaches for clustering algorithms with relational-based data. Knowl Inf Syst 62, 1229–1253 (2020). https://doi.org/10.1007/s10115-019-01384-9

Received: 06 December 2015
Revised: 06 July 2019
Accepted: 11 July 2019
Published: 23 July 2019
Issue Date: March 2020
DOI: https://doi.org/10.1007/s10115-019-01384-9
