Two approaches for clustering algorithms with relational-based data

  • Regular Paper
Knowledge and Information Systems

Abstract

Relational databases still play an important role for many companies around the world, which makes the use of data mining methods to discover knowledge in large relational databases an active research issue. In unsupervised data mining, for instance, conventional clustering algorithms cannot handle the particularities of relational databases efficiently. Several clustering algorithms for relational datasets have been proposed in the literature. However, most of them either apply complex and/or specific procedures to handle the relational nature of the data or fail to capture that relational nature efficiently. To contribute to this topic, this paper presents two simple and generic approaches for handling relational-based data in clustering algorithms. The first represents the relational data through a hierarchical structure, while the second applies a weight structure based on relationship and attribute information. With these two approaches, we aim to handle relational-based datasets in a simple and efficient way, improving the efficiency of corporations that work with relational-based data in the unsupervised data mining context. To evaluate the effectiveness of the presented approaches, we conduct a comparative analysis against some existing approaches and a baseline approach. All analyzed approaches are evaluated with two well-known types of clustering algorithms (agglomerative hierarchical and K-means), using two internal and one external cluster validity measures.
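
The abstract does not name the validity measures used. To illustrate what an external measure computes (agreement between a clustering and a reference partition), here is a minimal sketch of the Rand index. Choosing the Rand index is an assumption made for illustration only, and the function name `rand_index` and the toy labels are hypothetical rather than taken from the paper.

```python
from itertools import combinations


def rand_index(labels_true, labels_pred):
    """Fraction of object pairs on which two partitions agree.

    A pair counts as an agreement when both partitions put the two
    objects in the same group, or both put them in different groups.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")
    pairs = list(combinations(range(len(labels_true)), 2))
    if not pairs:
        raise ValueError("need at least two objects")
    agreements = 0
    for i, j in pairs:
        same_true = labels_true[i] == labels_true[j]
        same_pred = labels_pred[i] == labels_pred[j]
        if same_true == same_pred:
            agreements += 1
    return agreements / len(pairs)


# Toy data (hypothetical): four objects, a reference partition and a clustering.
reference = ["a", "a", "b", "b"]
clustering = [0, 0, 1, 2]
score = rand_index(reference, clustering)
```

Cluster label names do not matter, only the grouping they induce, so a clustering that matches the reference up to renaming scores 1.0. Internal measures, by contrast, need no reference partition and are computed from the data and the clustering alone.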

Notes

  1. http://archive.ics.uci.edu/ml/datasets/Movie.

  2. http://archive.ics.uci.edu/ml/datasets/Nursery.

  3. www.cs.waikato.ac.nz/ml/weka/.

Acknowledgements

We thank Dr. Alex Freitas for valuable discussions in an early phase of this research.

Author information

Corresponding author

Correspondence to João C. Xavier-Junior.

About this article

Cite this article

Xavier-Junior, J.C., Canuto, A.M.P. & Gonçalves, L.M.G. Two approaches for clustering algorithms with relational-based data. Knowl Inf Syst 62, 1229–1253 (2020). https://doi.org/10.1007/s10115-019-01384-9

  • DOI: https://doi.org/10.1007/s10115-019-01384-9
