A Comparative Study in Data Mining: Clustering and Classification Capabilities

  • Conference paper
Information Systems (EMCIS 2017)

Abstract

The evolution of ICT has driven the creation of a society capable of providing new kinds and types of information. The gathered information is stored continuously, meaning that a great number of databases have to be created. The problem that arises is whether there is a universal way of managing, and gaining knowledge from, the growing variety and volume of data. Many efforts have been made to address the emerging challenges of data mining, based on statistics and machine learning techniques that can significantly boost the ability to analyze data. This paper presents a detailed study of the data mining field, followed by a comparative study of clustering and classification techniques. The study concludes that integrating clustering and classification techniques can provide more accurate results than a simple classification technique that classifies datasets with previously known attributes and classes.
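As a rough illustration of what integrating clustering and classification can look like, the sketch below first groups the records with a plain k-means pass and then lets each cluster vote on a class label, so that a new record is classified by the label of its nearest cluster. This is only one possible reading of the idea and is not taken from the paper: the function names (`kmeans`, `nearest`, `fit`, `predict`), the toy data, and the choice of k-means with majority voting are all assumptions made for the example.

```python
import math
from collections import Counter


def nearest(p, centroids):
    """Index of the centroid closest to point p (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))


def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means. The first k points seed the centroids,
    which keeps this toy example deterministic (an assumption, not a
    recommended initialisation)."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        for i, g in enumerate(groups):
            if g:  # an empty cluster keeps its previous centroid
                centroids[i] = [sum(col) / len(g) for col in zip(*g)]
    return centroids


def fit(points, labels, k):
    """Clustering step followed by a classification step:
    each cluster takes the majority class of its training members."""
    centroids = kmeans(points, k)
    votes = [Counter() for _ in range(k)]
    for p, y in zip(points, labels):
        votes[nearest(p, centroids)][y] += 1
    cluster_label = [v.most_common(1)[0][0] if v else None for v in votes]
    return centroids, cluster_label


def predict(model, p):
    """Classify p with the label of its nearest cluster."""
    centroids, cluster_label = model
    return cluster_label[nearest(p, centroids)]


# Hypothetical toy data: two well-separated groups with known classes.
points = [(1.0, 1.0), (8.0, 8.0), (1.5, 2.0), (9.0, 8.5), (1.0, 0.5), (8.5, 9.0)]
labels = ["low", "high", "low", "high", "low", "high"]

model = fit(points, labels, k=2)
print(predict(model, (2.0, 1.0)))
print(predict(model, (7.5, 9.0)))
```

On data this cleanly separated the clustering step adds little; the sketch is meant only to show how the two steps can be chained, not to reproduce the accuracy comparison reported in the paper.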

Acknowledgements

The CrowdHEALTH project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 727560. Athanasios Kiourtis would also like to acknowledge the financial support from the “Foundation for Education and European Culture (IPEP)”.

Author information

Corresponding author

Correspondence to Argyro Mavrogiorgou.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Mavrogiorgou, A., Kiourtis, A., Kyriazis, D., Themistocleous, M. (2017). A Comparative Study in Data Mining: Clustering and Classification Capabilities. In: Themistocleous, M., Morabito, V. (eds) Information Systems. EMCIS 2017. Lecture Notes in Business Information Processing, vol 299. Springer, Cham. https://doi.org/10.1007/978-3-319-65930-5_7

  • DOI: https://doi.org/10.1007/978-3-319-65930-5_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-65929-9

  • Online ISBN: 978-3-319-65930-5

  • eBook Packages: Computer Science, Computer Science (R0)
