Abstract
The evolution of ICT has driven the creation of a capable society by providing new kinds and types of information. The gathered information is stored continuously, which means that a great number of databases has to be created. The question that arises is whether there is a general way of managing, and gaining knowledge from, the growing variety and volume of data. Many efforts have addressed the emerging challenges of data mining with statistical and machine learning techniques that can significantly boost the ability to analyze data. This paper presents a detailed study of the data mining field, followed by a comparative study of clustering and classification techniques. The study concludes that integrating clustering and classification techniques can provide more accurate results than a simple classification technique that classifies datasets with previously known attributes and classes.
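As an illustration only, the idea of integrating clustering with classification can be sketched as a two-step procedure: cluster the training data first, then classify a new record through the cluster it falls into. The sketch below is not the method evaluated in the paper. It assumes k-means for the clustering step and a per-cluster majority vote for the classification step, and the function names (`kmeans`, `fit`, `predict`) and the toy data are hypothetical.

```python
from collections import Counter
from math import dist


def kmeans(points, k, iterations=20):
    """Plain k-means; centroids start at the first k points (deterministic)."""
    centroids = [tuple(p) for p in points[:k]]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            groups[nearest].append(p)
        # Recompute each centroid as the mean of its group;
        # keep the old centroid if a group ends up empty.
        centroids = [
            tuple(sum(c) / len(g) for c in zip(*g)) if g else centroids[i]
            for i, g in enumerate(groups)
        ]
    return centroids


def nearest_cluster(point, centroids):
    """Index of the centroid closest to the point."""
    return min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))


def fit(points, labels, k):
    """Step 1: cluster. Step 2: record the majority class of each cluster."""
    centroids = kmeans(points, k)
    votes = [Counter() for _ in range(k)]
    for p, y in zip(points, labels):
        votes[nearest_cluster(p, centroids)][y] += 1
    cluster_label = [v.most_common(1)[0][0] if v else None for v in votes]
    return centroids, cluster_label


def predict(point, centroids, cluster_label):
    """Classify a new point by the majority class of its nearest cluster."""
    return cluster_label[nearest_cluster(point, centroids)]


# Toy data, made up for illustration (assumption, not from the paper).
train_points = [(1.0, 1.0), (8.0, 8.0), (1.2, 0.8), (0.9, 1.1), (8.1, 7.9), (7.8, 8.2)]
train_labels = ["low", "high", "low", "low", "high", "high"]

model = fit(train_points, train_labels, k=2)
print(predict((1.1, 0.9), *model))  # low
print(predict((7.9, 8.0), *model))  # high
```

A common alternative design appends the cluster index to each record as an extra attribute and then trains an ordinary classifier on the extended records. Whether either variant improves accuracy depends on the dataset.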
Acknowledgements
The CrowdHEALTH project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 727560. Athanasios Kiourtis would also like to acknowledge the financial support from the “Foundation for Education and European Culture (IPEP)”.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Mavrogiorgou, A., Kiourtis, A., Kyriazis, D., Themistocleous, M. (2017). A Comparative Study in Data Mining: Clustering and Classification Capabilities. In: Themistocleous, M., Morabito, V. (eds) Information Systems. EMCIS 2017. Lecture Notes in Business Information Processing, vol 299. Springer, Cham. https://doi.org/10.1007/978-3-319-65930-5_7
DOI: https://doi.org/10.1007/978-3-319-65930-5_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-65929-9
Online ISBN: 978-3-319-65930-5
eBook Packages: Computer Science, Computer Science (R0)