Abstract
Data mining technology is applied in many real-world domains, but the diversity of available algorithms makes it hard for data researchers to choose among them. Many data mining knowledge bases have been developed, and data mining assistants have been proposed for algorithm selection. However, these studies focus on describing algorithms, which leaves a gap between the raw data and the ontology entities. This paper proposes a dataset characterization ontology that represents dataset characteristics to support the algorithm selection process. An example of handling the missing-value problem with the proposed ontology is presented.
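To make the idea of dataset characteristics more concrete, the following is a minimal sketch of how a few of them might be computed from raw data and used to pick a missing-value strategy. It is not the ontology or the method proposed in the paper: the `DatasetCharacteristics` record, the `characterize` and `suggest_missing_value_strategy` functions, and the 5% threshold are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class DatasetCharacteristics:
    """Hypothetical record of dataset-level properties (not the paper's schema)."""
    n_rows: int
    n_columns: int
    missing_ratio: float             # share of cells that are missing
    columns_with_missing: List[str]  # columns containing at least one gap


def characterize(columns: Sequence[str],
                 rows: Sequence[Sequence[Optional[float]]]) -> DatasetCharacteristics:
    """Compute simple characteristics, treating None as a missing value."""
    n_rows, n_cols = len(rows), len(columns)
    missing_per_col = [0] * n_cols
    for row in rows:
        for j, value in enumerate(row):
            if value is None:
                missing_per_col[j] += 1
    total_cells = n_rows * n_cols
    ratio = sum(missing_per_col) / total_cells if total_cells else 0.0
    with_missing = [c for c, m in zip(columns, missing_per_col) if m > 0]
    return DatasetCharacteristics(n_rows, n_cols, ratio, with_missing)


def suggest_missing_value_strategy(ch: DatasetCharacteristics) -> str:
    """Toy rule with an arbitrary 5% threshold, assumed only for illustration."""
    if ch.missing_ratio == 0:
        return "no handling needed"
    if ch.missing_ratio < 0.05:
        return "drop incomplete rows"
    return "impute missing values"


# Small made-up table: 2 of its 12 cells are missing.
data = [
    [1.0, 2.0, None],
    [4.0, None, 6.0],
    [7.0, 8.0, 9.0],
    [1.5, 2.5, 3.5],
]
profile = characterize(["a", "b", "c"], data)
print(profile.columns_with_missing)
print(suggest_missing_value_strategy(profile))
```

In an ontology-based setting, values such as `missing_ratio` would presumably be attached to a dataset entity so that selection rules can be expressed over them; the hard-coded rule here only stands in for that step.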
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Tianxing, M., Zhukova, N. (2022). The Data Mining Dataset Characterization Ontology. In: Arai, K. (eds) Intelligent Systems and Applications. IntelliSys 2021. Lecture Notes in Networks and Systems, vol 295. Springer, Cham. https://doi.org/10.1007/978-3-030-82196-8_17
DOI: https://doi.org/10.1007/978-3-030-82196-8_17
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-82195-1
Online ISBN: 978-3-030-82196-8
eBook Packages: Intelligent Technologies and Robotics (R0)