The Data Mining Dataset Characterization Ontology

  • Conference paper

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 295)

Abstract

Data mining technology is applied in many real-world domains, but the diversity of available algorithms makes selection difficult for data researchers. Many data mining knowledge bases have been developed, and data mining assistants have been proposed for algorithm selection. However, these studies focus on describing algorithms, which leaves a gap between the raw data and the ontology entities. This paper proposes a dataset characterization ontology that represents dataset characteristics to support the algorithm selection process. An example of handling the missing value problem with the proposed ontology is presented.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Tianxing, M., Zhukova, N. (2022). The Data Mining Dataset Characterization Ontology. In: Arai, K. (eds) Intelligent Systems and Applications. IntelliSys 2021. Lecture Notes in Networks and Systems, vol 295. Springer, Cham. https://doi.org/10.1007/978-3-030-82196-8_17
