Representing Entities in the OntoDM Data Mining Ontology

Abstract

Motivated by the need for unification of the domain of data mining and the demand for formalized representation of outcomes of data mining investigations, we address the task of constructing an ontology of data mining. Our heavy-weight ontology, named OntoDM, is based on a recently proposed general framework for data mining. It represents entities such as data, data mining tasks and algorithms, and generalizations (resulting from the latter), and allows us to cover much of the diversity in data mining research, including recently developed approaches to mining structured data and constraint-based data mining. OntoDM is compliant with best practices in ontology engineering and can consequently be linked to other domain ontologies. It thus represents a major step towards an ontology of data mining investigations.

Corresponding author

Correspondence to Panče Panov.

Copyright information

© 2010 Springer Science+Business Media, LLC

About this chapter

Cite this chapter

Panov, P., Džeroski, S., Soldatova, L.N. (2010). Representing Entities in the OntoDM Data Mining Ontology. In: Džeroski, S., Goethals, B., Panov, P. (eds) Inductive Databases and Constraint-Based Data Mining. Springer, New York, NY. https://doi.org/10.1007/978-1-4419-7738-0_2

  • DOI: https://doi.org/10.1007/978-1-4419-7738-0_2

  • Publisher Name: Springer, New York, NY

  • Print ISBN: 978-1-4419-7737-3

  • Online ISBN: 978-1-4419-7738-0

  • eBook Packages: Computer Science, Computer Science (R0)
