Abstract
With the expansion of the Semantic Web and the availability of numerous ontologies that provide domain background knowledge and semantic descriptors for the data, the amount of semantic data is growing rapidly. The data mining community is faced with a paradigm shift: instead of mining the abundance of empirical data supported by the background knowledge, the new challenge is to mine the abundance of knowledge encoded in domain ontologies, constrained by the heuristics computed from the empirical data collection. We address this challenge with an approach, named semantic data mining, in which domain ontologies define the hypothesis search space and the data is used as a means of constraining and guiding the process of hypothesis search and evaluation. The use of the prototype semantic data mining systems SEGS and g-SEGS is demonstrated in a simple semantic data mining scenario and in two real-life functional genomics scenarios of mining biological ontologies with the support of experimental microarray data.
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Lavrač, N., Vavpetič, A., Soldatova, L., Trajkovski, I., Novak, P.K. (2011). Using Ontologies in Semantic Data Mining with SEGS and g-SEGS. In: Elomaa, T., Hollmén, J., Mannila, H. (eds) Discovery Science. DS 2011. Lecture Notes in Computer Science, vol 6926. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24477-3_15
DOI: https://doi.org/10.1007/978-3-642-24477-3_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-24476-6
Online ISBN: 978-3-642-24477-3
eBook Packages: Computer Science, Computer Science (R0)