Semantic subgroup explanations

Vavpetič, Anže; Podpečan, Vid; Lavrač, Nada

doi:10.1007/s10844-013-0292-1

Published: 06 December 2013

Journal of Intelligent Information Systems, Volume 42, pages 233–254 (2014)

Anže Vavpetič^1,2,
Vid Podpečan^1 &
Nada Lavrač^1,2,3

Abstract

Subgroup discovery (SD) methods can be used to find interesting subsets of objects of a given class. While the rules describing subgroups are themselves good explanations of the subgroups, domain ontologies can provide additional descriptions of the data and alternative explanations of the constructed rules. Such explanations in terms of higher-level ontology concepts have the potential to provide new insights into the domain of investigation. We show that this additional explanatory power can be ensured by using recently developed semantic SD methods. We present a new approach to explaining subgroups through ontologies and demonstrate its utility on a motivational use case and on a gene expression profiling use case, where groups of patients identified through SD in terms of gene expression are further explained through concepts from the Gene Ontology and the KEGG Orthology. We qualitatively compare the methodology with the supporting factors technique for characterizing subgroups. The developed tools are implemented in ClowdFlows, a new browser-based data mining platform.
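The idea of explaining an already discovered subgroup through ontology concepts can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes a hypothetical toy is-a hierarchy, hypothetical patient annotations and a hypothetical subgroup, treats subgroup membership as the target class, propagates every annotation to its ancestor concepts, and scores single concepts by weighted relative accuracy (WRAcc), a standard subgroup quality measure. All names below (`ONTOLOGY`, `ANNOTATIONS`, `SUBGROUP`, `explain`, etc.) are illustrative.

```python
from typing import Dict, List, Set

# Hypothetical toy is-a hierarchy: concept -> list of parent concepts.
ONTOLOGY: Dict[str, List[str]] = {
    "biological_process": [],
    "cell_cycle": ["biological_process"],
    "mitosis": ["cell_cycle"],
    "dna_replication": ["cell_cycle"],
    "immune_response": ["biological_process"],
}

# Hypothetical annotations of six patients with ontology concepts.
ANNOTATIONS: Dict[str, Set[str]] = {
    "p1": {"mitosis"},
    "p2": {"dna_replication"},
    "p3": {"mitosis", "immune_response"},
    "p4": {"immune_response"},
    "p5": {"immune_response"},
    "p6": set(),
}

# Members of a subgroup assumed to have been found earlier by an SD algorithm.
SUBGROUP: Set[str] = {"p1", "p2", "p3"}


def ancestors(concept: str, ontology: Dict[str, List[str]]) -> Set[str]:
    """Return the concept together with all of its is-a ancestors."""
    seen: Set[str] = set()
    stack = [concept]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(ontology.get(current, []))
    return seen


def propagate(annotations: Dict[str, Set[str]],
              ontology: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Close every example's annotation set upward along the is-a links."""
    return {
        example: set().union(*(ancestors(c, ontology) for c in concepts))
        for example, concepts in annotations.items()
    }


def wracc(concept: str, annotations: Dict[str, Set[str]],
          subgroup: Set[str]) -> float:
    """WRAcc = p(concept) * (p(subgroup | concept) - p(subgroup))."""
    n = len(annotations)
    covered = [ex for ex, cs in annotations.items() if concept in cs]
    if n == 0 or not covered:
        return 0.0
    p_target = sum(1 for ex in annotations if ex in subgroup) / n
    precision = sum(1 for ex in covered if ex in subgroup) / len(covered)
    return (len(covered) / n) * (precision - p_target)


def explain(annotations, subgroup, ontology, top_k: int = 3):
    """Rank single ontology concepts as explanations of the subgroup."""
    closed = propagate(annotations, ontology)
    scores = {c: wracc(c, closed, subgroup) for c in ontology}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]


if __name__ == "__main__":
    for concept, score in explain(ANNOTATIONS, SUBGROUP, ONTOLOGY):
        print(f"{concept}: {score:.3f}")
```

In this toy data the parent concept `cell_cycle` covers all three subgroup members and no other patient, so it scores higher (0.25) than either of its children: this is the kind of higher-level explanation the abstract refers to. Semantic SD systems typically search rules built from conjunctions of ontology concepts rather than scoring concepts one at a time; the sketch covers only the single-concept case.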

Acknowledgments

This work was supported by the Slovenian Ministry of Higher Education, Science and Technology [grant number P-103], the Slovenian Research Agency [grant number PR-04431], the SemDM project (Development and application of new semantic data mining methods in life sciences) [grant number J2-5478] and the FP7 European Commission project MUSE (Machine understanding for interactive storytelling) [grant number 296703].

Author information

Authors and Affiliations

Department of Knowledge Technologies, Jožef Stefan Institute, Ljubljana, Slovenia
Anže Vavpetič, Vid Podpečan & Nada Lavrač
Jožef Stefan International Postgraduate School, Ljubljana, Slovenia
Anže Vavpetič & Nada Lavrač
University of Nova Gorica, Nova Gorica, Slovenia
Nada Lavrač

Corresponding author

Correspondence to Anže Vavpetič.

About this article

Cite this article

Vavpetič, A., Podpečan, V. & Lavrač, N. Semantic subgroup explanations. J Intell Inf Syst 42, 233–254 (2014). https://doi.org/10.1007/s10844-013-0292-1

Received: 08 May 2013
Revised: 20 September 2013
Accepted: 19 November 2013
Published: 06 December 2013
Issue Date: April 2014
DOI: https://doi.org/10.1007/s10844-013-0292-1
