[KD³] A Workflow-Based Application for Exploration of Biomedical Data Sets

Dander, Andreas; Handler, Michael; Netzer, Michael; Pfeifer, Bernhard; Seger, Michael; Baumgartner, Christian

doi:10.1007/978-3-642-23740-9_7

Part of the book series: Lecture Notes in Computer Science (TLDKS, volume 6990)

Abstract

Following the biotechnological revolution of recent years, molecular biology has become increasingly data-driven. Knowledge Discovery in Databases, a well-known process in the field of bioinformatics, supports the biological research process from data integration through knowledge mining to data interpretation.

This work proposes a new software suite, termed Knowledge Discovery in Databases Designer (KD³), which covers the complete Knowledge Discovery in Databases process using a workflow-oriented architecture. KD³ implements three application-oriented modules. The Designer is used to design specific workflows. The Interpreter loads and parameterizes existing workflows. The Launcher encapsulates one dedicated workflow in an independent application that answers one specific biomedical question. KD³ offers a variety of implemented methods and can be easily extended with new customized components using functional objects. All components can be connected to form workflows, which may contain elements of other applications.
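
To illustrate the general idea of connecting components into a workflow, here is a minimal Python sketch. It is not the KD³ implementation or its API: the `Workflow` class and the components `drop_missing`, `make_scaler` and `mean` are hypothetical names chosen for illustration, and the sketch assumes a purely linear chain in which each component is a callable that receives the output of the previous one.

```python
from typing import Any, Callable, List

# Assumption: a "component" is any callable that maps input data to output data.
Component = Callable[[Any], Any]


class Workflow:
    """Hypothetical linear workflow: components are applied in order."""

    def __init__(self, components: List[Component]) -> None:
        self.components = list(components)

    def run(self, data: Any) -> Any:
        # Pass the output of each component to the next one.
        for component in self.components:
            data = component(data)
        return data


def drop_missing(values):
    """Preprocessing component: remove missing (None) entries."""
    return [v for v in values if v is not None]


def make_scaler(factor):
    """Return a component parameterized by a scaling factor."""
    def scale(values):
        return [v * factor for v in values]
    return scale


def mean(values):
    """Analysis component: arithmetic mean of the values."""
    return sum(values) / len(values)


# Design a workflow, parameterize one component, then run it on sample data.
workflow = Workflow([drop_missing, make_scaler(2.0), mean])
result = workflow.run([1.0, None, 3.0])
print(result)
```

In this sketch, adding a new component only requires writing another callable; `make_scaler` shows one way a component could be parameterized before the workflow is run.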

Author information

Authors and Affiliations

Institute for Bioinformatics and Translational Research, UMIT, Hall in Tirol, Austria
Andreas Dander
Oncotyrol, Center for Personalized Cancer Medicine, Innsbruck, Austria
Andreas Dander
Biocenter, Division for Bioinformatics, Innsbruck Medical University, Innsbruck, Austria
Andreas Dander
Institute of Electrical, Electronic and Bioengineering, UMIT, Hall in Tirol, Austria
Michael Handler, Michael Netzer, Bernhard Pfeifer, Michael Seger & Christian Baumgartner

Editor information

Editors and Affiliations

Institut de Recherche en Informatique de Toulouse (IRIT), Paul Sabatier University, 118, route de Narbonne, 31062, Toulouse Cedex, France
Abdelkader Hameurlain
University of Linz, FAW, Altenbergerstraße 69, 4040, Linz, Austria
Josef Küng & Roland Wagner
Department of Computer Science, Ludwig-Maximilians-Universität, Oettingenstrasse 67, 80538, München, Germany
Christian Böhm
Institut für Informatik-Systeme, Alpen Adria Universität Klagenfurt, Universitätsstr. 65, 9020, Klagenfurt, Austria
Johann Eder
Department of Scientific Computing, Florida State University, 400 Dirac Science Library, 32306-4120, Tallahassee, FL, USA
Claudia Plant

About this chapter

Cite this chapter

Dander, A., Handler, M., Netzer, M., Pfeifer, B., Seger, M., Baumgartner, C. (2011). [KD³] A Workflow-Based Application for Exploration of Biomedical Data Sets. In: Hameurlain, A., Küng, J., Wagner, R., Böhm, C., Eder, J., Plant, C. (eds) Transactions on Large-Scale Data- and Knowledge-Centered Systems IV. Lecture Notes in Computer Science, vol 6990. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23740-9_7

DOI: https://doi.org/10.1007/978-3-642-23740-9_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23739-3
Online ISBN: 978-3-642-23740-9
eBook Packages: Computer Science, Computer Science (R0)