[KD3] A Workflow-Based Application for Exploration of Biomedical Data Sets

Part of the book series: Lecture Notes in Computer Science ((TLDKS,volume 6990))

Abstract

Driven by the biotechnological revolution of recent years, molecular biology has become increasingly data-driven. Knowledge Discovery in Databases, a well-known process in bioinformatics, supports biological research from data integration through knowledge mining to data interpretation.

This work proposes a new software suite, termed Knowledge Discovery in Databases Designer (KD3), which covers the complete Knowledge Discovery in Databases process using a workflow-oriented architecture. KD3 implements three application-oriented modules. The Designer is used to design specific workflows. The Interpreter loads and parameterizes existing workflows. The Launcher encapsulates one dedicated workflow in an independent application that answers one specific biomedical question. KD3 offers a variety of implemented methods and can easily be extended with new customized components using functional objects. All components can be connected into workflows, which may also contain elements of other applications.
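
The idea of connecting components into a workflow can be illustrated with a minimal sketch. It is not KD3's actual API, which the abstract does not describe: the `Workflow` class, its `connect` and `run` methods, and the example components `drop_missing` and `make_threshold_filter` are all hypothetical names chosen for illustration. The sketch assumes that a component is a plain callable (a "functional object") and that a workflow is a linear chain in which each component receives the output of the previous one.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Any, Callable, List

# Assumption: a component is any callable that maps input data to output data.
Component = Callable[[Any], Any]


@dataclass
class Workflow:
    """Hypothetical linear workflow; not the KD3 implementation."""

    steps: List[Component] = field(default_factory=list)

    def connect(self, component: Component) -> "Workflow":
        # Append a component and return self so calls can be chained.
        self.steps.append(component)
        return self

    def run(self, data: Any) -> Any:
        # Feed the output of each step into the next one.
        for step in self.steps:
            data = step(data)
        return data


def drop_missing(values):
    # Example preprocessing component: remove missing measurements.
    return [v for v in values if v is not None]


def make_threshold_filter(threshold):
    # Example of a parameterized component: the threshold is fixed when
    # the workflow is configured, not when it is designed.
    def keep_above(values):
        return [v for v in values if v > threshold]

    return keep_above


# Design the workflow, parameterize one step, then run it on sample data.
workflow = (
    Workflow()
    .connect(drop_missing)
    .connect(make_threshold_filter(1.0))
    .connect(mean)  # a standard-library function used as a component
)

result = workflow.run([0.5, None, 2.0, 4.0])
print(result)
```

In this sketch, building the chain loosely corresponds to designing a workflow, and choosing the threshold corresponds to parameterizing it; a real system would also need typed ports, branching, and persistence, none of which are shown here.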

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Dander, A., Handler, M., Netzer, M., Pfeifer, B., Seger, M., Baumgartner, C. (2011). [KD3] A Workflow-Based Application for Exploration of Biomedical Data Sets. In: Hameurlain, A., Küng, J., Wagner, R., Böhm, C., Eder, J., Plant, C. (eds) Transactions on Large-Scale Data- and Knowledge-Centered Systems IV. Lecture Notes in Computer Science, vol 6990. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23740-9_7

  • DOI: https://doi.org/10.1007/978-3-642-23740-9_7

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-23739-3

  • Online ISBN: 978-3-642-23740-9

  • eBook Packages: Computer Science, Computer Science (R0)
