Extravaganza Tutorial on Hot Ideas for Interactive Knowledge Discovery and Data Mining in Biomedical Informatics

  • Conference paper
Brain Informatics and Health (BIH 2014)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8609)

Abstract

Biomedical experts are confronted with "Big data", driven by the trend towards precision medicine. Although humans are excellent at pattern recognition in dimensions of ≤ 3, most biomedical data has far more than three dimensions, which often makes manual analysis impossible. In their daily routine, experts are increasingly unable to deal with such data. Efficient, usable and useful computational methods, algorithms and tools for interactively gaining insight into such data are urgently needed. A synergistic combination of methodologies from two areas may be of great help here: Human–Computer Interaction (HCI) and Knowledge Discovery/Data Mining (KDD), with the goal of supporting human intelligence with machine learning. Mapping higher-dimensional data into lower dimensions is a major task in HCI, and a concerted effort that includes recent advances from graph theory and algebraic topology may contribute to finding solutions. Moreover, much biomedical data is sparse, noisy and time-dependent, so entropy is also among the promising topics. This tutorial gives an overview of the HCI-KDD approach and focuses on three topics: graphs, topology and entropy. The goal of this introductory tutorial is to motivate and stimulate further research.

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Holzinger, A. (2014). Extravaganza Tutorial on Hot Ideas for Interactive Knowledge Discovery and Data Mining in Biomedical Informatics. In: Ślȩzak, D., Tan, AH., Peters, J.F., Schwabe, L. (eds) Brain Informatics and Health. BIH 2014. Lecture Notes in Computer Science, vol 8609. Springer, Cham. https://doi.org/10.1007/978-3-319-09891-3_46

  • DOI: https://doi.org/10.1007/978-3-319-09891-3_46

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-09890-6

  • Online ISBN: 978-3-319-09891-3

  • eBook Packages: Computer Science, Computer Science (R0)
