Abstract
Humans are very good at recognizing patterns in three or fewer dimensions. However, most data, e.g. in the biomedical domain, has far more than three dimensions, which makes manual analysis awkward and sometimes practically impossible. Mapping higher-dimensional data into lower dimensions is therefore a major task in Human–Computer Interaction and Interactive Data Visualization, and a concerted effort that includes recent advances in computational topology may help to make sense of such data. Topology has its roots in the works of Euler and Gauss, but for a long time it remained part of theoretical mathematics. Within the last ten years, computational topology has rapidly gained interest among computer scientists. Topology is basically the study of abstract shapes and spaces and of the mappings between them; it originated from the study of geometry and set theory. Topological methods can be applied to data represented by point clouds, that is, finite subsets of n-dimensional Euclidean space. We can think of the input as a sample of some unknown space that one wishes to reconstruct and understand, and we must distinguish between the ambient (embedding) dimension n and the intrinsic dimension of the data. Whilst n is usually high, the intrinsic dimension, which is of primary interest, is typically small. Knowing the intrinsic dimensionality of data can therefore be seen as a first step towards understanding its structure. Consequently, applying topological techniques to data mining and knowledge discovery is a promising area for future research.
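The distinction between ambient and intrinsic dimension can be illustrated with a small sketch. It is not taken from the chapter: the function name `affine_dimension`, the tolerance, and the synthetic point cloud are assumptions made purely for illustration, and the estimate is a linear one (the rank of the centred data), not a topological method. Curved data, such as points on a circle, would need a nonlinear or topological estimator.

```python
import random


def affine_dimension(points, tol=1e-9):
    """Dimension of the smallest affine subspace containing the points.

    Illustrative linear estimate of intrinsic dimension (hypothetical helper).
    """
    n = len(points[0])
    # Centre the cloud so that the affine hull becomes a linear span.
    mean = [sum(p[j] for p in points) / len(points) for j in range(n)]
    rows = [[p[j] - mean[j] for j in range(n)] for p in points]
    rank = 0
    for col in range(n):
        # Partial pivoting: take the largest remaining entry in this column.
        pivot = max(range(rank, len(rows)),
                    key=lambda r: abs(rows[r][col]), default=None)
        if pivot is None or abs(rows[pivot][col]) <= tol:
            continue  # column adds no new direction
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for r in range(rank + 1, len(rows)):
            factor = rows[r][col] / rows[rank][col]
            rows[r] = [a - factor * b for a, b in zip(rows[r], rows[rank])]
        rank += 1
        if rank == len(rows):
            break
    return rank


# Synthetic point cloud (assumed data): a 2-dimensional plane in 5-dimensional space.
random.seed(0)
u = [1.0, 0.0, 1.0, 0.0, 1.0]
v = [0.0, 1.0, 1.0, 0.0, -1.0]
offset = [3.0, -2.0, 0.5, 7.0, 1.0]

cloud = []
for _ in range(50):
    a, b = random.uniform(-1, 1), random.uniform(-1, 1)
    cloud.append([offset[j] + a * u[j] + b * v[j] for j in range(5)])

ambient_dimension = len(cloud[0])
estimated_dimension = affine_dimension(cloud)
print("ambient dimension n:", ambient_dimension)
print("estimated intrinsic dimension:", estimated_dimension)
```

For this synthetic cloud the ambient dimension is 5, while the points vary along only two independent directions, which is what the rank computation picks up.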
Copyright information
© 2014 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Holzinger, A. (2014). On Topological Data Mining. In: Holzinger, A., Jurisica, I. (eds) Interactive Knowledge Discovery and Data Mining in Biomedical Informatics. Lecture Notes in Computer Science, vol 8401. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-43968-5_19
DOI: https://doi.org/10.1007/978-3-662-43968-5_19
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-43967-8
Online ISBN: 978-3-662-43968-5
eBook Packages: Computer Science, Computer Science (R0)