Elsevier

Future Generation Computer Systems

Volume 67, February 2017, Pages 286-296

Geometrical and topological approaches to Big Data

https://doi.org/10.1016/j.future.2016.06.005
Open access under a Creative Commons license

Highlights

  • An overview of the state of the art in geometrical and topological approaches to Big Data.

  • Trends in geometrical and topological approaches to Big Data.

  • Big Data visualization.

  • Discussion of current techniques and future trends for addressing these applications.

Abstract

Modern data science uses topological methods to find the structural features of data sets before further supervised or unsupervised analysis. Geometry and topology are natural tools for analysing massive amounts of data, since geometry can be regarded as the study of distance functions. The mathematical formalism developed for incorporating geometric and topological techniques deals with point cloud data sets, i.e. finite sets of points, and adapts tools from various branches of geometry and topology to study them. The point clouds are finite samples taken from a geometric object, perhaps with noise. Topology provides a formal language for qualitative mathematics, whereas geometry is mainly quantitative. Thus, in topology, we study relationships of proximity or nearness without using distances; a map between topological spaces is called continuous if it preserves these nearness structures. Geometrical and topological methods allow us to analyse highly complex data. They create a summary, or compressed representation, of all of the data features that helps to rapidly uncover particular patterns and relationships in the data. Constructing summaries of entire domains of attributes involves understanding the relationship between topological and geometric objects constructed from data using various features.
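The idea of a topological summary of a point cloud can be illustrated with the simplest persistent-homology computation: tracking connected components (0-dimensional homology, H0) as a distance threshold grows. The following is a minimal sketch, not the authors' method, assuming Euclidean points and a Vietoris-Rips filtration in which an edge enters at its length; the function name `h0_persistence` is hypothetical. Each point is born at scale 0, and a component dies at the scale where an edge merges it into another one, which is the same order of merges as Kruskal's minimum spanning tree algorithm.

```python
import math
from itertools import combinations


def h0_persistence(points):
    """Return 0-dimensional persistence pairs (birth, death) for a finite
    point cloud under a Vietoris-Rips filtration (hypothetical helper).

    Assumption: an edge enters the filtration at its Euclidean length.
    Every point is born at scale 0.0; a component dies when an edge joins
    it to another component. One component survives forever (death = inf).
    """
    n = len(points)
    parent = list(range(n))  # union-find forest over point indices

    def find(i):
        # Path halving keeps the union-find trees shallow.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All edges of the complete graph, sorted by length (the filtration order).
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )

    pairs = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[rj] = ri  # merge: one component dies at this scale
            pairs.append((0.0, length))
    if n:
        pairs.append((0.0, math.inf))  # the last surviving component
    return pairs
```

For two well-separated pairs of points, `h0_persistence([(0, 0), (0, 1), (10, 0), (10, 1)])` yields two short bars that die at scale 1.0, one bar that dies at 10.0, and one infinite bar; the long finite bar is the persistent feature signalling two clusters, while short bars are typically read as noise.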

A common thread in various approaches to noise removal, model reduction, feasibility reconstruction, and blind source separation is to replace the original data with a lower-dimensional approximate representation obtained via a matrix or multi-directional array (tensor) factorization or decomposition. Besides these transformations, we consider the significant challenge that feature summarization, or subset selection, poses for Big Data, focusing on scalable feature selection. The lower-dimensional approximate representation is also used for Big Data visualization.
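One standard instance of a lower-dimensional approximation by matrix factorization is the truncated singular value decomposition (SVD); by the Eckart-Young theorem, the rank-k truncated SVD is the best rank-k approximation in the Frobenius and spectral norms. The sketch below is a pure-Python illustration, not the survey's method: it assumes a small dense matrix given as a list of rows and well-separated singular values, and finds the leading singular triplets by power iteration with deflation. The names `low_rank_approx` and `reconstruct` are hypothetical, and tensor (multi-directional array) decompositions are not covered.

```python
import math
import random


def low_rank_approx(A, k, iters=200, seed=0):
    """Approximate matrix A (list of rows) by up to k rank-1 terms
    sigma * u v^T, i.e. a truncated SVD computed by power iteration on
    R^T R with deflation (hypothetical helper, not an optimized solver)."""
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    R = [list(map(float, row)) for row in A]  # residual, deflated per term
    factors = []
    for _ in range(k):
        v = [rng.random() for _ in range(n)]
        for _ in range(iters):
            # One power step on R^T R: u = R v, then v = R^T u, normalised.
            u = [sum(R[i][j] * v[j] for j in range(n)) for i in range(m)]
            v = [sum(R[i][j] * u[i] for i in range(m)) for j in range(n)]
            norm = math.sqrt(sum(x * x for x in v))
            if norm == 0.0:
                break  # residual is exactly zero; nothing left to extract
            v = [x / norm for x in v]
        u = [sum(R[i][j] * v[j] for j in range(n)) for i in range(m)]
        sigma = math.sqrt(sum(x * x for x in u))  # singular value estimate
        if sigma == 0.0:
            break
        u = [x / sigma for x in u]
        factors.append((sigma, u, v))
        # Deflate: subtract the rank-1 component just found.
        R = [[R[i][j] - sigma * u[i] * v[j] for j in range(n)] for i in range(m)]
    return factors


def reconstruct(factors, m, n):
    """Sum the rank-1 terms back into an m x n matrix."""
    return [[sum(s * u[i] * v[j] for s, u, v in factors) for j in range(n)]
            for i in range(m)]
```

For example, a 3 x 2 matrix whose rows are all multiples of (1, 2) is recovered, up to rounding, from a single factor with `k = 1`, so six numbers are summarized by one singular value and two short vectors. At Big Data scale this loop would be replaced by a library or randomized SVD routine.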

The intersection of topology and Big Data will bring huge opportunities, as well as challenges, to Big Data communities. This survey aims to bring together state-of-the-art research results on geometrical and topological methods for Big Data.

Keywords

Big Data
Industry 4.0
Topological data analysis
Persistent homology
Dimensionality reduction
Big Data visualization

Václav Snášel studied numerical mathematics at Palacky University in Olomouc and obtained his Ph.D. degree at Masaryk University in Brno; he is a professor at VŠB—Technical University of Ostrava. From 2001 to 2009, he worked as a researcher at the Institute of Computer Science of the Academy of Sciences of the Czech Republic. Since 2009, he has worked as head of a research programme at IT4Innovation National Supercomputing Center; currently, he is Dean of the Faculty of Electrical Engineering and Computer Science. He works in a multi-disciplinary environment involving artificial intelligence, bioinformatics, Big Data, knowledge management, machine intelligence, neural networks, nature and biologically inspired computing, and data mining, applied to various real-world problems.

Jana Nowaková received her M.Sc. in measurement and control from the Faculty of Electrical Engineering and Computer Science, VŠB—Technical University of Ostrava, in 2012. She is currently continuing her studies in technical cybernetics. In addition to fuzzy modelling, data processing, knowledge management, and bio-inspired computing, she is interested in statistical data processing in cooperation with University Hospital Ostrava. She works as a researcher at the Faculty of Electrical Engineering and Computer Science, VŠB—Technical University of Ostrava, and at IT4Innovation National Supercomputing Center.

Fatos Xhafa received his Ph.D. in computer science in 1998 from the Department of Computer Science of the Technical University of Catalonia (UPC), Spain. He currently holds a permanent position as Professor Titular (Hab. Full Professor) at UPC. He was a visiting professor at the University of London, UK, in 2009–2010 and a research associate at Drexel University, USA, in 2004/2005. Prof. Xhafa has published in international journals, conferences/workshops, chapters, books and proceedings. He is Editor-in-Chief of IJGUC and IJSSC (Inderscience) and of the Elsevier Book Series “Intelligent Data-Centric Systems”. His research interests include parallel and distributed algorithms, massive data processing and collective intelligence, optimization, networking, P2P, Cloud computing, and security and trustworthy computing, among others.

Leonard Barolli received his B.E. and Ph.D. from Tirana University, Albania, and Yamagata University, Japan, in 1989 and 1997, respectively. He has worked as a JSPS Post Doctor Fellow Researcher and Research Associate at Yamagata University, as Assistant Professor at Saitama Institute of Technology (SIT) and as Associate Professor at Fukuoka Institute of Technology (FIT), Japan. He is currently a full professor at the Department of Information and Communication Engineering, FIT. He has published more than 600 papers in refereed journals and international conference proceedings. He is the Steering Committee Co-Chair of the IEEE AINA, BWCCA, 3PGCIC, NBiS, INCoS, CISIS, and IMIS international conferences. His research interests include network traffic control, network protocols, fuzzy control, genetic algorithms, ad hoc and sensor networks, IoT, big data, web-based applications and P2P systems. He is a member of SOFT, IPSJ, and IEEE.