ABSTRACT
In the age of network sciences and machine learning, efficient algorithms are in higher demand than ever before. Big Data fundamentally challenges the classical notion of efficient algorithms: algorithms that used to be considered efficient, according to the polynomial-time characterization, may no longer be adequate for solving today's problems. It is not just desirable, but essential, that efficient algorithms be scalable. In other words, their complexity should be nearly linear or sub-linear with respect to the problem size. Thus, scalability, not just polynomial-time computability, should be elevated as the central complexity notion for characterizing efficient computation. In this talk, I will highlight a family of fundamental algorithmic techniques for designing provably good scalable algorithms: (1) scalable primitives and scalable reduction, (2) spectral approximation of graphs and matrices, (3) sparsification by multilevel structures, (4) advanced sampling, and (5) local network exploration. For the first, I will focus on the emerging Laplacian Paradigm, which has led to breakthroughs in scalable algorithms for several fundamental problems in network analysis, machine learning, and scientific computing. I will then illustrate these algorithmic techniques with four recent applications: (1) sampling from graphical models, (2) network centrality approximation, (3) social-influence analysis, and (4) local clustering. Mathematical and algorithmic solutions to these problems exemplify the fusion of combinatorial, numerical, and statistical thinking in data and network analysis.
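To make the idea of sub-linear, local network exploration concrete, the following is a minimal sketch of one well-known technique of this kind: approximating a personalized PageRank vector by local "push" operations around a seed node. It is offered as an illustration only and is not necessarily the method presented in the talk. The function name `approximate_ppr`, the adjacency-dictionary input format, and the parameters `alpha` and `eps` are assumptions made for this example.

```python
from collections import deque


def approximate_ppr(adj, seed, alpha=0.15, eps=1e-4):
    """Approximate personalized PageRank from `seed` by local push.

    Hypothetical helper for illustration. Assumes `adj` maps each node to a
    list of neighbors in an undirected graph with no self-loops and no
    isolated nodes.

    Returns (p, r): p holds the approximate PageRank mass, r holds the
    residual mass that has not yet been pushed.
    """
    p = {}                 # settled probability mass
    r = {seed: 1.0}        # residual mass, initially all on the seed
    queue = deque([seed])
    queued = {seed}

    while queue:
        u = queue.popleft()
        queued.discard(u)
        deg = len(adj[u])
        ru = r.get(u, 0.0)
        # Only push from nodes whose residual is large relative to degree.
        if ru < eps * deg:
            continue
        # Keep an alpha fraction at u, spread the rest evenly to neighbors.
        p[u] = p.get(u, 0.0) + alpha * ru
        share = (1.0 - alpha) * ru / deg
        r[u] = 0.0
        for v in adj[u]:
            r[v] = r.get(v, 0.0) + share
            if v not in queued and r[v] >= eps * len(adj[v]):
                queue.append(v)
                queued.add(v)
    return p, r
```

Each push settles at least `alpha * eps * deg(u)` units of mass, and the total mass is 1, so the sum of the degrees of all pushed nodes is at most `1 / (alpha * eps)`. The work therefore depends on `alpha` and `eps` rather than on the size of the graph, which is the sense in which such an exploration is sub-linear. Calling `approximate_ppr(adj, seed)` on a very large graph touches only a neighborhood of `seed`.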
Index Terms
- Scalable Algorithms in the Age of Big Data and Network Sciences: Characterization, Primitives, and Techniques