Compressed linear algebra for large-scale machine learning

  • Special Issue Paper
  • The VLDB Journal

Abstract

Large-scale machine learning algorithms are often iterative, using repeated read-only data access and I/O-bound matrix-vector multiplications to converge to an optimal model. It is crucial for performance to fit the data into single-node or distributed main memory and enable fast matrix-vector operations on in-memory data. General-purpose, heavy- and lightweight compression techniques struggle to achieve both good compression ratios and fast decompression speed to enable block-wise uncompressed operations. Therefore, we initiate work—inspired by database compression and sparse matrix formats—on value-based compressed linear algebra (CLA), in which heterogeneous, lightweight database compression techniques are applied to matrices, and then linear algebra operations such as matrix-vector multiplication are executed directly on the compressed representation. We contribute effective column compression schemes, cache-conscious operations, and an efficient sampling-based compression algorithm. Our experiments show that CLA achieves in-memory operations performance close to the uncompressed case and good compression ratios, which enables fitting substantially larger datasets into available memory. We thereby obtain significant end-to-end performance improvements up to \(9.2\mathrm{x}\).
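
To make the execution model concrete, the following is a minimal sketch, in Python/NumPy rather than the paper's SystemML implementation, of a matrix-vector product computed directly on one value-compressed column group. It assumes a simple dictionary-plus-offset-list encoding in the spirit of CLA's column compression schemes; the function name and toy data are illustrative only.

import numpy as np

# Sketch only: one compressed column group stores each distinct value tuple
# once (the dictionary), together with the row offsets where it occurs.
def colgroup_matvec(col_indices, dictionary, offset_lists, v, q):
    """Add this column group's contribution to q = X v.

    col_indices  : columns of X covered by this group
    dictionary   : (num_distinct_tuples, len(col_indices)) array of value tuples
    offset_lists : offset_lists[j] = row indices where tuple j occurs
    v, q         : dense input vector and dense output accumulator
    """
    # Pre-aggregate v once per distinct tuple (one dot product per tuple,
    # not per row) ...
    u = dictionary @ v[col_indices]
    # ... then scatter each pre-aggregated scalar to the rows that use it.
    for j, rows in enumerate(offset_lists):
        q[rows] += u[j]

# Toy usage: a 6x2 column group with only two distinct value tuples.
dictionary = np.array([[1.0, 2.0], [3.0, 4.0]])
offset_lists = [np.array([0, 2, 5]), np.array([1, 3, 4])]
v = np.array([0.5, 1.0])
q = np.zeros(6)
colgroup_matvec([0, 1], dictionary, offset_lists, v, q)  # q = [2.5, 5.5, 2.5, 5.5, 5.5, 2.5]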

Notes

  1. Dummy coding transforms a categorical feature having d possible values into d Boolean features, each indicating the rows in which a given value occurs. The larger the value of d, the greater the sparsity (from adding \(d-1\) zeros per row); see the sketch following these notes.

  2. The results with native BLAS libraries would be similar because memory bandwidth and I/O are the bottlenecks.

  3. For consistency with previously published results [32], we use Snappy, which was the default codec in Spark 1.x. However, we also include LZ4, which is the default in Spark 2.x.

  4. For Mnist with its original 10 classes, we created the labels with \(\mathbf{y} \leftarrow (\mathbf{y}==7)\) (i.e., class 7 against the rest), whereas for ImageNet with its 1000 classes, we created the labels with \(\mathbf{y} \leftarrow (\mathbf{y}_0 > (\max(\mathbf{y}_0) - (\max(\mathbf{y}_0)-\min(\mathbf{y}_0))/2))\), where we derived \(\mathbf{y}_0 = \mathbf{X}\mathbf{w}\) from the data \(\mathbf{X}\) and a random model \(\mathbf{w}\).

  5. We enabled code generation for cell-wise operations only, because SystemML 0.14 does not yet support operator fusion (i.e., code generation) for compressed matrices.
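
As a concrete illustration of note 1, here is a minimal dummy-coding sketch in Python/NumPy; the function name and example data are hypothetical, not taken from the paper.

import numpy as np

# Sketch: a categorical feature with d distinct values becomes d Boolean
# (0/1) columns, one per value, so every row contributes d-1 zeros.
def dummy_code(feature):
    values = np.unique(feature)  # the d distinct values
    encoded = (feature[:, None] == values[None, :]).astype(np.float64)
    return values, encoded

# Example: d = 3 values over five rows yields a 5x3 matrix with one 1 per row.
values, X = dummy_code(np.array(["a", "c", "a", "b", "c"]))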

References

  1. Abadi, D.J., et al.: Integrating compression and execution in column-oriented database systems. In: SIGMOD (2006)

  2. Abadi, M., et al.: TensorFlow: large-scale machine learning on heterogeneous distributed systems. In: CoRR (2016)

  3. Adler, M., Mitzenmacher, M.: Towards compressing web graphs. In: DCC (2001)

  4. Alexandrov, A., et al.: The stratosphere platform for big data analytics. VLDB J. 23(6), 939–964 (2014)

  5. American Statistical Association (ASA). Airline on-time performance dataset. http://stat-computing.org/dataexpo/2009/the-data.html

  6. Ashari, A., et al.: An efficient two-dimensional blocking strategy for sparse matrix-vector multiplication on GPUs. In: ICS (2014)

  7. Ashari, A., et al.: On optimizing machine learning workloads via kernel fusion. In: PPoPP (2015)

  8. Bandyopadhyay, B., et al.: Topological graph sketching for incremental and scalable analytics. In: CIKM (2016)

  9. Bassiouni, M.A.: Data compression in scientific and statistical databases. Trans. Softw. Eng. (TSE) 11(10), 1047–1058 (1985)

  10. Bell, N., Garland, M.: Implementing sparse matrix-vector multiplication on throughput-oriented processors. In: SC (2009)

  11. Bergstra, J., et al.: Theano: a CPU and GPU math expression compiler. In: SciPy (2010)

  12. Beyer, K.S., et al.: On synopses for distinct-value estimation under multiset operations. In: SIGMOD (2007)

  13. Bhattacharjee, B., et al.: Efficient index compression in DB2 LUW. PVLDB 2(2), 1462–1473 (2009)

  14. Bhattacherjee, S., et al.: PStore: an efficient storage framework for managing scientific data. In: SSDBM (2014)

  15. Binnig, C., et al.: Dictionary-based order-preserving string compression for main memory column stores. In: SIGMOD (2009)

  16. Boehm, M., et al.: SystemML: declarative machine learning on Spark. PVLDB 9(13), 1425–1436 (2016)

  17. Boehm, M., et al.: Declarative machine learning—a classification of basic properties and types. In: CoRR (2016)

  18. Bolosky, W.J., Scott, M.L.: False sharing and its effect on shared memory performance. In: SEDMS (1993)

  19. Bottou, L.: The infinite MNIST dataset. http://leon.bottou.org/projects/infimnist

  20. Buehrer, G., Chellapilla, K.: A scalable pattern mining approach to web graph compression with communities. In: WSDM (2008)

  21. Charikar, M., et al.: Towards estimation error guarantees for distinct values. In: SIGMOD (2000)

  22. Chen, L., et al.: Towards linear algebra over normalized data. PVLDB 10(11), 1214–1225 (2017)

  23. Chitta, R., et al.: Approximate kernel k-means: solution to large scale kernel clustering. In: KDD (2011)

  24. Cohen, J., et al.: MAD skills: new analysis practices for big data. PVLDB 2(2), 1481–1492 (2009)

  25. Constantinescu, C., Lu, M.: Quick estimation of data compression and de-duplication for large storage systems. In: CCP (2011)

  26. Cormack, G.V.: Data compression on a database system. Commun. ACM 28(12), 1336–1342 (1985)

  27. Damme, P., et al.: Lightweight data compression algorithms: an experimental survey. In: EDBT (2017)

  28. Das, S., et al.: Ricardo: integrating R and Hadoop. In: SIGMOD (2010)

  29. Dean, J., Ghemawat, S.: MapReduce: simplified data processing on large clusters. In: OSDI (2004)

  30. Elgamal, T., et al.: sPCA: scalable principal component analysis for big data on distributed platforms. In: SIGMOD (2015)

  31. Elgamal, T., et al.: SPOOF: sum-product optimization and operator fusion for large-scale machine learning. In: CIDR (2017)

  32. Elgohary, A., et al.: Compressed linear algebra for large-scale machine learning. PVLDB 9(12), 960–971 (2016)

  33. Fan, W., et al.: Query preserving graph compression. In: SIGMOD (2012)

  34. Ghoting, A., et al.: SystemML: declarative machine learning on MapReduce. In: ICDE (2011)

  35. Good, I.J.: The population frequencies of species and the estimation of population parameters. Biometrika 40, 237–264 (1953)

  36. Graefe, G., Shapiro, L.D.: Data compression and database performance. In: Applied Computing (1991)

  37. Haas, P.J., Stokes, L.: Estimating the number of classes in a finite population. J. Am. Stat. Assoc. 93(444), 1475–1487 (1998)

  38. Halko, N., et al.: Finding structure with randomness: probabilistic algorithms for constructing approximate matrix decompositions. SIAM Rev. 53(2), 217–288 (2011)

  39. Harnik, D., et al.: Estimation of deduplication ratios in large data sets. In: MSST (2012)

  40. Harnik, D., et al.: To zip or not to zip: effective resource usage for real-time compression. In: FAST (2013)

  41. Huang, B., et al.: Cumulon: optimizing statistical data analysis in the cloud. In: SIGMOD (2013)

  42. Huang, B., et al.: Resource elasticity for large-scale machine learning. In: SIGMOD (2015)

  43. Idreos, S., et al.: Estimating the compression fraction of an index using sampling. In: ICDE (2010)

  44. Intel. MKL: Math Kernel Library. https://software.intel.com/en-us/intel-mkl/

  45. Johnson, D.S., et al.: Worst-case performance bounds for simple one-dimensional packing algorithms. SIAM J. Comput. 3(4), 299–325 (1974)

  46. Johnson, N.L., et al.: Univariate Discrete Distributions, 2nd edn. Wiley, New York (1992)

  47. Kang, D., et al.: NoScope: Optimizing deep CNN-based queries over video streams at scale. PVLDB 10(11), 1586–1597 (2017)

  48. Karakasis, V., et al.: An extended compression format for the optimization of sparse matrix-vector multiplication. Trans. Parallel Distrib. Syst. (TPDS) 24(10), 1930–1940 (2013)

  49. Kernert, D., et al.: SLACID—sparse linear algebra in a column-oriented in-memory database system. In: SSDBM (2014)

  50. Kim, M.: TensorDB and tensor-relational model (TRM) for efficient tensor-relational operations. Ph.D. Thesis, ASU (2014)

  51. Kimura, H., et al.: Compression aware physical database design. PVLDB 4(10), 657–668 (2011)

  52. Kourtis, K., et al.: Optimizing sparse matrix-vector multiplication using index and value compression. In: CF (2008)

  53. Kumar, A., et al.: Demonstration of Santoku: optimizing machine learning over normalized data. PVLDB 8(12), 1864–1867 (2015)

  54. Kumar, A., et al.: Learning generalized linear models over normalized data. In: SIGMOD (2015)

  55. Lang, H., et al.: Data blocks: hybrid OLTP and OLAP on compressed storage using both vectorization and compilation. In: SIGMOD (2016)

  56. Larson, P., et al.: SQL server column store indexes. In: SIGMOD (2011)

  57. LeCun, Y., et al.: Deep learning. Nature 521(7553), 436–444 (2015)

  58. Li, F., et al.: When Lempel–Ziv–Welch meets machine learning: a case study of accelerating machine learning using coding. In: CoRR (2017)

  59. Lichman, M.: UCI machine learning repository: higgs, covertype, US Census (1990). https://archive.ics.uci.edu/ml/

  60. Luo, S., et al.: Scalable linear algebra on a relational database system. In: ICDE (2017)

  61. Maccioni, A., Abadi, D.J.: Scalable pattern matching over compressed graphs via dedensification. In: KDD (2016)

  62. Maneth, S., Peternek, F.: A survey on methods and systems for graph compression. In: CoRR (2015)

  63. Nocedal, J., Wright, S.J.: Numerical Optimization. Springer, New York (1999)

  64. NVIDIA. cuSPARSE: CUDA Sparse Matrix Library. https://docs.nvidia.com/cuda/cusparse/

  65. Olteanu, D., Schleich, M.: F: Regression models over factorized views. PVLDB 9(13), 1573–1576 (2016)

  66. O’Neil, P.E.: Model 204 architecture and performance. In: High Performance Transaction Systems (1989)

  67. Or, A., Rosen, J.: Unified memory management in Spark 1.6, SPARK-10000 design document (2015)

  68. Oracle. Data Warehousing Guide, 11g Release 1 (2007)

  69. Papadopoulos, S., et al.: The TileDB array data storage manager. PVLDB 10(4), 349–360 (2016)

  70. Qin, C., Rusu, F.: Speculative approximations for terascale analytics. In: CoRR (2015)

  71. Raman, V., Swart, G.: How to wring a table dry: entropy compression of relations and querying of compressed relations. In: VLDB (2006)

  72. Raman, V., et al.: DB2 with BLU acceleration: so much more than just a column store. PVLDB 6(11), 1080–1091 (2013)

  73. Raskhodnikova, S., et al.: Strong lower bounds for approximating distribution support size and the distinct elements problem. SIAM J. Comput. 39(3), 813–842 (2009)

  74. Rendle, S.: Scaling factorization machines to relational data. PVLDB 6(5), 337–348 (2013)

  75. Rohrmann, T., et al.: Gilbert: declarative sparse linear algebra on massively parallel dataflow systems. In: BTW (2017)

  76. Saad, Y.: SPARSKIT: a basic tool kit for sparse matrix computations—Version 2 (1994)

  77. Satuluri, V., et al.: Local graph sparsification for scalable clustering. In: SIGMOD (2011)

  78. Schelter, S., et al.: Samsara: declarative machine learning on distributed dataflow systems. In: NIPS Workshop MLSystems (2016)

  79. Schlegel, B., et al.: Memory-efficient frequent-itemset mining. In: EDBT (2011)

  80. Schleich, M., et al.: Learning linear regression models over factorized joins. In: SIGMOD (2016)

  81. Stonebraker, M., et al.: C-store: a column-oriented DBMS. In: VLDB (2005)

  82. Stonebraker, M., et al.: The Architecture of SciDB. In: SSDBM (2011)

  83. Sybase. IQ 15.4 System Administration Guide (2013)

  84. Tabei, Y., et al.: Scalable partial least squares regression on grammar-compressed data matrices. In: KDD (2016)

  85. Tepper, M., Sapiro, G.: Compressed nonnegative matrix factorization is fast and accurate. IEEE Trans. Signal Process. 64(9), 2269–2283 (2016)

  86. Tian, Y., et al.: Scalable and numerically stable descriptive statistics in SystemML. In: ICDE (2012)

  87. Valiant, G., Valiant, P.: Estimating the unseen: an n/log(n)-sample estimator for entropy and support size, shown optimal via new CLTs. In: STOC (2011)

  88. Wang, W., et al.: Database meets deep learning: challenges and opportunities. SIGMOD Rec. 45(2), 17–22 (2016)

  89. Westmann, T., et al.: The implementation and performance of compressed databases. SIGMOD Rec. 29(3), 55–67 (2000)

  90. Willhalm, T., et al.: SIMD-Scan: ultra fast in-memory table scan using on-chip vector processing units. PVLDB 2(1), 385–394 (2009)

  91. Williams, S., et al.: Optimization of sparse matrix-vector multiplication on emerging multicore platforms. In: SC (2007)

  92. Wu, K., et al.: Optimizing bitmap indices with efficient compression. TODS 31(1), 1–38 (2006)

  93. Yu, L., et al.: Exploiting matrix dependency for efficient distributed matrix computation. In: SIGMOD (2015)

  94. Zadeh, R.B., et al.: Matrix computations and optimization in Apache Spark. In: KDD (2016)

  95. Zaharia, M., et al.: Resilient distributed datasets: a fault-tolerant abstraction for in-memory cluster computing. In: NSDI (2012)

  96. Zhang, C., et al.: Materialization optimizations for feature selection workloads. In: SIGMOD (2014)

  97. Zukowski, M., et al.: Super-scalar RAM-CPU cache compression. In: ICDE (2006)

Acknowledgements

We thank Alexandre Evfimievski and Prithviraj Sen for thoughtful discussions on compressed linear algebra and code generation, Srinivasan Parthasarathy for pointing us to the related work on graph compression, as well as our reviewers for their valuable comments and suggestions.

Author information

Corresponding author

Correspondence to Matthias Boehm.

About this article

Cite this article

Elgohary, A., Boehm, M., Haas, P.J. et al. Compressed linear algebra for large-scale machine learning. The VLDB Journal 27, 719–744 (2018). https://doi.org/10.1007/s00778-017-0478-1
