Distributing Frank–Wolfe via map-reduce

Moharrer, Armin; Ioannidis, Stratis

doi:10.1007/s10115-018-1294-7

Distributing Frank–Wolfe via map-reduce

Regular Paper
Published: 18 December 2018

Volume 60, pages 665–690, (2019)
Cite this article

Knowledge and Information Systems Aims and scope Submit manuscript

255 Accesses
Explore all metrics

Abstract

Large-scale optimization problems abound in data mining and machine learning applications, and the computational challenges they pose are often addressed through parallelization. We identify structural properties under which a convex optimization problem can be massively parallelized via map-reduce operations using the Frank–Wolfe (FW) algorithm. The class of problems that can be tackled this way is quite broad and includes experimental design, AdaBoost, and projection to a convex hull. Implementing FW via map-reduce eases parallelization and deployment via commercial distributed computing frameworks. We demonstrate this by implementing FW over Spark, an engine for parallel data processing, and establish that parallelization through map-reduce yields significant performance improvements: We solve problems with 20 million variables using 350 cores in 79 min; the same operation takes 48 h when executed serially.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Big data analytics on Apache Spark

Article 13 October 2016

Salman Salloum, Ruslan Dautov, … Joshua Zhexue Huang

Data clustering: application and trends

Article 27 November 2022

Gbeminiyi John Oyewole & George Alex Thopil

Introduction to Machine Learning

Notes

References

Abadi M, Barham P, Chen J, Chen Z, Davis A, Dean J, Devin M, Ghemawat S, Irving G, Isard M, et al (2016) Tensorflow: A system for large-scale machine learning. In: OSDI
Bahmani B, Moseley B, Vattani A, Kumar R, Vassilvitskii S (2012) Scalable k-means++. In: VLDB
Beck A, Shtern S (2015) Linearly convergent away-step conditional gradient for non-strongly convex functions. Math Progr 164:1–27
Article MathSciNet MATH Google Scholar
Bellet A, Liang Y, Garakani AB, Balcan M-F, Sha F (2015) A distributed Frank–Wolfe algorithm for communication-efficient sparse learning. In: SDM
Bertsekas DP (1999) Nonlinear programming. Athena Scientific, Belmont
MATH Google Scholar
Bian Y, Mirzasoleiman B, Buhmann JM, Krause A (2017) Guaranteed non-convex optimization: Submodular maximization over continuous domains. In: AISTATS
Boyd S, Parikh N, Chu E, Peleato B, Eckstein J (2011) Distributed optimization and statistical learning via the alternating direction method of multipliers. Found Trends Mach Learn 3(1):1–122
Article MATH Google Scholar
Boyd S, Vandenberghe L (2004) Convex optimization. Cambridge University Press, New York
Book MATH Google Scholar
Calinescu G, Chekuri C, Pál M, Vondrák J (2011) Maximizing a monotone submodular function subject to a matroid constraint. SIAM J Comput 40(6):1740–1766
Article MathSciNet MATH Google Scholar
Canon M, Cullum C (1968) A tight upper bound on the rate of convergence of Frank–Wolfe algorithm. SIAM J Control 6(4):509–516
Article MathSciNet MATH Google Scholar
Chandrasekaran V, Recht B, Parrilo PA, Willsky AS (2012) The convex geometry of linear inverse problems. Found Comput Math 12(6):805–849
Article MathSciNet MATH Google Scholar
Chen S, Banerjee A (2015) Structured estimation with atomic norms: General bounds and applications. In: NIPS
Chu C-T, Kim SK, Lin Y-A, Yu Y, Bradski G, Ng AY, Olukotun K (2006) Map-reduce for machine learning on multicore. In: NIPS
Clarkson K L (2010) Coresets, sparse greedy approximation, and the Frank–Wolfe algorithm. ACM Trans Algorithms 6(4):63:1–63:30
Article MathSciNet MATH Google Scholar
Dean J, Ghemawat S (2008) Mapreduce: simplified data processing on large clusters. Commun ACM 51(1):107–113
Article Google Scholar
Dudik M, Harchaoui Z, Malick J (2012) Lifted coordinate descent for learning with trace-norm regularization. In: AISTATS
Dunn JC (1979) Rates of convergence for conditional gradient algorithms near singular and nonsingular extremals. SIAM J Control Optim 17(2):187–211
Article MathSciNet MATH Google Scholar
Frank M, Wolfe P (1956) An algorithm for quadratic programming. Naval Res Logist Q 3(1–2):95–110
Article MathSciNet Google Scholar
Garber D, Hazan E (2016) A linearly convergent variant of the conditional gradient algorithm under strong convexity, with applications to online and stochastic optimization. SIAM J Optim 26(3):1493–1528
Article MathSciNet MATH Google Scholar
Garber D, Meshi O (2016) Linear-memory and decomposition-invariant linearly convergent conditional gradient algorithm for structured polytopes In: Advances in neural information processing systems, pp 1001–1009
Guélat J, Marcotte P (1986) Some comments on Wolfe’s away step. Math Progr 35(1):110–119
Article MathSciNet MATH Google Scholar
Harchaoui Z, Douze M, Paulin M, Dudik M, Malick J (2012) Large-scale image classification with trace-norm regularization. In: CVPR
Harper F M, Konstan J A (2015) The movielens datasets: history and context. ACM Trans Interact Intell Syst 5(4):19:1–19:19
Article Google Scholar
Hazan E, Kale S (2012) Projection-free online learning. In: ICML
Hazan E, Luo H (2016) Variance-reduced and projection-free stochastic optimization. In: ICML
Jaggi M (2013) Revisiting Frank–Wolfe: projection-free sparse convex optimization. In: ICML
Joulin A, Tang K, Fei-Fei L (2014) Efficient image and video co-localization with Frank–Wolfe algorithm. In: ECVV
Koren Y, Bell R, Volinsky C (2009) Matrix factorization techniques for recommender systems. Computer 42(8):30
Article Google Scholar
Kumar R, Moseley B, Vassilvitskii S, Vattani A (2015) Fast greedy algorithms in mapreduce and streaming. ACM Trans Parallel Comput 2(3):14
Article Google Scholar
Lacoste-Julien S, Jaggi M (2015) On the global linear convergence of Frank–Wolfe optimization variants. In: NIPS
Lacoste-Julien S, Jaggi M, Schmidt M, Pletscher P (2013) Block-coordinate Frank–Wolfe optimization for structural SVMs. In: Proceedings of ICML
Lan G, Zhou Y (2016) Conditional gradient sliding for convex optimization. SIAM J Optim 26(2):1379–1409
Article MathSciNet MATH Google Scholar
Leighton F T (2014) Introduction to parallel algorithms and architectures: trees hypercubes. Elsevier, Amsterdam
MATH Google Scholar
Li M, Zhou L, Yang Z, Li A, Xia F, Andersen DG, Smola A (2013) Parameter server for distributed machine learning. In: NIPS workshop
Lichman M (2013) UCI Machine Learning Repository. University of California, School of Information and Computer Science, Irvine, CA
Google Scholar
Ng AY (2004) Feature selection, L1 vs. L2 regularization, and rotational invariance. In: ICML
Osokin A, Alayrac J-B, Lukasewitz I, Dokania PK, Lacoste-Julien S (2016) Minding the gaps for block Frank–Wolfe optimization of structured SVMs. In: ICML
Recht B, Re C, Wright S, Niu F (2011) Hogwild: A lock-free approach to parallelizing stochastic gradient descent. In: NIPS
Reddi SJ, Sra S, Póczós B, Smola A (2016) Stochastic Frank–Wolfe methods for non-convex optimization. In: Allerton
Shah P, Bhaskar BN, Tang G, Recht B (2012) Linear system identification via atomic norm regularization. In: CDC
Sherman J, Morrison WJ (1950) Adjustment of an inverse matrix corresponding to a change in one element of a given matrix. Ann Math Stat 21(1):124–127
Article MathSciNet MATH Google Scholar
Suri S, Vassilvitskii S (2011) Counting triangles and the curse of the last reducer. In: WWW
Tewari A, Ravikumar PK, Dhillon IS (2011) Greedy algorithms for structurally constrained high dimensional problems. In: NIPS
Tibshirani R (1996) Regression shrinkage and selection via the lasso. J R Stat Soc Ser B (Methodol) 73:267–288
MathSciNet MATH Google Scholar
Tran NL, Peel T, Skhiri S (2015) Distributed Frank–Wolfe under pipelined stale synchronous parallelism. In: IEEE international conference on big data (Big Data)
Wang Y-X, Sadhanala V, Dai W, Neiswanger W, Sra S, Xing E (2016) Parallel and distributed block-coordinate Frank–Wolfe algorithms. In: Proceedings of ICML
White T (2012) Hadoop: the definitive guide. O’Reilly Media, Inc
Wolfe P (1970) Convergence theory in nonlinear programming. In: Abadie J (ed) Integer and nonlinear programming. North-Holland Publishing Company, Amsterdam
Google Scholar
Yang H-C, Dasdan A, Hsiao R-L, Parker DS (2007) Map-reduce-merge: simplified relational data processing on large clusters. In: SIGMOD
Yang T (2013) Trading computation for communication: distributed stochastic dual coordinate ascent. In: NIPS
Ying Y, Li P (2012) Distance metric learning with eigenvalue optimization. J Mach Learn Res 13:1–26
MathSciNet MATH Google Scholar
Zaharia M, Chowdhury M, Franklin MJ, Shenker S, Stoica I (2010) Spark: cluster computing with working sets. In: HotCloud
Zhang L, Wang G, Romero D, Giannakis GB (2017) Randomized block Frank–Wolfe for convergent large-scale learning. IEEE Trans Signal Proc 65(4):6448–6461
Article MathSciNet MATH Google Scholar
Zinkevich M, Weimer M, Li L, Smola AJ (2010) Parallelized stochastic gradient descent. In: NIPS

Download references

Acknowledgements

We kindly thank our reviewers for their very useful comments and suggestions. The work was supported by National Science Foundation (NSF) CAREER grant CCF-1750539.

Author information

Authors and Affiliations

Electrical and Computer Engineering Department, Northeastern University, Boston, MA, 02155, USA
Armin Moharrer & Stratis Ioannidis

Authors

Armin Moharrer
View author publications
You can also search for this author in PubMed Google Scholar
Stratis Ioannidis
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Armin Moharrer.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Moharrer, A., Ioannidis, S. Distributing Frank–Wolfe via map-reduce. Knowl Inf Syst 60, 665–690 (2019). https://doi.org/10.1007/s10115-018-1294-7

Download citation

Received: 20 December 2017
Revised: 14 November 2018
Accepted: 24 November 2018
Published: 18 December 2018
Issue Date: 01 August 2019
DOI: https://doi.org/10.1007/s10115-018-1294-7

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Distributing Frank–Wolfe via map-reduce

Abstract

Access this article

Similar content being viewed by others

Big data analytics on Apache Spark

Data clustering: application and trends

Introduction to Machine Learning

Notes

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Keywords

Navigation

Distributing Frank–Wolfe via map-reduce

Abstract

Access this article

Similar content being viewed by others

Big data analytics on Apache Spark

Data clustering: application and trends

Introduction to Machine Learning

Notes

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation