
Distributing Frank–Wolfe via map-reduce

  • Regular Paper
  • Published in: Knowledge and Information Systems

Abstract

Large-scale optimization problems abound in data mining and machine learning applications, and the computational challenges they pose are often addressed through parallelization. We identify structural properties under which a convex optimization problem can be massively parallelized via map-reduce operations using the Frank–Wolfe (FW) algorithm. The class of problems that can be tackled this way is quite broad and includes experimental design, AdaBoost, and projection to a convex hull. Implementing FW via map-reduce eases parallelization and deployment via commercial distributed computing frameworks. We demonstrate this by implementing FW over Spark, an engine for parallel data processing, and establish that parallelization through map-reduce yields significant performance improvements: We solve problems with 20 million variables using 350 cores in 79 min; the same operation takes 48 h when executed serially.
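To make the map-reduce structure concrete, the sketch below is a minimal PySpark illustration of the pattern the abstract describes, using projection onto a convex hull as the running example; it is not the authors' implementation (their code is linked in the Notes), and the data, dimensions, and names (atoms, z, i_star) are illustrative assumptions.

    # Minimal sketch: Frank-Wolfe via map-reduce in PySpark (illustrative, not the paper's code).
    # Problem: min_{theta in simplex} || sum_i theta_i x_i - y ||^2, i.e. project y onto conv{x_i}.
    import numpy as np
    from pyspark import SparkContext

    sc = SparkContext(appName="fw-mapreduce-sketch")

    n, d = 1000, 50
    rng = np.random.default_rng(0)
    y = rng.normal(size=d)

    # Distribute the atoms as (index, vector, weight) triples; weights start uniform.
    atoms = sc.parallelize([(i, rng.normal(size=d), 1.0 / n) for i in range(n)]).cache()

    for k in range(100):
        # Map-reduce 1: the shared quantity z = sum_i theta_i x_i.
        z = atoms.map(lambda t: t[2] * t[1]).reduce(lambda a, b: a + b)
        residual = z - y

        # Map-reduce 2: partial derivatives g_i = 2 x_i^T (z - y); over the simplex,
        # the linear minimization oracle just picks the coordinate with the smallest g_i.
        i_star, _ = atoms.map(lambda t: (t[0], 2.0 * float(t[1] @ residual))) \
                         .reduce(lambda a, b: a if a[1] <= b[1] else b)

        gamma = 2.0 / (k + 2.0)  # standard Frank-Wolfe step size
        # Map 3: shrink all weights by (1 - gamma) and add gamma to the selected atom.
        atoms = atoms.map(lambda t, i=i_star, g=gamma:
                          (t[0], t[1], (1 - g) * t[2] + (g if t[0] == i else 0.0))).cache()

    z = atoms.map(lambda t: t[2] * t[1]).reduce(lambda a, b: a + b)
    print("final objective:", float(np.sum((z - y) ** 2)))

Each iteration uses only map and reduce calls: one map-reduce to form the shared quantity z, one to evaluate the linear minimization oracle, and a map to update the weights. In practice one would also periodically checkpoint the RDD to keep its lineage short; none of this reflects the specific optimizations in the authors' Spark code.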


Notes

  1. https://www.github.com/neu-spiral/FrankWolfe.

  2. https://www.cvxopt.org.

  3. https://archive.ics.uci.edu/ml/index.php

  4. https://www.webscope.sandbox.yahoo.com.


Acknowledgements

We kindly thank our reviewers for their very useful comments and suggestions. The work was supported by National Science Foundation (NSF) CAREER grant CCF-1750539.

Author information

Corresponding author

Correspondence to Armin Moharrer.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Moharrer, A., Ioannidis, S. Distributing Frank–Wolfe via map-reduce. Knowl Inf Syst 60, 665–690 (2019). https://doi.org/10.1007/s10115-018-1294-7

