Adaptive Optimization \(l_1\)-Minimization Solvers on GPU

International Journal of Parallel Programming

Abstract

Many \(l_1\)-minimization (\(l_1\)-min) algorithms have been developed for the \(l_1\)-min problem. The main components of most of these algorithms are dense matrix-vector multiplications, such as \(Ax\) and \(A^Tx\), and vector operations. We propose a novel warp-based implementation of the matrix-vector multiplication \(Ax\) on the graphics processing unit (GPU), called the GEMV kernel, and a novel thread-based implementation of the matrix-vector multiplication \(A^Tx\) on the GPU, called the GEMV-T kernel. The GEMV kernel uses a self-adaptive warp allocation strategy to assign the optimal number of warps to each matrix row. Similarly, the GEMV-T kernel uses a self-adaptive thread allocation strategy to assign the optimal number of threads to each matrix row. We take two popular \(l_1\)-min algorithms as examples: the fast iterative shrinkage-thresholding algorithm and the augmented Lagrangian multiplier method. Based on the GEMV and GEMV-T kernels, we present two highly parallel \(l_1\)-min solvers on the GPU that use kernel merging and exploit the sparsity of the solutions produced by the \(l_1\)-min algorithms. Furthermore, we design a concurrent multiple \(l_1\)-min solver on the GPU and optimize its performance with newer GPU features such as the shuffle instruction and the read-only data cache. Experimental results confirm the efficiency and performance of our methods.
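To illustrate why \(Ax\), \(A^Tx\), and vector operations dominate the cost of such solvers, the following is a minimal CPU sketch of the fast iterative shrinkage-thresholding algorithm in plain Python. It is not the GPU implementation described above. It assumes the unconstrained form \(\min_x \tfrac{1}{2}\|Ax-b\|_2^2 + \lambda\|x\|_1\), which the abstract does not state, and the names `matvec`, `matvec_t`, `soft_threshold`, and `fista` are hypothetical helpers chosen for this example.

```python
import math

def matvec(A, x):
    """Dense A @ x: one dot product per matrix row."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def matvec_t(A, r):
    """Dense A^T @ r: one dot product per matrix column."""
    m, n = len(A), len(A[0])
    return [sum(A[i][j] * r[i] for i in range(m)) for j in range(n)]

def soft_threshold(v, t):
    """Elementwise shrinkage: sign(v) * max(|v| - t, 0)."""
    return [math.copysign(max(abs(vi) - t, 0.0), vi) for vi in v]

def fista(A, b, lam, iters=500):
    """Sketch of FISTA for 0.5 * ||Ax - b||^2 + lam * ||x||_1 (assumed objective)."""
    n = len(A[0])
    # The squared Frobenius norm bounds the largest eigenvalue of A^T A
    # from above, so 1 / L is a safe (if conservative) step size.
    L = sum(a * a for row in A for a in row)
    x = [0.0] * n
    y = x[:]
    t = 1.0
    for _ in range(iters):
        # The two matrix-vector products performed in every iteration.
        residual = [ri - bi for ri, bi in zip(matvec(A, y), b)]
        grad = matvec_t(A, residual)
        # Gradient step followed by shrinkage (pure vector operations).
        x_new = soft_threshold([yi - gi / L for yi, gi in zip(y, grad)], lam / L)
        # Momentum update of the extrapolation point y.
        t_new = (1.0 + math.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = [xn + ((t - 1.0) / t_new) * (xn - xo) for xn, xo in zip(x_new, x)]
        x, t = x_new, t_new
    return x

if __name__ == "__main__":
    # With A equal to the identity, the minimizer is soft_threshold(b, lam).
    print(fista([[1.0, 0.0], [0.0, 1.0]], [3.0, 0.5], lam=1.0))
```

Each iteration performs exactly one product with \(A\) and one with \(A^T\); everything else is elementwise vector work. This is why a solver of this kind benefits directly from faster matrix-vector kernels.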

Acknowledgments

We gratefully acknowledge the comments from anonymous reviewers, which greatly helped us to improve the contents of the paper.

Author information

Corresponding author

Correspondence to Jiaquan Gao.

Additional information

This work is supported by the National Science Foundation of China under Grant Number 61379017 and the Science and Technology Plan Project of Zhejiang Province, China under Grant Number 2014C33077.

About this article

Cite this article

Gao, J., Li, Z., Liang, R. et al. Adaptive Optimization \(l_1\)-Minimization Solvers on GPU. Int J Parallel Prog 45, 508–529 (2017). https://doi.org/10.1007/s10766-016-0430-9

  • DOI: https://doi.org/10.1007/s10766-016-0430-9
