Abstract
Many algorithms have been developed for the \(l_1\)-minimization (\(l_1\)-min) problem. The main components of most \(l_1\)-min algorithms are dense matrix-vector multiplications, such as \(Ax\) and \(A^Tx\), together with vector operations. We propose a novel warp-based implementation of the matrix-vector multiplication \(Ax\) on the graphics processing unit (GPU), called the GEMV kernel, and a novel thread-based implementation of the matrix-vector multiplication \(A^Tx\) on the GPU, called the GEMV-T kernel. The GEMV kernel uses a self-adaptive warp allocation strategy to assign the optimal number of warps to each matrix row; analogously, the GEMV-T kernel uses a self-adaptive thread allocation strategy to assign the optimal number of threads to each matrix row. We take two popular \(l_1\)-min algorithms as examples: the fast iterative shrinkage-thresholding algorithm (FISTA) and the augmented Lagrangian multiplier (ALM) method. Based on the GEMV and GEMV-T kernels, we present two highly parallel \(l_1\)-min solvers on the GPU that exploit kernel merging and the sparsity of the \(l_1\)-min solution. Furthermore, we design a concurrent multiple-\(l_1\)-min solver on the GPU and optimize its performance using newer GPU features such as the shuffle instruction and the read-only data cache. Experimental results validate the high efficiency and good performance of our methods.
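To illustrate the warp-based idea behind the GEMV kernel, the following minimal CUDA sketch assigns one full warp to each matrix row and combines partial sums with the shuffle instruction. This is not the authors' implementation: it assumes row-major storage and a fixed allocation of one 32-thread warp per row, whereas the paper's kernel chooses the warp count per row self-adaptively.

```cuda
#include <cuda_runtime.h>

// Illustrative sketch: compute y = A * x with one warp per row.
// A is m x n, stored row-major; lanes stride across the row so that
// memory accesses within the warp are coalesced.
__global__ void gemv_warp_per_row(const float *A, const float *x,
                                  float *y, int m, int n)
{
    int warpId = (blockIdx.x * blockDim.x + threadIdx.x) / 32; // one warp per row
    int lane   = threadIdx.x & 31;                             // lane index in warp
    if (warpId >= m) return;

    float sum = 0.0f;
    for (int j = lane; j < n; j += 32)
        sum += A[warpId * n + j] * x[j];

    // Warp-level reduction of the 32 partial sums via the shuffle instruction.
    for (int offset = 16; offset > 0; offset >>= 1)
        sum += __shfl_down_sync(0xffffffffu, sum, offset);

    if (lane == 0)
        y[warpId] = sum;
}
```

For the transposed product \(A^Tx\) (the GEMV-T kernel), an analogous thread-per-column mapping avoids the write conflicts a naive transposed traversal would cause.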
Acknowledgments
We gratefully acknowledge the anonymous reviewers, whose comments greatly helped us improve the paper.
This work is supported by the National Science Foundation of China under Grant Number 61379017 and the Science and Technology Plan Project of Zhejiang Province, China under Grant Number 2014C33077.
Cite this article
Gao, J., Li, Z., Liang, R. et al. Adaptive Optimization \(l_1\)-Minimization Solvers on GPU. Int J Parallel Prog 45, 508–529 (2017). https://doi.org/10.1007/s10766-016-0430-9