
Optimization of multi-class 0/1 knapsack problem on GPUs by improving memory access efficiency

The Journal of Supercomputing

Abstract

This work aims to improve GPU performance for solving the 0/1 knapsack problem, a well-known combinatorial optimization problem with many practical applications, including cryptography, financial decision-making, electronic design automation, and computing resource management. The knapsack problem is NP-hard, but it can be solved efficiently by dynamic programming (DP) algorithms in pseudo-polynomial time, and DP knapsack algorithms on GPUs have been presented in prior work. However, because modern GPU architectures provide much higher computing throughput than memory bandwidth, previous work is bound by GPU memory access time: its CGMA (Compute to Global Memory Access) ratio is 1, meaning every computing operation involves one memory access on average. To address this problem, this paper proposes an approach for the Multi-Class 0/1 Knapsack Problem (MCKP), in which items can be classified into groups of equal value or weight. By reconstructing the DP equations for solving MCKP, we are able to exploit data parallelism and data reuse across threads. This makes it possible to optimize the computation across iterations (i.e., items) and to improve the CGMA ratio by a factor of five by placing reused data in GPU shared memory and registers. We extensively analyze the performance of our approach on two modern GPU models, the NVIDIA Tesla V100 and RTX 3070. Compared to the runtime of previous work, our approach achieves up to 8x and 18x speedup on the V100 and the RTX 3070, respectively, the latter being a GPU with lower memory bandwidth. In addition, comparing the two speedups shows that our approach achieves more efficient use of computing resources when memory bandwidth is limited, as on the RTX 3070.
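
For context, the standard 0/1 knapsack DP recurrence that this line of work builds on is f(i, c) = max(f(i-1, c), f(i-1, c - w_i) + v_i), where w_i and v_i are the weight and value of item i and c is the remaining capacity. The paper's reconstructed MCKP equations are not reproduced on this page; the sketch below only illustrates the kind of baseline GPU kernel the abstract refers to, with one thread per capacity entry and every update touching global memory, which is why such a baseline has a CGMA ratio near 1. Kernel and parameter names are illustrative assumptions, not taken from the paper.

    // Baseline DP step for one item of the 0/1 knapsack problem (illustrative sketch,
    // not the authors' optimized MCKP kernel). Each thread updates one capacity entry.
    // f_prev holds the DP row for the previous item; f_curr receives the new row.
    // Each update performs roughly one global-memory access per arithmetic operation,
    // i.e., a CGMA ratio near 1, which is the bottleneck the paper addresses.
    __global__ void knapsack_step(const int *f_prev, int *f_curr,
                                  int capacity, int weight, int value)
    {
        int c = blockIdx.x * blockDim.x + threadIdx.x;    // capacity handled by this thread
        if (c > capacity) return;

        int best = f_prev[c];                             // case 1: skip the current item
        if (c >= weight)
            best = max(best, f_prev[c - weight] + value); // case 2: take the current item
        f_curr[c] = best;                                 // one global write per thread
    }

The host side launches such a kernel once per item and swaps f_prev and f_curr between iterations; the optimization described in the abstract instead groups items of equal value or weight so that reused operands can stay in shared memory and registers across iterations.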



Acknowledgements

We thank the National Center for High-performance Computing (NCHC) for providing computational and storage resources. We also thank Prof. Ing-Jer Huang from National Sun Yat-sen University for providing valuable insights and comments on our work.

Author information

Corresponding author

Correspondence to Jerry Chou.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Huang, EM., Chou, J. Optimization of multi-class 0/1 knapsack problem on GPUs by improving memory access efficiency. J Supercomput 78, 13653–13679 (2022). https://doi.org/10.1007/s11227-022-04425-3
