Abstract
As Moore’s law approaches its inevitable end, improving the performance of traditional von Neumann architectures has become increasingly challenging. Dedicated computing architectures for specific domains are seen as one way to meet this challenge. The Ising architecture is one of them, and it is mainly used to solve combinatorial optimization problems efficiently. We propose a DRAM-based annealing system (DAS) that realizes the Ising architecture in DRAM. The Ising coefficients are transposed and stored in the DRAM cells, and annealing calculations are performed using in-DRAM bulk bitwise operations until a solution to the problem is found. Because DAS performs annealing in parallel inside DRAM, it reduces data movement and solution time, which makes it appropriate for large-scale Ising spin systems. We evaluated DAS by segmenting multiple images from the HRSOD dataset. DAS shows segmentation capability similar to that of the conventional Onecut method, while achieving an average solution-time speedup of 10.2\(\times \) and consuming on average just 0.4349% of the energy of the conventional method. Furthermore, the design area added for DAS accounts for only 7% of its total area.
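For readers unfamiliar with Ising-model annealing, the following is a minimal software sketch of the general idea: a problem is encoded as pairwise coefficients and per-spin biases, and spins are flipped at random under a falling temperature until a low-energy state is reached. It is not the DAS design and uses no in-DRAM bitwise operations. The function names, the sign convention of the energy, the geometric cooling schedule, and the small 4-spin ring example are all assumptions made for illustration.

```python
import math
import random

def ising_energy(J, h, s):
    """Energy H(s) = -sum_{i<j} J[i][j]*s[i]*s[j] - sum_i h[i]*s[i], spins in {-1, +1}."""
    n = len(s)
    pair = sum(J[i][j] * s[i] * s[j] for i in range(n) for j in range(i + 1, n))
    field = sum(h[i] * s[i] for i in range(n))
    return -pair - field

def anneal(J, h, steps=2000, t_start=2.0, t_end=0.05, seed=0):
    """Single-spin-flip simulated annealing; returns the best state seen and its energy."""
    rng = random.Random(seed)
    n = len(h)
    s = [rng.choice((-1, 1)) for _ in range(n)]
    energy = ising_energy(J, h, s)
    best_s, best_e = s[:], energy
    for k in range(steps):
        # Geometric cooling schedule (an illustrative assumption).
        t = t_start * (t_end / t_start) ** (k / max(steps - 1, 1))
        i = rng.randrange(n)
        # Local field on spin i; J is assumed symmetric with a zero diagonal.
        local = h[i] + sum(J[i][j] * s[j] for j in range(n) if j != i)
        delta = 2 * s[i] * local  # energy change if spin i is flipped
        # Metropolis rule: always accept downhill moves, sometimes accept uphill ones.
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            s[i] = -s[i]
            energy += delta
            if energy < best_e:
                best_s, best_e = s[:], energy
    return best_s, best_e

# Hypothetical example: a 4-spin ferromagnetic ring with no bias terms.
J = [[0, 1, 0, 1],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [1, 0, 1, 0]]
h = [0, 0, 0, 0]
state, energy = anneal(J, h)
print(state, energy)
```

In this toy problem the lowest-energy states are the two fully aligned configurations. A hardware annealer such as the one described above targets the same kind of energy minimization, but evaluates many spin updates in parallel rather than one at a time.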
Supported by the NSF of Hunan Province (No. 2022JJ10066) and the NSFC (No. 62272477).
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Deng, W., Wang, Z., Guo, Y., Zhang, J., Wu, Z., Wang, Y. (2024). DAS: A DRAM-Based Annealing System for Solving Large-Scale Combinatorial Optimization Problems. In: Tari, Z., Li, K., Wu, H. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2023. Lecture Notes in Computer Science, vol 14489. Springer, Singapore. https://doi.org/10.1007/978-981-97-0798-0_10
DOI: https://doi.org/10.1007/978-981-97-0798-0_10
Published:
Publisher Name: Springer, Singapore
Print ISBN: 978-981-97-0797-3
Online ISBN: 978-981-97-0798-0
eBook Packages: Computer Science, Computer Science (R0)