Abstract
Given the saturation of single-threaded performance improvements in General-Purpose Processor, novel architectural techniques are required to meet emerging demands. In this paper, we propose a generic acceleration framework for approximate algorithms that replaces function execution by table look-up accesses in dedicated memories. A strategy based on the K-Means Clustering algorithm is used to learn mappings from arbitrary function inputs to frequently occurring outputs at compile-time. At run-time, these learned values are fetched from dedicated look-up tables and the best result is selected using the Nearest-Centroid Classifier, which is implemented in hardware. The proposed approach improves over the state-of-the-art neural acceleration solution, with nearly 3X times better performance, \(18.72\%\) up to \(90.99\%\) energy reductions and \(17\%\) area savings under similar levels of quality, thus opening new opportunities for performance harvesting in approximate accelerators.













Similar content being viewed by others
Notes
Presenting the details of this algorithm or ANN training is beyond the scope of this paper. We present here an overview with only enough details to allow a comparison with the approximation approach we developed.
References
Beck ACS, Lisba CAL, Carro L (2012) Adaptable embedded systems. Springer Publishing Company, Incorporated, Berlin
Xu Q, Mytkowicz T, Kim NS (2016) Approximate computing: a survey. IEEE Des Test 33(1):8–22
Mittal S (2016) A survey of techniques for approximate computing. ACM Comput Surv 48(4):1–33
Sidiroglou-Douskos S, Misailovic S, Hoffmann H, Rinard M (2011) Managing performance versus accuracy trade-offs with loop perforation. In: Proceedings of the ACM SIGSOFT symposium and European conference on foundations of software engineering (SIGSOFT/FSE)
Brandalero M, da Silveira LA, Souza JD, Beck ACS (2017) Accelerating error-tolerant applications with approximate function reuse. Sci Comput Progr 165:54–67
Hegde R, Shanbhag NR (1999) Energy-efficient signal processing via algorithmic noise-tolerance. In: Proceedings of the international symposium on low power electronics and design (ISPLED)
Mohapatra D, Chippa VK, Raghunathan A, Roy K (2011) Design of voltage-scalable meta-functions for approximate computing. In: Proceedings of the design, automation & test in Europe (DATE), pp 1–6
Brandalero M, Beck ACS, Carro L, Shafique M (2018) Approximate on-the-fly coarse-grained reconfigurable acceleration for general-purpose applications. In: Design automation conference (DAC), pp 1–6
Esmaeilzadeh H, Sampson A, Ceze L, Burger D (2012) Neural acceleration for general-purpose approximate programs. In: Proceedings of the international symposium on microarchitecture (MICRO), pp 449–460
Yazdanbakhsh A, Park J, Sharma, Lotfi-Kamran P, Esmaeilzadeh H (2015) Neural acceleration for GPU throughput processors. In: Proceedings of the international symposium on microarchitecture (MICRO), pp 482–493
Moreau T et al. (2015) SNNAP: approximate computing on programmable SoCs via neural acceleration. In: Proceedings of the international symposium on high performance computer architecture (HPCA), pp 603–614
St. Amant R et al (2014) General-purpose code acceleration with limited-precision analog computation. ACM SIGARCH Comput Arch News 42(3):505–516
Hornik K, Stinchcombe M, White H (1989) Multilayer feedforward networks are universal approximators. Neural Netw 2(5):359–366
Chaudhuri S, Gulwani S, Lublinerman R, Navidpour S (2011) Proving programs robust. In: Proceedings of the ACM SIGSOFT symposium and european conference on foundations of software engineering (SIGSOFT/FSE), p 102
Yazdanbakhsh A, Mahajan D, Lotfi-Kamran P, Esmaeilzadeh H (2016) AxBench: a multiplatform benchmark suite for approximate computing. IEEE Des Test 34(2):60–68
Muralimanohar N, Balasubramonian R, Jouppi NP (2009) CACTI 6.0: a tool to model large caches. Technical Report, HP Laboratories
Browne S, Dongarra J, Garner N, Ho G, Mucci P (2000) A portable programming interface for performance evaluation on modern processors. Int J High Perform Comput Appl 14(3):189–204
Han J, Orshansky M (2013) Approximate computing: an emerging paradigm for energy-efficient design. In: Proceedings of the European test symposium (ETS), pp 1–6
Hoffmann H et al. (2011) Dynamic knobs for responsive power-aware computing. In: ACM SIGARCH computer architecture news, vol 39, no 1. ACM, pp 199–212
Misailovic S, Sidiroglou S, Hoffmann H, Rinard M (2010) Quality of service profiling. In: Proceedings of the international conference on software engineering (ICSE), p 25
Mengte J, Raghunathan A, Chakradhar S, Byna S (2010) Exploiting the forgiving nature of applications for scalable parallel execution. In: IEEE international symposium on parallel and distributed processing (IPDPS). IEEE, pp 1–12
Misailovic S, Sidiroglou S, Rinard MC (2012) Dancing with uncertainty. In: Proceedings of the 2012 ACM workshop on relaxing synchronization for multicore and manycore scalability. ACM, pp 51–60
Recht B, Re C, Wright S, Niu F (2011) Hogwild: a lock-free approach to parallelizing stochastic gradient descent. Adv Neural Inf Process Syst 693–701
Renganarayana L, Srinivasan V, Nair R, Prener D (2012) Programming with relaxed synchronization. In: Proceedings of the 2012 ACM workshop on relaxing synchronization for multicore and manycore scalability. ACM, pp 41–50
Grigorian B, Farahpour N, Reinman G (2015) BRAINIAC: bringing reliable accuracy into neurally-implemented approximate computing. In: International symposium on high performance computer architecture (HPCA), pp 615–626
Chen T et al. (2012) BenchNN: on the broad potential application scope of hardware neural network accelerators. In: Proceedings of the international symposium on workload characterization (IISWC), pp 36–45
Ionica MH, Gregg D (2015) The movidius myriad architecture’s potential for scientific computing. IEEE Micro 35(1):6–14
Chen Y-H, Krishna T, Emer JS, Sze V (2016) Eyeriss: an energy-efficient reconfigurable accelerator for deep convolutional neural networks. IEEE J Solid-State Circuits 52(1):127–138
Yoffie DB (2014) Mobileye: the future of driverless cars. Harvard Business School Case, Boston, pp 421–715
Pham P-H et al (2012) Neuflow: dataflow vision processing system-on-a-chip. In: IEEE 55th international midwest symposium on circuits and systems (MWSCAS). IEEE, pp 1044–1047
Shoushtari M, BanaiyanMofrad A, Dutt N (2015) Exploiting partially-forgetful memories for approximate computing. IEEE Embed Syst Lett 7(1):19–22
Shafique M, Hafiz R, Rehman S, El-Harouni W, Henkel J (2016) Cross-layer approximate computing: from logic to architectures. In: Design automation conference (DAC), pp 1–6
Alvarez C, Corbal J, Valero M (2005) Fuzzy memoization for floating-point multimedia applications. IEEE Trans Comput 54(7):922–927
Liu S, Pattabiraman K, Moscibroda T, Zorn BG (2009) Flicker: saving refresh-power in mobile devices through critical data partitioning. In: Proceedings of the international conference on architectural support for programming languages and operating systems (ASPLOS’09). Citeseer
Lucas J, Alvarez-Mesa M, Andersch M, Juurlink B (2014) Sparkk: quality-scalable approximate storage in dram. In: Memory Forum 1–9
Chang IJ, Mohapatra D, Roy K (2011) A priority-based 6t/8t hybrid sram architecture for aggressive voltage scaling in video applications. IEEE Trans Circuits Syst Video Technol 21(2):101–112
Werbos PJ (1990) Backpropagation through time: what it does and how to do it. Proc IEEE 78(10):1550–1560
Jain AK (2010) Data clustering: 50 years beyond K-means. Pattern Recognit Lett 31(8):651–666
Suresh A, Swamy BN, Rohou E, Seznec A (2015) Intercepting functions for memoization: a case study using transcendental functions. ACM Trans Archit Code Optim (TACO) 12(2):18
Sampson A et al (2011) EnerJ: approximate data types for safe and general low-power computation. In: Proceedings of the conference on programming language design and implementation (PLDI), vol 46, no 6, p 164
Baek W, Chilimbi TM (2010) Green: a framework for supporting energy-conscious programming using controlled approximation. In: ACM sigplan notices, vol 45, no 6. ACM, pp 198–209
Esmaeilzadeh H, Sampson A, Ceze L, Burger D (2012) Architecture support for disciplined approximate programming. In: ACM SIGPLAN notices, vol 47, no 4. ACM, pp 301–312
Author information
Authors and Affiliations
Corresponding author
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This study was financed in part by the CoordenaÇão de AperfeiÇoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001. The authors would also like to thank CNPq and FAPERGS for partial support.
Rights and permissions
About this article
Cite this article
Jordan, M.G., Brandalero, M., Malfatti, G.M. et al. Data clustering for efficient approximate computing. Des Autom Embed Syst 24, 3–22 (2020). https://doi.org/10.1007/s10617-019-09228-z
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10617-019-09228-z