ABSTRACT
Motivated by the inherent error resilience of emerging recognition, mining, and synthesis (RMS) applications, approximate computing techniques such as precision scaling have been advocated for achieving energy-efficiency gains at the cost of a small accuracy loss. Most existing solutions, however, focus on approximating on-chip computations and do not consider off-chip data accesses, whose energy consumption may account for a significant portion of the total energy. In this work, we propose ApproxMA, a novel approximate memory access technique for dynamic precision scaling. Specifically, by taking both the runtime data precision constraints and the error-resilience capabilities of the application into consideration, ApproxMA determines the precision of data accesses and loads scaled data from off-chip memory for computation. Experimental results with mixture model-based clustering algorithms demonstrate the efficacy of the proposed methodology.
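As a rough illustration of the general idea of precision-scaled memory access, the Python sketch below keeps only the most significant bits of each stored word and picks the bit count from an error bound. It is not the ApproxMA mechanism itself: the function names, the 16-bit unsigned fixed-point format, and the rule for choosing the precision are all assumptions made for this example.

```python
# Illustrative sketch only; names and data format are hypothetical,
# not taken from the ApproxMA paper.

def load_scaled(words, keep_bits, word_bits=16):
    """Return each word with only its top `keep_bits` bits retained.

    The dropped low-order bits are read as zero, which models fetching
    fewer bits per word from off-chip memory.
    """
    if not 0 <= keep_bits <= word_bits:
        raise ValueError("keep_bits must be between 0 and word_bits")
    drop = word_bits - keep_bits
    mask = (((1 << word_bits) - 1) >> drop) << drop
    return [w & mask for w in words]


def choose_precision(max_abs_error, word_bits=16):
    """Smallest number of kept bits whose worst-case truncation error
    (for unsigned words) does not exceed `max_abs_error`."""
    for keep in range(word_bits + 1):
        worst_case = (1 << (word_bits - keep)) - 1
        if worst_case <= max_abs_error:
            return keep
    return word_bits


def bits_transferred(n_words, keep_bits):
    """Total bits moved across the memory interface under this model."""
    return n_words * keep_bits


if __name__ == "__main__":
    data = [0xABCD, 0x00FF, 0x1234]
    keep = choose_precision(max_abs_error=255)
    print(keep, [hex(w) for w in load_scaled(data, keep)])
    print(bits_transferred(len(data), keep), "bits instead of",
          bits_transferred(len(data), 16))
```

In this toy model, a looser error bound yields fewer kept bits and proportionally less data moved, which is the kind of trade-off between accuracy and off-chip access energy that the abstract describes.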
Index Terms
- ApproxMA: Approximate Memory Access for Dynamic Precision Scaling