Energy efficiency via thread fusion and value reuse
- Author(s): R. Rakvic ; J. González ; Q. Cai ; P. Chaparro ; G. Magklis ; A. González
- DOI: 10.1049/iet-cdt.2009.0040
- Affiliations:
1: United States Naval Academy, Annapolis, USA (R. Rakvic)
2: UPC-Intel Lab Barcelona, Barcelona, Spain (J. González, Q. Cai, P. Chaparro, G. Magklis, A. González)
- Source: Volume 4, Issue 2, March 2010, p. 114–125
- Print ISSN 1751-8601, Online ISSN 1751-861X
Energy consumption has become the dominant metric in the design of high-performance simultaneous multi-threaded (SMT) microprocessors. The authors propose thread fusion to reduce the energy consumption of future SMT microprocessors. Threads are fused by merging two dynamic instances of the same static instruction into a single instruction, which eliminates redundant work in the front-end of the processor. As a result, power consumption is reduced in the pipeline up to the execution stage. Full-system simulation shows an average energy reduction of 10% with little impact on performance (less than 1%). The authors also extend thread fusion with mechanisms that reduce the number of register file accesses and functional unit activity by reusing computations when the input values of the two dynamic instances of a fused instruction are the same. Experiments show energy savings of 5% in the integer register file and 10% in the integer functional units with this technique.
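The two ideas in the abstract, fusing dynamic instances that share a static instruction and reusing a result when their inputs match, can be illustrated with a toy counting model. The sketch below is not the authors' hardware mechanism: the `DynInst` record, the `simulate` function, the lockstep pairing of the two streams, and the unit cost per front-end operation and per execution are all assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DynInst:
    """One dynamic instance of an instruction (hypothetical toy record)."""
    pc: int          # address of the static instruction
    operands: tuple  # input values seen by this dynamic instance


def simulate(thread_a, thread_b):
    """Pair two instruction streams in lockstep and count work in a toy model.

    Assumption: the streams are compared position by position; zip() stops
    at the end of the shorter stream.
    """
    stats = {"frontend_ops": 0, "executions": 0, "fused": 0, "reused": 0}
    for a, b in zip(thread_a, thread_b):
        if a.pc == b.pc:
            # Same static instruction: one front-end pass serves both threads.
            stats["frontend_ops"] += 1
            stats["fused"] += 1
            if a.operands == b.operands:
                # Same input values: compute once and share the result.
                stats["executions"] += 1
                stats["reused"] += 1
            else:
                # Fused in the front-end, but each thread needs its own result.
                stats["executions"] += 2
        else:
            # Different static instructions: no fusion, no reuse.
            stats["frontend_ops"] += 2
            stats["executions"] += 2
    return stats


# Illustrative streams: the threads agree on the first two PCs, then diverge.
thread_a = [DynInst(0x10, (1, 2)), DynInst(0x14, (3, 4)), DynInst(0x18, (5,))]
thread_b = [DynInst(0x10, (1, 2)), DynInst(0x14, (7, 8)), DynInst(0x20, (5,))]
stats = simulate(thread_a, thread_b)
print(stats)
```

In this made-up trace, the unfused baseline would need six front-end operations and six executions; the toy model counts four and five. The counts only show where the savings come from and say nothing about the energy figures reported in the article.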
Inspec keywords: system-on-chip; microprocessor chips
Subjects: Microprocessors and microcomputers; Microprocessor chips