Energy efficiency via thread fusion and value reuse

IET Computers & Digital Techniques

Energy consumption has become the dominant metric in the design of high-performance simultaneous multi-threaded (SMT) microprocessors. The authors propose thread fusion to reduce energy consumption in future SMT microprocessors. Threads are fused by merging two dynamic instances of the same static instruction into a single instruction, which removes redundant work in the front-end of the processor. As a result, power consumption is reduced in the pipeline stages up to the execution stage. Full-system simulation results show an average energy reduction of 10% with little impact on performance (less than 1%). The authors also extend thread fusion with mechanisms that reduce the number of register file accesses and the functional unit activity by reusing computations when the input values of the two dynamic instances of a fused instruction are the same. With this technique, the experiments show energy savings of 5% in the integer register file and 10% in the integer functional units.
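The two ideas in the abstract can be illustrated with a toy event counter. This is only a sketch under strong assumptions, not the authors' microarchitecture: the two threads are assumed to advance in lock step, every instruction is assumed to have two source operands, and all names (`DynInst`, `run_fused`, the `stats` keys) are hypothetical. Instances with the same program counter are fused and pay for front-end work once; fused instances with identical operand values are also executed once.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DynInst:
    """One dynamic instance of a static instruction (hypothetical model)."""
    pc: int  # address of the static instruction
    a: int   # value of the first source operand
    b: int   # value of the second source operand


def run_fused(thread0, thread1):
    """Count front-end and execution events for two lock-step threads."""
    stats = {"frontend_ops": 0, "fused": 0, "executions": 0, "reused": 0}
    for i0, i1 in zip(thread0, thread1):
        if i0.pc == i1.pc:
            # Same static instruction: fuse, so fetch/decode happens once.
            stats["frontend_ops"] += 1
            stats["fused"] += 1
            if (i0.a, i0.b) == (i1.a, i1.b):
                # Same input values: compute once and share the result.
                stats["executions"] += 1
                stats["reused"] += 1
            else:
                # Different inputs: each instance still executes.
                stats["executions"] += 2
        else:
            # Diverged threads: no fusion, both pay the full cost.
            stats["frontend_ops"] += 2
            stats["executions"] += 2
    return stats


# Assumed example streams: pair 1 fuses and reuses, pair 2 fuses only,
# pair 3 has diverged and cannot be fused.
t0 = [DynInst(0x10, 1, 2), DynInst(0x14, 3, 4), DynInst(0x18, 5, 6)]
t1 = [DynInst(0x10, 1, 2), DynInst(0x14, 7, 8), DynInst(0x20, 5, 6)]
stats = run_fused(t0, t1)
print(stats)
```

Without fusion, the same six dynamic instances would cost six front-end operations and six executions. The counts say nothing about actual energy, which depends on the pipeline and power model used in the article.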

http://iet.metastore.ingenta.com/content/journals/10.1049/iet-cdt.2009.0040