
Adaptive Data Compression for Low-Power On-Chip Networks

Low Power Networks-on-Chip

Abstract

With the recent design shift toward increasing the number of processing elements on a chip, support for low power, low latency, and high bandwidth in the on-chip interconnect is essential. Much of the previous work has focused on router architectures and network topologies using wide/long channels. However, such solutions may result in complicated router designs and a high interconnect power/area cost. In this chapter, we present a table-based data compression technique that exploits value patterns in cache traffic. Compressing a large packet into a small one saves power by reducing the operations required in network components, and decreases contention by increasing the effective bandwidth of shared resources. The main challenges are providing a scalable implementation of the tables and minimizing the latency overhead of compression. We propose a shared table scheme that needs one encoding table and one decoding table for each processing element, and a management protocol that does not require in-order delivery. This scheme eliminates the dependence of table size on network size, which provides scalability and reduces the table overhead of compression. We present simulation results for 8-core and 16-core designs. Overall, our compression method reduces packet latency by up to 44%, with an average of 36%, and reduces network power consumption by 36% on average in a 16-core tiled design.
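The table-based idea can be illustrated with a minimal sketch. It is not the chapter's shared table scheme or its management protocol, neither of which the abstract specifies. The 32-bit word width, the 8-entry table, the one-bit hit/miss flag per word, and all function names are assumptions made for illustration only.

```python
from typing import Dict, List, Tuple

WORD_BITS = 32   # assumed word width (not stated in the abstract)
INDEX_BITS = 3   # assumed 8-entry value table

Symbol = Tuple[bool, int]  # (hit flag, table index or raw word)


def build_tables(frequent_values: List[int]) -> Tuple[Dict[int, int], List[int]]:
    """Build a sender-side encoding table (value -> index) and a
    receiver-side decoding table (index -> value) from the same values."""
    values = frequent_values[: 1 << INDEX_BITS]
    encode_table = {v: i for i, v in enumerate(values)}
    decode_table = list(values)
    return encode_table, decode_table


def compress(words: List[int], encode_table: Dict[int, int]) -> List[Symbol]:
    """Replace each word found in the table with its short index."""
    return [
        (True, encode_table[w]) if w in encode_table else (False, w)
        for w in words
    ]


def decompress(symbols: List[Symbol], decode_table: List[int]) -> List[int]:
    """Recover the original words from indices and raw words."""
    return [decode_table[p] if hit else p for hit, p in symbols]


def compressed_bits(symbols: List[Symbol]) -> int:
    """Payload size: one flag bit plus an index (hit) or a full word (miss)."""
    return sum(1 + (INDEX_BITS if hit else WORD_BITS) for hit, _ in symbols)


# Hypothetical cache block of eight words, dominated by a few values.
block = [0, 0, 0xFFFFFFFF, 0x12345678, 0, 1, 0, 0]
enc, dec = build_tables([0, 1, 0xFFFFFFFF])
packet = compress(block, enc)
restored = decompress(packet, dec)
```

In this sketch the eight-word block shrinks from 256 payload bits to 61, because seven of the eight words hit in the table. A real design would also have to keep the encoding and decoding tables consistent as values change, which is the role of the management protocol mentioned in the abstract.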


Notes

  1. A flow is a source-destination pair. Therefore, an n-node network has n^2 flows.

  2. Most routers have eight ports. Additionally, 12 routers have nine ports to connect a core (eight routers at the periphery) or a memory controller (four routers in the center).

  3. A flit can be subdivided into multiple physical transfer units (phits). Each phit is transferred across a channel in a single cycle. We assume that the phit size is equal to the flit size in this chapter.

  4. The VC allocator (R → p) has the longest latency of the router pipeline stages and therefore determines the delay of the router pipeline in both designs. The number of VCs is chosen so that this stage fits within one clock cycle.

  5. We put the value table at the router rather than at each node in SNUCA-CMP.

  6. The destination sharing degree (dest) is the average number of destinations for each value. The source sharing degree (src) is the average number of sources for each value.
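
The sharing degrees defined in Note 6 can be computed from a traffic trace along the following lines. This is a sketch under one reading of the definition: the average is taken over distinct values, and each value counts its distinct destinations (or sources). The trace format of (source, destination, value) tuples and the function name are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def sharing_degrees(trace: Iterable[Tuple[int, int, int]]) -> Tuple[float, float]:
    """Return (dest, src): the mean number of distinct destinations and
    distinct sources per value seen in the trace."""
    dests: Dict[int, Set[int]] = defaultdict(set)
    srcs: Dict[int, Set[int]] = defaultdict(set)
    for src, dst, value in trace:
        dests[value].add(dst)
        srcs[value].add(src)
    n_values = len(dests)
    if n_values == 0:
        return 0.0, 0.0  # empty trace: no values observed
    dest_degree = sum(len(s) for s in dests.values()) / n_values
    src_degree = sum(len(s) for s in srcs.values()) / n_values
    return dest_degree, src_degree


# Hypothetical trace: value 0 travels from nodes 0 and 3 to nodes 1, 2, 3;
# value 7 travels only from node 2 to node 3.
trace = [(0, 1, 0), (0, 2, 0), (3, 1, 0), (0, 3, 0), (2, 3, 7)]
dest, src = sharing_degrees(trace)
```

For this trace, dest is 2.0 and src is 1.5. A high sharing degree means the same value appears on many flows, which is the situation in which a table shared across flows avoids storing that value once per flow.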


Acknowledgements

This work was supported in part by NSF grants CCF-0541360 and CCF-0541384. Yuho Jin is currently supported by the National Science Foundation under Grant no. 0937060 to the Computing Research Association for the CIFellows Project.

Corresponding author

Correspondence to Yuho Jin.


Copyright information

© 2011 Springer Science+Business Media, LLC

About this chapter

Cite this chapter

Jin, Y., Yum, K.H., Kim, E.J. (2011). Adaptive Data Compression for Low-Power On-Chip Networks. In: Silvano, C., Lajolo, M., Palermo, G. (eds) Low Power Networks-on-Chip. Springer, Boston, MA. https://doi.org/10.1007/978-1-4419-6911-8_6


  • DOI: https://doi.org/10.1007/978-1-4419-6911-8_6

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-1-4419-6910-1

  • Online ISBN: 978-1-4419-6911-8

  • eBook Packages: Engineering, Engineering (R0)
