Adaptive Data Compression for Low-Power On-Chip Networks

Jin, Yuho; Yum, Ki Hwan; Kim, Eun Jung

doi:10.1007/978-1-4419-6911-8_6

Abstract

With the recent design shift toward increasing the number of processing elements on a chip, support for low power, low latency, and high bandwidth in the on-chip interconnect is essential. Much of the previous work has focused on router architectures and network topologies using wide/long channels. However, such solutions may result in a complicated router design and a high interconnect power/area cost. In this chapter, we present a method that exploits a table-based data compression technique relying on value patterns in cache traffic. Compressing a large packet into a small one reduces power consumption by reducing the operations required in network components, and it decreases contention by increasing the effective bandwidth of shared resources. The main challenges are providing a scalable implementation of the tables and minimizing the latency overhead of compression. We propose a shared table scheme that needs only one encoding table and one decoding table for each processing element, together with a management protocol that does not require in-order delivery. This scheme eliminates the dependence of table size on network size, which provides scalability and reduces the overhead cost of the compression tables. We present simulation results for 8-core and 16-core designs. Overall, our compression method reduces packet latency by up to 44%, with an average of 36%, and reduces network power consumption by 36% on average in the 16-core tiled design.
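The abstract describes table-based compression only at a high level. The sketch below illustrates generic frequent-value (table-based) encoding under stated assumptions: a fixed value table of which the encoder and decoder hold identical copies, 32-bit data words, a one-bit hit flag per word, and 64-bit flits (Note 3 treats a phit as equal in size to a flit). The names `ValueTable`, `compress`, `decompress`, `compressed_bits`, and `flits`, as well as the table contents, are hypothetical. The chapter's adaptive table updates and its management protocol are not modeled here.

```python
from dataclasses import dataclass
from math import ceil

WORD_BITS = 32   # assumed width of one data word in a cache block
FLIT_BITS = 64   # assumed flit (= phit) width in bits


@dataclass(frozen=True)
class Encoded:
    hit: bool     # True if the word was replaced by a table index
    payload: int  # table index on a hit, the raw word on a miss


class ValueTable:
    """Hypothetical fixed value table; encoder and decoder hold identical copies."""

    def __init__(self, values):
        self.value_at = list(values)
        self.index_of = {v: i for i, v in enumerate(self.value_at)}
        # Bits needed to name any table entry (at least one bit).
        self.index_bits = max(1, (len(self.value_at) - 1).bit_length())


def compress(words, table):
    # Replace each frequent value by its index; send other words uncompressed.
    return [Encoded(True, table.index_of[w]) if w in table.index_of
            else Encoded(False, w) for w in words]


def decompress(encoded, table):
    # Look indices up in the decoder's copy of the table; pass raw words through.
    return [table.value_at[e.payload] if e.hit else e.payload for e in encoded]


def compressed_bits(encoded, table):
    # One flag bit per word, followed by either a short index or the full word.
    return sum(1 + (table.index_bits if e.hit else WORD_BITS) for e in encoded)


def flits(bits):
    # Number of flits needed to carry a payload of the given size.
    return ceil(bits / FLIT_BITS)


# Usage: an 8-word block in which 7 of the words are frequent values.
table = ValueTable([0, 0xFFFFFFFF, 1, 4])
block = [0, 0, 1, 0x12345678, 0, 4, 0xFFFFFFFF, 0]
enc = compress(block, table)
assert decompress(enc, table) == block
print(compressed_bits(enc, table), flits(compressed_bits(enc, table)))  # 54 1
print(len(block) * WORD_BITS, flits(len(block) * WORD_BITS))            # 256 4
```

With this toy table, the block shrinks from 256 bits (4 flits) to 54 bits (1 flit). Actual savings depend on how often values in real cache traffic hit the table.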

Notes

1.
A flow represents a pair of source and destination. Therefore, an n-node network has n² flows.
2.
Most routers have eight ports. Additionally, 12 routers have nine ports to connect a core (eight routers at the periphery) or a memory controller (four routers in the center).
3.
A flit can be subdivided into multiple physical transfer units (phits). Each phit is transferred across a channel in a single cycle. We assume that the phit size is equal to the flit size in this chapter.
4.
The VC allocator (R → p) has the longest latency among the router components and determines the delay of the router pipeline in both designs. Therefore, the number of VCs is selected so that VC allocation fits within one clock cycle.
5.
We put the value table at the router rather than at each node in SNUCA-CMP.
6.
The destination sharing degree (dest) is the average number of destinations for each value. The source sharing degree (src) is the average number of sources for each value.
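Notes 1 and 6 define quantities that are straightforward to compute from a traffic trace. The sketch below assumes a hypothetical trace format of `(src, dst, value)` tuples, one per transmitted data word, and averages over distinct values without weighting by frequency; the chapter may weight or filter values differently. The function names `flow_count` and `sharing_degrees` are illustrative.

```python
from collections import defaultdict


def flow_count(n):
    # Note 1: every (source, destination) pair is a flow, so n nodes give n * n flows.
    return n * n


def sharing_degrees(trace):
    """Return (src_degree, dest_degree) for a trace of (src, dst, value) tuples.

    src_degree:  average number of distinct sources per distinct value.
    dest_degree: average number of distinct destinations per distinct value.
    """
    sources = defaultdict(set)
    dests = defaultdict(set)
    for src, dst, value in trace:
        sources[value].add(src)
        dests[value].add(dst)
    if not sources:
        return 0.0, 0.0
    n_values = len(sources)
    src_degree = sum(len(s) for s in sources.values()) / n_values
    dest_degree = sum(len(d) for d in dests.values()) / n_values
    return src_degree, dest_degree


# Usage: value 0 is sent by nodes {0, 3} to nodes {1, 2, 3};
# value 7 is sent only by node 2 to node 1.
trace = [(0, 1, 0), (0, 2, 0), (0, 3, 0), (3, 1, 0), (2, 1, 7), (2, 1, 7)]
print(flow_count(16))          # 256
print(sharing_degrees(trace))  # (1.5, 2.0)
```

A higher sharing degree means the same value is exchanged among more source or destination nodes.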

Acknowledgements

This work was supported in part by NSF grants CCF-0541360 and CCF-0541384. Yuho Jin is currently supported by the National Science Foundation under Grant no. 0937060 to the Computing Research Association for the CIFellows Project.

Author information

Authors and Affiliations

Department of Electrical Engineering, University of Southern California, 3740 McClintock Ave., Los Angeles, CA, 90089, USA
Yuho Jin

Corresponding author

Correspondence to Yuho Jin.

Editor information

Editors and Affiliations

Dipto. Elettronica e Informazione (DEI), Politecnico di Milano, Via Ponzio 34/5, Milano, 20133, Italy
Cristina Silvano
NEC Laboratories America, Inc., Independence Way 4, Princeton, 08540, New Jersey, USA
Marcello Lajolo
Dipto. Elettronica e Informazione (DEI), Politecnico di Milano, Via Ponzio 34/5, Milano, 20133, Italy
Gianluca Palermo

About this chapter

Cite this chapter

Jin, Y., Yum, K.H., Kim, E.J. (2011). Adaptive Data Compression for Low-Power On-Chip Networks. In: Silvano, C., Lajolo, M., Palermo, G. (eds) Low Power Networks-on-Chip. Springer, Boston, MA. https://doi.org/10.1007/978-1-4419-6911-8_6

DOI: https://doi.org/10.1007/978-1-4419-6911-8_6
Published: 27 August 2010
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4419-6910-1
Online ISBN: 978-1-4419-6911-8
eBook Packages: Engineering, Engineering (R0)
