DOI: 10.1145/2245276.2232031

Affinity-aware DMA buffer management for reducing off-chip memory access

Published: 26 March 2012

Abstract

It is well recognized that moving I/O data into and out of memory has become a critical cost for high-bandwidth devices. Embedded systems in particular, with their limited cache sizes and simple architectures, spend a large number of CPU cycles on off-chip memory accesses. The work presented in this paper addresses this problem with an Affinity-aware DMA Buffer management strategy, called ADB, which requires no changes to the underlying hardware.
We introduce the concept of buffer affinity, which describes where the data of a recently released DMA buffer resides in the memory hierarchy: the more of its data that remains in the cache, the higher the buffer's affinity. Exploiting the characteristics of the embedded platform, buffer affinity can be identified at runtime. Using this online profiling, ADB allocates buffers of different affinity according to the direction of the transfer. For output, ADB allocates a high-affinity buffer to reduce off-chip memory accesses when the OS copies data from the user buffer to the kernel buffer. For input, ADB allocates a low-affinity buffer so that part of the cache-invalidation operations needed to maintain I/O coherence can be skipped. Measurements show that ADB, implemented in the Linux 2.6.32 kernel and running on a 1 GHz UniCore-2 processor, improves the performance of network-related programs by 5.2% to 8.8%.
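
The allocation policy sketched in the abstract (high-affinity buffers for output, low-affinity buffers for input) can be illustrated with a small standalone sketch. The C program below is not the authors' implementation; it assumes, purely for illustration, that a released buffer's affinity can be approximated by how recently it was released, and every identifier in it (adb_pool, adb_alloc, adb_release, release_stamp) is hypothetical.

```c
/* Minimal userspace sketch of an affinity-aware DMA buffer pool (a toy
 * model, not the paper's kernel code).  A released buffer's "affinity"
 * is approximated by how recently it was released: a fresher release
 * stamp means more of its data is assumed to still sit in the cache.
 * TX (output) requests take the highest-affinity buffer so the kernel's
 * copy from user space is likely to hit cache; RX (input) requests take
 * the lowest-affinity buffer so fewer cache lines need invalidating to
 * keep I/O coherent. */
#include <stdio.h>
#include <string.h>

#define POOL_SIZE 8
#define BUF_SIZE  2048

enum adb_dir { ADB_TX, ADB_RX };          /* output vs. input */

struct adb_buf {
    unsigned char data[BUF_SIZE];
    unsigned long release_stamp;          /* proxy for cache affinity */
    int free;
};

struct adb_pool {
    struct adb_buf bufs[POOL_SIZE];
    unsigned long clock;                  /* monotonic release counter */
};

static void adb_init(struct adb_pool *p)
{
    memset(p, 0, sizeof(*p));
    for (int i = 0; i < POOL_SIZE; i++)
        p->bufs[i].free = 1;
}

/* Pick the free buffer whose estimated affinity best matches the request:
 * most recently released (likely still cached) for TX, least recently
 * released (likely already evicted) for RX. */
static struct adb_buf *adb_alloc(struct adb_pool *p, enum adb_dir dir)
{
    struct adb_buf *best = NULL;

    for (int i = 0; i < POOL_SIZE; i++) {
        struct adb_buf *b = &p->bufs[i];
        if (!b->free)
            continue;
        if (!best ||
            (dir == ADB_TX && b->release_stamp > best->release_stamp) ||
            (dir == ADB_RX && b->release_stamp < best->release_stamp))
            best = b;
    }
    if (best)
        best->free = 0;
    return best;
}

static void adb_release(struct adb_pool *p, struct adb_buf *b)
{
    b->release_stamp = ++p->clock;        /* newer stamp => higher affinity */
    b->free = 1;
}

int main(void)
{
    struct adb_pool pool;
    adb_init(&pool);

    /* Use and release two buffers so they acquire different affinities. */
    struct adb_buf *a = adb_alloc(&pool, ADB_TX);
    struct adb_buf *b = adb_alloc(&pool, ADB_RX);
    adb_release(&pool, a);
    adb_release(&pool, b);

    /* A TX request now gets the freshest buffer, an RX request a cold one. */
    printf("TX gets stamp %lu, RX gets stamp %lu\n",
           adb_alloc(&pool, ADB_TX)->release_stamp,
           adb_alloc(&pool, ADB_RX)->release_stamp);
    return 0;
}
```

In this toy pool, a TX request receives the most recently released (and presumably still cached) buffer, while an RX request receives the coldest one, mirroring the trade-off the abstract describes between cheap user-to-kernel copies and cheap cache invalidation.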



Published In

SAC '12: Proceedings of the 27th Annual ACM Symposium on Applied Computing
March 2012
2179 pages
ISBN:9781450308571
DOI:10.1145/2245276
Conference Chairs: Sascha Ossowski, Paola Lecca
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]


Publisher

Association for Computing Machinery

New York, NY, United States



Author Tags

  1. DMA buffer management
  2. affinity-aware
  3. cache pollution
  4. input/output
  5. reduce off-chip memory access

Qualifiers

  • Research-article


Conference

SAC 2012: ACM Symposium on Applied Computing
March 26-30, 2012
Trento, Italy

Acceptance Rates

SAC '12 paper acceptance rate: 270 of 1,056 submissions (26%)
Overall acceptance rate: 1,650 of 6,669 submissions (25%)
