DOI: 10.1145/2245276.2232031

Affinity-aware DMA buffer management for reducing off-chip memory access

Published: 26 March 2012

Abstract

It is well recognized that moving I/O data into and out of memory has become a critical cost for high-bandwidth devices. Embedded systems in particular, with their limited cache sizes and simple architectures, spend a large number of CPU cycles on off-chip memory accesses. The work presented in this paper addresses this problem with an Affinity-aware DMA Buffer management strategy, called ADB, which requires no changes to the underlying hardware.
We introduce the concept of buffer affinity, which describes where the data of a recently released DMA buffer resides in the memory hierarchy: the more of its data that remains in the cache, the higher the buffer's affinity. Exploiting the characteristics of the embedded platform, buffer affinity can be identified at runtime. Using this online profiling, ADB allocates buffers of different affinity according to the direction of the transfer. For output, ADB allocates a high-affinity buffer to reduce off-chip memory accesses when the OS copies data from the user buffer to the kernel buffer. For input, ADB allocates a low-affinity buffer so that part of the cache-invalidation operations needed to maintain I/O coherence can be skipped. Measurements show that ADB, implemented in the Linux 2.6.32 kernel and running on a 1 GHz UniCore-2 processor, improves the performance of network-related programs by 5.2% to 8.8%.
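
The allocation policy sketched in the abstract (high-affinity buffers for output, low-affinity buffers for input) can be illustrated with a small standalone sketch. The C program below is not the authors' implementation; it assumes, purely for illustration, that a released buffer's affinity can be approximated by how recently it was released, and every identifier in it (adb_pool, adb_alloc, adb_release, release_stamp) is hypothetical.

```c
/* Minimal userspace sketch of an affinity-aware DMA buffer pool (a toy
 * model, not the paper's kernel code).  A released buffer's "affinity"
 * is approximated by how recently it was released: a fresher release
 * stamp means more of its data is assumed to still sit in the cache.
 * TX (output) requests take the highest-affinity buffer so the kernel's
 * copy from user space is likely to hit cache; RX (input) requests take
 * the lowest-affinity buffer so fewer cache lines need invalidating to
 * keep I/O coherent. */
#include <stdio.h>
#include <string.h>

#define POOL_SIZE 8
#define BUF_SIZE  2048

enum adb_dir { ADB_TX, ADB_RX };          /* output vs. input */

struct adb_buf {
    unsigned char data[BUF_SIZE];
    unsigned long release_stamp;          /* proxy for cache affinity */
    int free;
};

struct adb_pool {
    struct adb_buf bufs[POOL_SIZE];
    unsigned long clock;                  /* monotonic release counter */
};

static void adb_init(struct adb_pool *p)
{
    memset(p, 0, sizeof(*p));
    for (int i = 0; i < POOL_SIZE; i++)
        p->bufs[i].free = 1;
}

/* Pick the free buffer whose estimated affinity best matches the request:
 * most recently released (likely still cached) for TX, least recently
 * released (likely already evicted) for RX. */
static struct adb_buf *adb_alloc(struct adb_pool *p, enum adb_dir dir)
{
    struct adb_buf *best = NULL;

    for (int i = 0; i < POOL_SIZE; i++) {
        struct adb_buf *b = &p->bufs[i];
        if (!b->free)
            continue;
        if (!best ||
            (dir == ADB_TX && b->release_stamp > best->release_stamp) ||
            (dir == ADB_RX && b->release_stamp < best->release_stamp))
            best = b;
    }
    if (best)
        best->free = 0;
    return best;
}

static void adb_release(struct adb_pool *p, struct adb_buf *b)
{
    b->release_stamp = ++p->clock;        /* newer stamp => higher affinity */
    b->free = 1;
}

int main(void)
{
    struct adb_pool pool;
    adb_init(&pool);

    /* Use and release two buffers so they acquire different affinities. */
    struct adb_buf *a = adb_alloc(&pool, ADB_TX);
    struct adb_buf *b = adb_alloc(&pool, ADB_RX);
    adb_release(&pool, a);
    adb_release(&pool, b);

    /* A TX request now gets the freshest buffer, an RX request a cold one. */
    printf("TX gets stamp %lu, RX gets stamp %lu\n",
           adb_alloc(&pool, ADB_TX)->release_stamp,
           adb_alloc(&pool, ADB_RX)->release_stamp);
    return 0;
}
```

In this toy pool, a TX request receives the most recently released (and presumably still cached) buffer, while an RX request receives the coldest one, mirroring the trade-off the abstract describes between cheap user-to-kernel copies and cheap cache invalidation.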



Published In

SAC '12: Proceedings of the 27th Annual ACM Symposium on Applied Computing
March 2012
2179 pages
ISBN:9781450308571
DOI:10.1145/2245276
Conference Chairs: Sascha Ossowski, Paola Lecca
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]


Publisher

Association for Computing Machinery

New York, NY, United States



Author Tags

  1. DMA buffer management
  2. affinity-aware
  3. cache pollution
  4. input/output
  5. reduce off-chip memory access

Qualifiers

  • Research-article


Conference

SAC 2012: ACM Symposium on Applied Computing
March 26-30, 2012
Trento, Italy

Acceptance Rates

SAC '12 paper acceptance rate: 270 of 1,056 submissions (26%)
Overall acceptance rate: 1,650 of 6,669 submissions (25%)
