Efficiently identifying working sets in block I/O streams

Authors: Ethan L. Miller, Lee Ward

SYSTOR '11: Proceedings of the 4th Annual International Conference on Systems and Storage

Article No.: 5, Pages 1 - 12

https://doi.org/10.1145/1987816.1987823

Published: 30 May 2011

Abstract

Identifying groups of blocks that tend to be read or written together in a given environment is the first step towards powerful techniques for device failure isolation and power management. For example, identified groups can be placed together on a single disk, avoiding excess drive activity across an exascale storage system. Unlike previous grouping work, we focus on identifying groupings in data that can be gathered from real, running systems with minimal impact. Using temporal, spatial, and access ordering information from an enterprise data set, we identified a set of groupings that consistently appear, indicating that these are working sets that are likely to be accessed together. We present several techniques to obtain groupings, along with a discussion of which techniques best apply to particular types of real systems. We intend to use these preliminary results to inform our search for new types of workloads. Our goal is to identify properties of easily separable workloads across different systems and to dynamically move groups in these workloads to reduce disk activity in large storage systems.

Index Terms

Efficiently identifying working sets in block I/O streams
1. Hardware
  1. Communication hardware, interfaces and storage

Published In

SYSTOR '11: Proceedings of the 4th Annual International Conference on Systems and Storage

May 2011

189 pages

ISBN: 9781450307734

DOI: 10.1145/1987816

General Chair: Paula Ta-Shma (IBM)

Program Chairs: Jose Moreira (IBM), Liuba Shrira (Brandeis U)

Copyright © 2011 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

NetApp
Mellanox Technologies
Hewlett-Packard
Intel
Red Hat, Inc.
Marvell Technology Group
IBM

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

SYSTOR '11

Sponsors: Mellanox, Intel, Red Hat, Marvell, IBM

SYSTOR '11: The 4th Annual International Conference on Systems and Storage

May 30 - June 1, 2011

Haifa, Israel

Acceptance Rates

SYSTOR '11 Paper Acceptance Rate 16 of 53 submissions, 30%;

Overall Acceptance Rate 108 of 323 submissions, 33%

Cited By

Estro T, Bhandari P, Wildani A, Zadok E, Badam A, Chidambaram V (2020). Desperately seeking ... optimal multi-tier cache configurations. Proceedings of the 12th USENIX Conference on Hot Topics in Storage and File Systems (6-6). DOI: 10.5555/3488733.3488739. Online publication date: 13-Jul-2020.
https://dl.acm.org/doi/10.5555/3488733.3488739

Wildani A, Miller E (2016). Can We Group Storage? Statistical Techniques to Identify Predictive Groupings in Storage System Accesses. ACM Transactions on Storage 12:2 (1-33). DOI: 10.1145/2738042. Online publication date: 1-Feb-2016.
https://dl.acm.org/doi/10.1145/2738042

Karakoyunlu C, Chandy J (2016). Exploiting user metadata for energy-aware node allocation in a cloud storage system. Journal of Computer and System Sciences 82:2 (282-309). DOI: 10.1016/j.jcss.2015.09.003. Online publication date: 1-Mar-2016.
https://dl.acm.org/doi/10.1016/j.jcss.2015.09.003

Li H, Gui C (2014). A New Hybrid SSD Architecture Based on SLC and MLC. Applied Mechanics and Materials 541-542 (474-477). DOI: 10.4028/www.scientific.net/AMM.541-542.474. Online publication date: Mar-2014.
https://doi.org/10.4028/www.scientific.net/AMM.541-542.474

Wildani A, Miller E, Adams I, Long D (2014). PERSES. Proceedings of the 2014 IEEE 22nd International Symposium on Modelling, Analysis & Simulation of Computer and Telecommunication Systems (71-80). DOI: 10.1109/MASCOTS.2014.17. Online publication date: 9-Sep-2014.
https://dl.acm.org/doi/10.1109/MASCOTS.2014.17

Huang P, Wan G, Zhou K, Huang M, Li C, Wang H (2013). Improve Effective Capacity and Lifetime of Solid State Drives. Proceedings of the 2013 IEEE Eighth International Conference on Networking, Architecture and Storage (50-59). DOI: 10.1109/NAS.2013.13. Online publication date: 17-Jul-2013.
https://dl.acm.org/doi/10.1109/NAS.2013.13

Rodeh O, Wildani A, Miller E (2013). HANDS. Proceedings of the 2013 IEEE International Conference on Data Engineering (ICDE 2013) (446-457). DOI: 10.1109/ICDE.2013.6544846. Online publication date: 8-Apr-2013.
https://dl.acm.org/doi/10.1109/ICDE.2013.6544846

Lee D, O'Sullivan M, Walker C, MacKenzie M, Maltzahn C, Bent J (2011). Robust benchmarking for archival storage tiers. Proceedings of the sixth workshop on Parallel Data Storage (1-6). DOI: 10.1145/2159352.2159354. Online publication date: 13-Nov-2011.
https://dl.acm.org/doi/10.1145/2159352.2159354
