Research article. DOI: 10.1145/1987816.1987823

Efficiently identifying working sets in block I/O streams

Published: 30 May 2011

Abstract

Identifying groups of blocks that tend to be read or written together in a given environment is the first step towards powerful techniques for device failure isolation and power management. For example, identified groups can be placed together on a single disk, avoiding excess drive activity across an exascale storage system. Unlike previous grouping work, we focus on identifying groupings in data that can be gathered from real, running systems with minimal impact. Using temporal, spatial, and access-ordering information from an enterprise data set, we identified a set of groupings that consistently appear, indicating that these are working sets likely to be accessed together. We present several techniques for obtaining groupings, along with a discussion of which techniques best apply to particular types of real systems. We intend to use these preliminary results to guide our search for new types of workloads, with the goal of identifying properties of easily separable workloads across different systems and dynamically moving groups in these workloads to reduce disk activity in large storage systems.
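The abstract names temporal information as one grouping signal but does not describe the algorithms themselves. As a rough illustration only, and not the authors' method, the Python sketch below shows one simple way temporal co-access could be turned into candidate groups. It assumes a block trace of `(timestamp, block_id)` pairs sorted by time, splits the trace into bursts wherever the idle time exceeds a `gap` threshold, counts how many bursts each pair of blocks shares, and merges pairs seen in at least `min_support` bursts using union-find. The function name, both parameters, and the toy trace are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def temporal_groups(trace, gap, min_support):
    """Hypothetical sketch: group blocks that repeatedly appear in the same burst.

    trace: list of (timestamp_seconds, block_id) tuples, sorted by timestamp.
    gap: idle time (seconds) that ends a burst (assumed threshold).
    min_support: minimum number of shared bursts needed to link two blocks.
    """
    # 1. Split the trace into bursts separated by idle periods longer than `gap`.
    bursts, current, last_t = [], set(), None
    for t, block in trace:
        if last_t is not None and t - last_t > gap:
            bursts.append(current)
            current = set()
        current.add(block)
        last_t = t
    if current:
        bursts.append(current)

    # 2. Count, for each pair of blocks, how many bursts contain both.
    pair_counts = defaultdict(int)
    for burst in bursts:
        for a, b in combinations(sorted(burst), 2):
            pair_counts[(a, b)] += 1

    # 3. Union-find: link every pair that meets the support threshold.
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for (a, b), count in pair_counts.items():
        if count >= min_support:
            union(a, b)

    # 4. Collect connected components; blocks never linked are omitted.
    groups = defaultdict(list)
    for x in list(parent):
        groups[find(x)].append(x)
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])


# Toy trace (invented for illustration): two block sets recur in separate bursts.
trace = [
    (0.0, 10), (0.1, 11), (0.2, 12),
    (5.0, 40), (5.1, 41),
    (10.0, 10), (10.05, 11), (10.1, 12),
    (15.0, 40), (15.2, 41),
    (20.0, 99),
]
print(temporal_groups(trace, gap=1.0, min_support=2))  # [[10, 11, 12], [40, 41]]
```

Blocks that never reach the support threshold (block 99 in the toy trace) are left out rather than reported as singleton groups. Counting every pair inside a burst is quadratic in burst size, so a real enterprise trace would likely need bounded windows or sampling; spatial (block-address) and access-ordering signals would need different similarity measures than the shared-burst count used here.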

    Published In

    SYSTOR '11: Proceedings of the 4th Annual International Conference on Systems and Storage
    May 2011
    189 pages
    ISBN: 9781450307734
    DOI: 10.1145/1987816

    Sponsors

    • NetApp
    • Mellanox Technologies
    • Hewlett-Packard
    • Intel
    • Red Hat, Inc.
    • Marvell Technology Group
    • IBM

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. automated management
    2. file-system contents
    3. grouping

    Qualifiers

    • Research-article

    Conference

    SYSTOR '11

    Acceptance Rates

    SYSTOR '11 paper acceptance rate: 16 of 53 submissions (30%)
    Overall acceptance rate: 108 of 323 submissions (33%)

    Article Metrics

    • Downloads (last 12 months): 1
    • Downloads (last 6 weeks): 0
    Reflects downloads up to 16 Feb 2025

    Cited By

    • (2020) Desperately seeking ... optimal multi-tier cache configurations. Proceedings of the 12th USENIX Conference on Hot Topics in Storage and File Systems, p. 6. DOI: 10.5555/3488733.3488739. Online publication date: 13-Jul-2020.
    • (2016) Can We Group Storage? Statistical Techniques to Identify Predictive Groupings in Storage System Accesses. ACM Transactions on Storage, 12(2), pp. 1-33. DOI: 10.1145/2738042. Online publication date: 1-Feb-2016.
    • (2016) Exploiting user metadata for energy-aware node allocation in a cloud storage system. Journal of Computer and System Sciences, 82(2), pp. 282-309. DOI: 10.1016/j.jcss.2015.09.003. Online publication date: 1-Mar-2016.
    • (2014) A New Hybrid SSD Architecture Based on SLC and MLC. Applied Mechanics and Materials, 541-542, pp. 474-477. DOI: 10.4028/www.scientific.net/AMM.541-542.474. Online publication date: Mar-2014.
    • (2014) PERSES. Proceedings of the 2014 IEEE 22nd International Symposium on Modelling, Analysis & Simulation of Computer and Telecommunication Systems, pp. 71-80. DOI: 10.1109/MASCOTS.2014.17. Online publication date: 9-Sep-2014.
    • (2013) Improve Effective Capacity and Lifetime of Solid State Drives. Proceedings of the 2013 IEEE Eighth International Conference on Networking, Architecture and Storage, pp. 50-59. DOI: 10.1109/NAS.2013.13. Online publication date: 17-Jul-2013.
    • (2013) HANDS. Proceedings of the 2013 IEEE International Conference on Data Engineering (ICDE 2013), pp. 446-457. DOI: 10.1109/ICDE.2013.6544846. Online publication date: 8-Apr-2013.
    • (2011) Robust benchmarking for archival storage tiers. Proceedings of the sixth workshop on Parallel Data Storage, pp. 1-6. DOI: 10.1145/2159352.2159354. Online publication date: 13-Nov-2011.
