DOI: 10.1145/1987816.1987823
research-article

Efficiently identifying working sets in block I/O streams

Published: 30 May 2011

ABSTRACT

Identifying groups of blocks that tend to be read or written together in a given environment is the first step towards powerful techniques for device failure isolation and power management. For example, identified groups can be placed together on a single disk, avoiding excess drive activity across an exascale storage system. Unlike previous grouping work, we focus on identifying groupings in data that can be gathered from real, running systems with minimal impact. Using temporal, spatial, and access ordering information from an enterprise data set, we identified a set of groupings that consistently appear, indicating that these are working sets that are likely to be accessed together. We present several techniques to obtain groupings along with a discussion of what techniques best apply to particular types of real systems. We intend to use these preliminary results to inform our search for new types of workloads with a goal of identifying properties of easily separable workloads across different systems and dynamically moving groups in these workloads to reduce disk activity in large storage systems.
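The abstract does not spell out the grouping techniques themselves. As a rough illustration only, and not the authors' method, the sketch below splits a time-ordered block trace into candidate groups using a temporal gap and a spatial (block-number) distance, then keeps the groups that recur. The function names, the `(timestamp, block)` trace format, and the `max_gap`, `max_stride`, and `min_count` thresholds are all assumptions made for this example.

```python
from collections import Counter


def candidate_groups(trace, max_gap=1.0, max_stride=64):
    """Split a time-ordered trace of (timestamp, block) pairs into groups.

    A new group starts whenever the time gap or the block-number distance
    from the previous access exceeds its threshold. Both thresholds are
    illustrative assumptions, not values from the paper.
    """
    groups, current = [], []
    prev = None
    for ts, block in trace:
        if prev is not None:
            gap = ts - prev[0]
            stride = abs(block - prev[1])
            if gap > max_gap or stride > max_stride:
                # Close the running group; order within a group is ignored.
                groups.append(frozenset(current))
                current = []
        current.append(block)
        prev = (ts, block)
    if current:
        groups.append(frozenset(current))
    return groups


def recurring_groups(trace, min_count=2, **kwargs):
    """Return multi-block groups that appear at least min_count times."""
    counts = Counter(
        g for g in candidate_groups(trace, **kwargs) if len(g) > 1
    )
    return {g: n for g, n in counts.items() if n >= min_count}


# Hypothetical trace: blocks 100, 101, 103 are touched together twice.
trace = [
    (0.0, 100), (0.1, 101), (0.2, 103),
    (5.0, 900), (5.1, 902),
    (9.0, 100), (9.2, 103), (9.3, 101),
    (20.0, 5000),
]
result = recurring_groups(trace)
```

Here `result` holds only the set `{100, 101, 103}`, which recurs twice; a group of that kind is a candidate working set in the sense the abstract describes. A real trace would need tuned thresholds and, most likely, a more robust clustering step than this simple segmentation.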

    Published in

      SYSTOR '11: Proceedings of the 4th Annual International Conference on Systems and Storage
      May 2011
      189 pages
      ISBN: 9781450307734
      DOI: 10.1145/1987816

      Copyright © 2011 ACM

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Acceptance Rates

      SYSTOR '11 paper acceptance rate: 16 of 53 submissions (30%). Overall acceptance rate: 94 of 285 submissions (33%).
