research-article

Generating realistic impressions for file-system benchmarking

Published: 14 December 2009

Abstract

The performance of file systems and related software depends on characteristics of the underlying file-system image (i.e., file-system metadata and file contents). Unfortunately, rather than benchmarking with realistic file-system images, most system designers and evaluators rely on ad hoc assumptions and (often inaccurate) rules of thumb. Furthermore, the lack of standardization and reproducibility makes file-system benchmarking ineffective. To remedy these problems, we develop Impressions, a framework to generate statistically accurate file-system images with realistic metadata and content. Impressions is flexible, supporting user-specified constraints on various file-system parameters using a number of statistical techniques to generate consistent images. In this article, we present the design, implementation, and evaluation of Impressions and demonstrate its utility using desktop search as a case study. We believe Impressions will prove to be useful to system developers and users alike.
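
As a rough illustration of what generating an image under a user-specified constraint could involve, the sketch below draws file sizes from a lognormal distribution and rescales them so they sum to a target total. The function name `sample_file_sizes`, the lognormal parameters, and the rescaling step are assumptions made for illustration only; the abstract does not state which distributions or constraint-resolution techniques Impressions uses.

```python
import random


def sample_file_sizes(num_files, total_bytes, mu=9.0, sigma=2.0, seed=0):
    """Draw num_files sizes from a lognormal distribution, then rescale
    them so they sum exactly to total_bytes (the user-specified constraint).

    Hypothetical helper: the parameters mu and sigma are placeholders,
    not values taken from the article.
    """
    rng = random.Random(seed)  # a fixed seed makes the result reproducible
    raw = [rng.lognormvariate(mu, sigma) for _ in range(num_files)]
    scale = total_bytes / sum(raw)
    sizes = [int(r * scale) for r in raw]
    # Integer truncation leaves a small shortfall; assign it to the largest file.
    sizes[sizes.index(max(sizes))] += total_bytes - sum(sizes)
    return sizes


sizes = sample_file_sizes(num_files=1000, total_bytes=10**9)
```

Seeding the generator is one simple way to address the reproducibility concern raised in the abstract: the same parameters and seed yield the same list of sizes on every run.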

        • Published in

          ACM Transactions on Storage, Volume 5, Issue 4
          December 2009
          155 pages
          ISSN:1553-3077
          EISSN:1553-3093
          DOI:10.1145/1629080

          Copyright © 2009 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 14 December 2009
          • Received: 1 August 2009
          • Accepted: 1 August 2009

          Qualifiers

          • research-article
          • Research
          • Refereed
