Abstract
The performance of file systems and related software depends on characteristics of the underlying file-system image (i.e., file-system metadata and file contents). Unfortunately, rather than benchmarking with realistic file-system images, most system designers and evaluators rely on ad hoc assumptions and (often inaccurate) rules of thumb. Furthermore, the lack of standardization and reproducibility makes file-system benchmarking ineffective. To remedy these problems, we develop Impressions, a framework to generate statistically accurate file-system images with realistic metadata and content. Impressions is flexible: it supports user-specified constraints on various file-system parameters and uses a number of statistical techniques to generate consistent images. In this article, we present the design, implementation, and evaluation of Impressions and demonstrate its utility using desktop search as a case study. We believe Impressions will prove useful to system developers and users alike.
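The abstract does not spell out the statistical machinery. As a rough illustration of what satisfying a user-specified constraint while staying statistically faithful can mean, the sketch here draws file sizes from a lognormal model and accepts a sample only when the sizes also sum to a requested total. The lognormal model, its parameters (`mu`, `sigma`), the tolerance, and the function names are all hypothetical. Rejection sampling is a simple stand-in chosen for clarity and is not necessarily the technique Impressions itself uses.

```python
import math
import random


def sample_sizes(count, mu, sigma, rng):
    """Draw `count` file sizes in bytes from a hypothetical lognormal(mu, sigma) model."""
    return [max(1, round(rng.lognormvariate(mu, sigma))) for _ in range(count)]


def constrained_sizes(count, total_bytes, mu, sigma, tolerance=0.05,
                      max_tries=10_000, seed=None):
    """Resample until the sizes sum to within `tolerance` of `total_bytes`.

    Rejection sampling: every accepted sample is still a draw from the model,
    conditioned on the (approximate) total-size constraint being met.
    """
    rng = random.Random(seed)
    for _ in range(max_tries):
        sizes = sample_sizes(count, mu, sigma, rng)
        if abs(sum(sizes) - total_bytes) <= tolerance * total_bytes:
            return sizes
    raise ValueError("constraint not met; check that total_bytes is plausible")


if __name__ == "__main__":
    # Hypothetical parameters: median file size of 8 KiB, sigma of 1.0.
    mu, sigma = math.log(8192), 1.0
    # Target the model's expected total for 1,000 files: count * exp(mu + sigma^2 / 2).
    target = round(1000 * math.exp(mu + sigma ** 2 / 2))
    sizes = constrained_sizes(1000, target, mu, sigma, seed=42)
    print(len(sizes), sum(sizes), target)
```

Rejection sampling only works when the requested total is plausible under the model; for a target far from `count` times the model's mean, almost every sample is rejected, which is why the sketch gives up after `max_tries` and raises an error instead of silently distorting the distribution.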