ABSTRACT
Relative fitness is a new black-box approach to modeling the performance of storage devices. In contrast with an absolute model that predicts the performance of a workload on a given storage device, a relative fitness model predicts performance differences between a pair of devices. There are two primary advantages to this approach. First, because a relative fitness model is constructed for a device pair, the application-device feedback of a closed workload can be captured (e.g., how the I/O arrival rate changes as the workload moves from device A to device B). Second, a relative fitness model allows performance and resource utilization to be used in place of workload characteristics. This is beneficial when workload characteristics are difficult to obtain or to express concisely (e.g., rather than describe the spatio-temporal characteristics of a workload, one could use the observed cache behavior of device A to help predict the performance of device B).
This paper describes the steps necessary to build a relative fitness model, with an approach that is general enough to be used with any black-box modeling technique. We compare relative fitness models and absolute models across a variety of workloads and storage devices. On average, relative fitness models predict bandwidth and throughput within 10-20% and can reduce prediction error by as much as a factor of two when compared to absolute models.
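The idea of predicting device B from what was observed on device A can be illustrated with a minimal sketch. It is not the paper's implementation: the feature choice (bandwidth and cache hit rate on A), the use of a B-to-A bandwidth ratio as the "performance difference", the nearest-neighbor learner standing in for an arbitrary black-box technique, and all numbers are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Vector = Tuple[float, ...]


@dataclass(frozen=True)
class Observation:
    """One workload measured on both devices (hypothetical fields and units)."""
    bandwidth_a: float  # MB/s observed on device A
    cache_hit_a: float  # cache hit rate observed on device A, 0..1
    bandwidth_b: float  # MB/s observed on device B (training target)


def fit_nearest_neighbor(xs: Sequence[Vector],
                         ys: Sequence[float]) -> Callable[[Vector], float]:
    """Stand-in black-box learner: answer with the target of the closest training point."""
    # Scale each feature by its largest training magnitude so that
    # no single feature dominates the distance computation.
    scales = [max(abs(x[i]) for x in xs) or 1.0 for i in range(len(xs[0]))]

    def scaled(x: Vector) -> Vector:
        return tuple(v / s for v, s in zip(x, scales))

    table = [(scaled(x), y) for x, y in zip(xs, ys)]

    def predict(x: Vector) -> float:
        q = scaled(x)
        nearest = min(table, key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], q)))
        return nearest[1]

    return predict


def fit_relative_fitness(training: List[Observation]) -> Callable[[float, float], float]:
    """Learn the B/A bandwidth ratio from A's performance and resource utilization."""
    xs = [(o.bandwidth_a, o.cache_hit_a) for o in training]
    ys = [o.bandwidth_b / o.bandwidth_a for o in training]  # assumed form of "relative fitness"
    ratio = fit_nearest_neighbor(xs, ys)

    def predict_b(bandwidth_a: float, cache_hit_a: float) -> float:
        # Predicted performance on B = observed performance on A * learned ratio.
        return bandwidth_a * ratio((bandwidth_a, cache_hit_a))

    return predict_b


def relative_error(predicted: float, actual: float) -> float:
    """Prediction error as a fraction of the measured value."""
    return abs(predicted - actual) / actual


# Made-up training data: three workloads run on both devices.
training = [
    Observation(bandwidth_a=40.0, cache_hit_a=0.90, bandwidth_b=60.0),
    Observation(bandwidth_a=20.0, cache_hit_a=0.50, bandwidth_b=20.0),
    Observation(bandwidth_a=10.0, cache_hit_a=0.10, bandwidth_b=5.0),
]
predict_b = fit_relative_fitness(training)

# A new workload has only been run on device A.
estimate = predict_b(38.0, 0.85)
print(f"predicted bandwidth on B: {estimate:.1f} MB/s")
```

Because the learner is passed in as an ordinary function, any other regression technique could replace `fit_nearest_neighbor` without changing how the device-pair model is trained or queried.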
Modeling the relative fitness of storage. SIGMETRICS '07 Conference Proceedings.