
Quickly finding near-optimal storage designs

Published: 01 November 2005

Abstract

Despite the importance of storage in enterprise computer systems, there are few adequate tools to design and configure a storage system to meet application data requirements efficiently. Storage system design involves choosing the disk arrays to use, setting the configuration options on those arrays, and determining an efficient mapping of application data onto the configured system. This is a complex process because of the multitude of disk array configuration options, and the need to take into account both capacity and potentially contending I/O performance demands when placing the data. Thus, both existing tools and administrators using rules of thumb often generate designs that are of poor quality.

This article presents the Disk Array Designer (DAD), which is a tool that can be used both to guide administrators in their design decisions and to automate the design process. DAD uses a generalized best-fit bin packing heuristic with randomization and backtracking to search efficiently through the huge number of possible design choices. It makes decisions using device models that estimate storage system performance. We evaluate DAD's designs based on traces from a variety of database, filesystem, and e-mail workloads. We show that DAD can handle the difficult task of configuring midrange and high-end disk arrays, even with complex real-world workloads. We also show that DAD quickly generates near-optimal storage system designs, improving in both speed and quality over previous tools.
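
To make the idea of best-fit bin packing with randomization concrete, the following is a minimal sketch and not DAD's actual algorithm. It assumes identical arrays, each described only by a capacity limit and an I/O rate limit, and it replaces DAD's device models with simple additive bounds. The names `Store`, `Array`, `best_fit`, and `randomized_best_fit` are hypothetical, and backtracking is omitted; random restarts over the placement order stand in for it.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Store:
    """A unit of application data with a capacity and an I/O demand (hypothetical)."""
    name: str
    size_gb: float
    iops: float


@dataclass
class Array:
    """A disk array with fixed capacity and I/O limits (assumed additive model)."""
    cap_gb: float
    max_iops: float
    used_gb: float = 0.0
    used_iops: float = 0.0
    stores: list = field(default_factory=list)

    def fits(self, s: Store) -> bool:
        # Both dimensions must hold: capacity and I/O rate.
        return (self.used_gb + s.size_gb <= self.cap_gb
                and self.used_iops + s.iops <= self.max_iops)

    def slack_after(self, s: Store) -> float:
        # Normalized leftover in both dimensions; smaller means a tighter fit.
        gb_left = (self.cap_gb - self.used_gb - s.size_gb) / self.cap_gb
        io_left = (self.max_iops - self.used_iops - s.iops) / self.max_iops
        return gb_left + io_left

    def add(self, s: Store) -> None:
        self.stores.append(s)
        self.used_gb += s.size_gb
        self.used_iops += s.iops


def best_fit(stores, cap_gb, max_iops):
    """Place each store on the feasible array that it fills most tightly."""
    arrays = []
    for s in stores:
        candidates = [a for a in arrays if a.fits(s)]
        if candidates:
            target = min(candidates, key=lambda a: a.slack_after(s))
        else:
            target = Array(cap_gb, max_iops)
            if not target.fits(s):
                raise ValueError(f"store {s.name} exceeds a single array")
            arrays.append(target)
        target.add(s)
    return arrays


def randomized_best_fit(stores, cap_gb, max_iops, tries=50, seed=0):
    """Repeat best-fit over shuffled orders; keep the design with fewest arrays."""
    rng = random.Random(seed)
    best = best_fit(stores, cap_gb, max_iops)
    for _ in range(tries):
        order = list(stores)
        rng.shuffle(order)
        candidate = best_fit(order, cap_gb, max_iops)
        if len(candidate) < len(best):
            best = candidate
    return best
```

Calling `randomized_best_fit(stores, cap_gb=100, max_iops=1000)` returns a list of `Array` objects. Scoring a fit by the sum of normalized leftovers is one simple choice among many; a real tool would instead consult performance models for each device.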


Published In

ACM Transactions on Computer Systems, Volume 23, Issue 4
November 2005
137 pages
ISSN:0734-2071
EISSN:1557-7333
DOI:10.1145/1113574

Publisher

Association for Computing Machinery

New York, NY, United States
