
Challenges in managing dependable data systems

Published: 01 March 2006

Abstract

Recent work shows how to automatically design storage systems that meet performance and dependability requirements by appropriately selecting and configuring storage devices, and creating snapshot, remote mirror, and traditional backup copies. Although this work represents a solid foundation, users demand an even higher level of functionality: the ability to cost-effectively manage data according to application-centric (or better, business process-centric) performance, dependability, and manageability requirements, as these requirements evolve over the data's lifetime. In this paper, we outline several research challenges in managing dependable data systems, including capturing users' high-level goals; translating them into storage-level requirements; and designing, deploying, and analyzing the resulting data systems.
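The step of translating high-level goals into a storage-level design can be illustrated with a small sketch. It is not the authors' method. It assumes a simple cost model that this abstract does not specify: each candidate design has an annual outlay, and missing the goals adds an expected penalty for lost updates and downtime. The `Goals` and `Design` types, the `expected_cost` and `choose_design` helpers, the three candidate designs, and every number are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Goals:
    """Hypothetical business-level dependability goals for one application."""
    max_data_loss_hours: float      # tolerable window of lost updates (RPO)
    max_outage_hours: float         # tolerable time to restore service (RTO)
    loss_penalty_per_hour: float    # assumed $ cost per hour of lost updates
    outage_penalty_per_hour: float  # assumed $ cost per hour of unavailability
    failures_per_year: float        # assumed rate of failures needing recovery


@dataclass(frozen=True)
class Design:
    """Hypothetical storage-level data protection design."""
    name: str
    data_loss_hours: float  # worst-case window of lost updates
    recovery_hours: float   # worst-case time to restore service
    annual_outlay: float    # assumed $ per year for devices, links, and media


def expected_cost(d: Design, g: Goals) -> float:
    """Annual outlay plus expected yearly penalty for data loss and downtime."""
    penalty_per_failure = (d.data_loss_hours * g.loss_penalty_per_hour
                           + d.recovery_hours * g.outage_penalty_per_hour)
    return d.annual_outlay + g.failures_per_year * penalty_per_failure


def choose_design(designs: Sequence[Design], g: Goals) -> Optional[Design]:
    """Return the cheapest design that meets the RPO/RTO bounds, or None."""
    # Hard requirements first: discard designs that violate either bound.
    feasible = [d for d in designs
                if d.data_loss_hours <= g.max_data_loss_hours
                and d.recovery_hours <= g.max_outage_hours]
    if not feasible:
        return None
    # Among the remaining designs, minimize outlay plus expected penalties.
    return min(feasible, key=lambda d: expected_cost(d, g))


# Illustrative inputs only; none of these numbers come from the paper.
designs = [
    Design("weekly backup", data_loss_hours=168.0, recovery_hours=48.0,
           annual_outlay=10_000),
    Design("daily backup", data_loss_hours=24.0, recovery_hours=8.0,
           annual_outlay=40_000),
    Design("async remote mirror", data_loss_hours=0.1, recovery_hours=2.0,
           annual_outlay=150_000),
]
goals = Goals(max_data_loss_hours=24.0, max_outage_hours=12.0,
              loss_penalty_per_hour=50_000, outage_penalty_per_hour=20_000,
              failures_per_year=0.1)

if __name__ == "__main__":
    best = choose_design(designs, goals)
    print(best.name, round(expected_cost(best, goals)))  # prints: async remote mirror 154500
```

Filtering on the hard bounds before minimizing cost separates "meets the requirements" from "meets them cost-effectively." With these inputs the mirror's higher outlay is repaid by much smaller penalties. Lowering the penalty rates makes the daily backup the cheaper feasible choice. A realistic tool would also need to model shared resources and failure scopes, and to handle requirements that change over the data's lifetime.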

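Cited By

  • (2011) Plato. Cluster Computing 14(3), pp. 229-244. DOI: 10.1007/s10586-010-0122-y. Online publication date: 1-Sep-2011.
  • (2010) End-to-end disaster recovery planning: From art to science. 2010 IEEE Network Operations and Management Symposium (NOMS 2010), pp. 357-364. DOI: 10.1109/NOMS.2010.5488491. Online publication date: Apr-2010.
  • (2009) Storage administration. Proceedings of the Symposium on Computer Human Interaction for the Management of Information Technology, pp. 56-59. DOI: 10.1145/1641587.1641595. Online publication date: 7-Nov-2009.
  • (2009) Applying genetic algorithms to decision making in autonomic computing systems. Proceedings of the 6th International Conference on Autonomic Computing, pp. 97-106. DOI: 10.1145/1555228.1555258. Online publication date: 15-Jun-2009.
  • (2006) On the road to recovery. ACM SIGOPS Operating Systems Review 40(4), pp. 235-248. DOI: 10.1145/1218063.1217958. Online publication date: 18-Apr-2006.
  • (2006) On the road to recovery. Proceedings of the 1st ACM SIGOPS/EuroSys European Conference on Computer Systems 2006, pp. 235-248. DOI: 10.1145/1217935.1217958. Online publication date: 18-Apr-2006.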

Published In

ACM SIGMETRICS Performance Evaluation Review, Volume 33, Issue 4
Design, implementation, and performance of storage systems
March 2006
45 pages
ISSN: 0163-5999
DOI: 10.1145/1138085

Publisher

Association for Computing Machinery
New York, NY, United States
