DOI: 10.1145/2485732.2485752
research-article

Building intelligence for software defined data centers: modeling usage patterns

Published: 30 June 2013

Abstract

As both the amount of data to be stored and the rate of data production grow, data center designers and operators face the challenge of planning and managing systems whose characteristics, workloads, and performance, availability, and reliability goals change rapidly. As we move towards software-defined data centers (SDDCs), the ability to reconfigure and adapt our solutions is increasing, but to take full advantage of that increase we must design more intelligent systems that are aware of how they are being used and able to deliver accurate predictions of their characteristics, workloads, and goals. In this paper we propose a novel algorithm for use in an intelligent, user-aware SDDC that performs run-time analysis of user storage system activity with minimal impact on performance and provides accurate estimates of future user activity.
Our algorithm can produce both generalized and specific models, depending on the parameters used. It is efficient and has low overhead, making it well suited to adding intelligence to SDDCs and building intelligent storage systems. We use our algorithm to analyze actual data from two real systems, monitoring user activity two and three times a day, respectively, over a period of roughly two years, for almost 500 distinct users.




    Published In

    SYSTOR '13: Proceedings of the 6th International Systems and Storage Conference
    June 2013
    198 pages
    ISBN:9781450321167
    DOI:10.1145/2485732


    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. file and storage systems
    2. software-defined data centers
    3. usage
    4. user modeling
    5. workload characterization
    6. workload modeling

    Qualifiers

    • Research-article

    Conference

    SYSTOR '13
    Sponsor:
    • INTEL
    • Riverbed
    • Technion
    • SIGOPS
    • EMC²
    • AXCIENT
    • USENIX Assoc
    • IBM
    • HP

    Acceptance Rates

    SYSTOR '13 Paper Acceptance Rate 20 of 49 submissions, 41%;
    Overall Acceptance Rate 108 of 323 submissions, 33%


    Cited By

    • (2020) A Survey and Classification of Software-Defined Storage Systems. ACM Computing Surveys 53:3 (1-38). DOI: 10.1145/3385896. Online publication date: 28-May-2020
    • (2020) Software-defined load-balanced data center: design, implementation and performance analysis. Cluster Computing. DOI: 10.1007/s10586-020-03134-x. Online publication date: 13-Jul-2020
    • (2019) Classified enhancement model for big data storage reliability based on Boolean satisfiability problem. Cluster Computing. DOI: 10.1007/s10586-019-02941-1. Online publication date: 11-May-2019
    • (2018) SAT-based Important Data Reliability Enhancement Model for Big Data Storage. Proceedings of the 3rd International Conference on Big Data and Computing (20-26). DOI: 10.1145/3220199.3220220. Online publication date: 28-Apr-2018
    • (2017) An overall approach to achieve load balancing for Hadoop Distributed File System. International Journal of Web and Grid Services 13:4 (448-466). DOI: 10.1504/IJWGS.2017.087370. Online publication date: 1-Jan-2017
    • (2015) Improving Reliability with Dynamic Syndrome Allocation in Intelligent Software Defined Data Centers. Proceedings of the 2015 45th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (219-230). DOI: 10.1109/DSN.2015.46. Online publication date: 22-Jun-2015
    • (2015) Characterizing Data Dependence Constraints for Dynamic Reliability Using N-Queens Attack Domains. Proceedings of the 12th International Conference on Quantitative Evaluation of Systems - Volume 9259 (211-227). DOI: 10.1007/978-3-319-22264-6_14. Online publication date: 1-Sep-2015
    • (2014) Model-based automation for hardware provisioning in IT infrastructure. 2014 IEEE International Systems Conference Proceedings (293-300). DOI: 10.1109/SysCon.2014.6819272. Online publication date: Mar-2014
