DOI: 10.1145/1815695.1815697
research-article

Using machine learning techniques to enhance the performance of an automatic backup and recovery system

Published: 24 May 2010

ABSTRACT

A typical disaster recovery system will have mirrored storage at a site that is geographically separate from the main operational site. In many cases, communication between the local site and the backup repository site is performed over a network which is inherently slow, such as a WAN, or is highly strained, for example due to a whole-site disaster recovery operation.

The goal of this work is to alleviate the performance impact of the network in such a scenario, and to do so using machine learning techniques. We focus on two main areas, prefetching and read-ahead size determination. In both cases we significantly improve the performance of the system.

Our main contributions are as follows. We introduce a theoretical model of the system and of the problem we are trying to solve, and bound the gain achievable from prefetching techniques. We construct two frequent pattern mining algorithms and use them for prefetching, and present a framework for controlling and combining multiple prefetch algorithms. These algorithms, along with various simple prefetch algorithms, are compared in a simulation environment. We also introduce a novel algorithm for determining the amount of read-ahead on such a system, based on intuition from online competitive analysis and on regression techniques. The significant positive impact of this algorithm is demonstrated on IBM's FastBack system.

Many of our improvements were applied with little or no modification to the internals of the current implementation. We are therefore confident that the techniques are general and likely to have applications elsewhere.
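
To make the idea of pattern-based prefetching concrete, the following is a minimal sketch only, and not either of the two frequent pattern mining algorithms from the paper, whose details the abstract does not give. It assumes a block-level access trace represented as a list of integers, mines frequent "block followed by block" pairs, and prefetches the most frequent successor of each accessed block. The names `SuccessorPrefetcher`, `min_support`, `fanout`, and `hit_rate` are hypothetical and introduced purely for illustration.

```python
from collections import Counter, defaultdict


class SuccessorPrefetcher:
    """Toy prefetcher that mines frequent (block -> next block) pairs.

    Illustrative assumption only: the paper's actual mining algorithms
    are not described in the abstract.
    """

    def __init__(self, min_support=2, fanout=1):
        self.min_support = min_support  # minimum pair count to act on (hypothetical)
        self.fanout = fanout            # max number of blocks to prefetch per access
        self.successors = defaultdict(Counter)

    def train(self, trace):
        # Count how often each block is immediately followed by another block.
        for current, following in zip(trace, trace[1:]):
            self.successors[current][following] += 1

    def predict(self, block):
        # Return the most frequent successors that meet the support threshold.
        counts = self.successors.get(block)
        if not counts:
            return []
        ranked = counts.most_common(self.fanout)
        return [b for b, c in ranked if c >= self.min_support]


def hit_rate(prefetcher, trace):
    # Fraction of accesses that the prefetcher predicted from the previous access.
    hits = 0
    for current, following in zip(trace, trace[1:]):
        if following in prefetcher.predict(current):
            hits += 1
    return hits / max(len(trace) - 1, 1)


if __name__ == "__main__":
    trace = [1, 2, 3, 1, 2, 4, 1, 2, 3]
    prefetcher = SuccessorPrefetcher(min_support=2, fanout=1)
    prefetcher.train(trace)
    print(prefetcher.predict(1))
    print(hit_rate(prefetcher, trace))
```

Raising `min_support` trades coverage for precision: fewer blocks are prefetched, but each prefetch is backed by more evidence, which matters when every wasted fetch competes for a slow or strained network link.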


Published in

SYSTOR '10: Proceedings of the 3rd Annual Haifa Experimental Systems Conference
May 2010
211 pages
ISBN: 9781605589084
DOI: 10.1145/1815695

Copyright © 2010 ACM


Publisher

Association for Computing Machinery, New York, NY, United States


Acceptance Rates

Overall acceptance rate: 94 of 285 submissions, 33%
