research-article

Autograph: automatically extracting workflow file signatures

Published: 01 January 2009

Abstract

Storage management activities, such as reporting, file placement, migration and archiving, require the ability to discover files that belong to an application workflow by relying only on information from the file server. Some classes of application workflows, such as rendering an animated sequence from its graphics models or building an application from its source files, often exhibit a high degree of repeatability. We describe a system called Autograph that exploits this repeatability to discover files that belong to an application workflow. Our approach examines traces of file accesses, finds repeated and correlated accesses, and infers which files likely belong to the same workflow. Our solution targets server workflows and uses file server traces, which contain less process and file information than the local machine traces used in prior work. We show that Autograph successfully extracts workflow file signatures, even if the workflows are concurrent or share files.
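The abstract describes a three-step pipeline: examine a trace of file accesses, find repeated and correlated accesses, and group files that likely belong to the same workflow. The sketch below illustrates that idea. It is not Autograph's algorithm. It is a generic co-occurrence grouping that rests on several assumptions. The input is a simplified trace of hypothetical (timestamp, client_id, path) records. Each client's accesses are split into sessions at an assumed 60-second idle gap. A file pair counts as correlated only if it appears together in at least an assumed min_support of 2 sessions. The helper names split_sessions, correlated_pairs and group_files, and the example paths, are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical simplified trace records: (timestamp_seconds, client_id, path).
# Real file server traces carry different fields; this layout is an assumption.
trace = [
    (0, "render1", "/proj/a/scene.mdl"),
    (1, "render1", "/proj/a/tex.png"),
    (2, "render1", "/proj/a/out.img"),
    (500, "render1", "/proj/a/scene.mdl"),
    (501, "render1", "/proj/a/tex.png"),
    (0, "build1", "/src/main.c"),
    (1, "build1", "/src/util.h"),
    (300, "build1", "/src/main.c"),
    (301, "build1", "/src/util.h"),
    (302, "build1", "/tmp/scratch"),
]


def split_sessions(trace, gap=60.0):
    """Split each client's accesses into sessions separated by idle gaps > `gap` seconds."""
    sessions = []
    current = {}  # client_id -> (last_timestamp, set_of_paths_in_open_session)
    for ts, client, path in sorted(trace):
        last = current.get(client)
        if last is None or ts - last[0] > gap:
            if last is not None:
                sessions.append(last[1])  # close the previous session
            current[client] = (ts, {path})
        else:
            last[1].add(path)
            current[client] = (ts, last[1])
    sessions.extend(files for _, files in current.values())  # close open sessions
    return sessions


def correlated_pairs(sessions, min_support=2):
    """Return file pairs that co-occur in at least `min_support` sessions."""
    counts = Counter()
    for files in sessions:
        for pair in combinations(sorted(files), 2):
            counts[pair] += 1
    return {pair for pair, n in counts.items() if n >= min_support}


def group_files(pairs):
    """Union-find over correlated pairs; each component is a candidate workflow file set."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        parent[find(a)] = find(b)
    groups = {}
    for f in parent:
        groups.setdefault(find(f), set()).add(f)
    return sorted(sorted(g) for g in groups.values())


print(group_files(correlated_pairs(split_sessions(trace))))
# prints [['/proj/a/scene.mdl', '/proj/a/tex.png'], ['/src/main.c', '/src/util.h']]
```

In this example, /proj/a/out.img and /tmp/scratch each appear in only one session, so they fall below the support threshold and are not grouped. Keying sessions by client is a simplification. It cannot separate concurrent workflows issued from one client, and it cannot assign a shared file to more than one workflow. The abstract states that Autograph handles both cases.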

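Cited By

  • (2019) Data Heat Prediction in Storage Systems Using Behavior Specific Prediction Models. 2019 IEEE 38th International Performance Computing and Communications Conference (IPCCC), pp. 1-8. DOI: 10.1109/IPCCC47392.2019.8958715. Online publication date: Oct-2019.
  • (2015) Automatic request analyzer for QoS enabled storage system. Proceedings of the 11th Central & Eastern European Software Engineering Conference in Russia, pp. 1-8. DOI: 10.1145/2855667.2855670. Online publication date: 22-Oct-2015.
  • (2014) Automatic identification of application I/O signatures from noisy server-side traces. Proceedings of the 12th USENIX conference on File and Storage Technologies, pp. 213-228. DOI: 10.5555/2591305.2591326. Online publication date: 17-Feb-2014.
  • (2014) Analysis and classification of multimedia I/O requests to storage system. Proceedings of the 10th Central and Eastern European Software Engineering Conference in Russia, pp. 1-5. DOI: 10.1145/2687233.2687243. Online publication date: 23-Oct-2014.
  • (2011) Obtaining Thresholds for the Effectiveness of Business Process Mining. Proceedings of the 2011 International Symposium on Empirical Software Engineering and Measurement, pp. 453-462. DOI: 10.1109/ESEM.2011.64. Online publication date: 22-Sep-2011.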

Published In

ACM SIGOPS Operating Systems Review, Volume 43, Issue 1
January 2009
97 pages
ISSN: 0163-5980
DOI: 10.1145/1496909

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. application workflow
  2. storage management
