Abstract:
Many phenomena that we wish to discover are composed of sequences of events or event primitives. Signatures for identifying such phenomena are often constructed from either distributions or frequencies of attributes, or from specific subsequences that are known to correlate with the phenomena. Distribution-based identification does not capture the essence of the sequence of behaviors and may therefore suffer from a lack of specificity. At the other extreme, identification based on specific subsequences is often too specific and suffers from lower sensitivity when natural variations arise in the phenomena, the measuring process, or the data analysis. We introduce here a method for discovering signatures for phenomena that are well characterized by sequences of event primitives. In this paper, we describe the steps taken and lessons learned in generalizing a sequence analysis method, BLAST, for use on non-biological datasets, including expressing and operating on alphabets of varying length, constructing a reward/penalty model for arbitrary datasets, and discovering low-complexity segments in sequence data by extending BLAST's native low-complexity estimation algorithms. We also present high-level overviews of several case studies that demonstrate the utility of this method for discovering signatures in a wide array of applications, including network traffic, software analysis, and server characterization, among others. Finally, we demonstrate how signatures discovered using this method can be expressed in a variety of model formalisms, each with its own relative benefits.
Date of Conference: 04-07 June 2013
Date Added to IEEE Xplore: 15 August 2013