Abstract
Hidden Markov Models (HMMs) have applications in several areas of computer security. One drawback of HMMs is that selecting appropriate model parameters is often ad hoc or requires domain-specific knowledge. While algorithms exist to find local optima for some parameters, the number of states must always be specified in advance, and it directly affects the accuracy and generality of the model. In addition, domain knowledge is not always available or may be based on assumptions that prove incorrect or sub-optimal.
We apply the ε-machine, a special type of HMM, to the task of constructing network protocol models solely from network traffic. Unlike previous approaches, ε-machine reconstruction infers the minimal HMM architecture directly from data and is well suited to applications such as anomaly detection. We draw distinctions between our approach and previous research, and discuss the benefits and challenges of ε-machines for protocol model inference.
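To give a rough feel for what inferring the architecture from data can mean, the following Python sketch groups fixed-length histories of a symbol stream by their empirical next-symbol distributions, so the number of groups falls out of the data instead of being chosen up front. It is only a toy illustration of the idea, not the reconstruction algorithm used in the paper: the function names, the history length, the tolerance `tol`, and the symbol trace are all hypothetical, and the greedy merging stands in for a proper statistical test.

```python
from collections import Counter, defaultdict


def next_symbol_distributions(sequence, history_len):
    """Estimate P(next symbol | preceding history_len symbols) from counts."""
    counts = defaultdict(Counter)
    for i in range(history_len, len(sequence)):
        history = sequence[i - history_len:i]
        counts[history][sequence[i]] += 1
    return {
        history: {sym: n / sum(c.values()) for sym, n in c.items()}
        for history, c in counts.items()
    }


def group_histories(dists, alphabet, tol=0.05):
    """Greedily merge histories whose next-symbol distributions agree within tol.

    Assumption: a simple per-symbol tolerance replaces a real hypothesis test.
    """
    states = []  # each entry: (representative distribution, member histories)
    for history in sorted(dists):
        dist = dists[history]
        for rep, members in states:
            if all(abs(dist.get(s, 0.0) - rep.get(s, 0.0)) <= tol for s in alphabet):
                members.append(history)
                break
        else:
            # No existing group predicts the same future: start a new one.
            states.append((dist, [history]))
    return [members for _, members in states]


# Hypothetical symbol trace standing in for a stream of protocol message types.
trace = "ABXCBX" * 20
alphabet = sorted(set(trace))
dists = next_symbol_distributions(trace, history_len=2)
states = group_histories(dists, alphabet)
print(len(states), states)
```

On this toy trace the histories `AB` and `CB` end up in one group because both are always followed by `X`, `XA` and `XC` share a group because both are always followed by `B`, and `BX` stays on its own, giving three groups without the count being specified beforehand.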
Copyright information
© 2010 ICST Institute for Computer Science, Social Informatics and Telecommunications Engineering
About this paper
Cite this paper
Whalen, S., Bishop, M., Crutchfield, J.P. (2010). Hidden Markov Models for Automated Protocol Learning. In: Jajodia, S., Zhou, J. (eds) Security and Privacy in Communication Networks. SecureComm 2010. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 50. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-16161-2_24
DOI: https://doi.org/10.1007/978-3-642-16161-2_24
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-16160-5
Online ISBN: 978-3-642-16161-2
eBook Packages: Computer Science (R0)