Abstract
This chapter elaborates on a topic of the previous chapter, inference, by focusing on machine learning, a class of algorithms important for processing cyber information. It also continues the thread of ontology and semantics by exploring the tradeoffs between the effectiveness of an algorithm and the semantic clarity of its products. It is often difficult to extract meaningful contextual information from a machine learning algorithm, because the algorithms that provide high accuracy also tend to use representations that are less comprehensible to humans. Conversely, algorithms that use a more human-accessible vocabulary can be less accurate: they produce more false alerts (false positives), which confuse analysts. A related tradeoff is between the internal semantics of the algorithm and the external semantics of its output. We illustrate this tradeoff with two case studies. Developers of cyber situational awareness (CSA) systems must be aware of such tradeoffs and seek ways to mitigate them.
Copyright information
© 2014 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Harang, R. (2014). Learning and Semantics. In: Kott, A., Wang, C., Erbacher, R. (eds) Cyber Defense and Situational Awareness. Advances in Information Security, vol 62. Springer, Cham. https://doi.org/10.1007/978-3-319-11391-3_10
DOI: https://doi.org/10.1007/978-3-319-11391-3_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11390-6
Online ISBN: 978-3-319-11391-3
eBook Packages: Computer Science, Computer Science (R0)