Learning and Semantics

Harang, Richard

doi:10.1007/978-3-319-11391-3_10

Part of the book series: Advances in Information Security (ADIS, volume 62)

Abstract

This chapter further elaborates on a topic of the previous chapter, inference, by focusing on a particular class of algorithms important for processing cyber information: machine learning. The chapter also continues the thread of ontology and semantics as it explores the tradeoffs between the effectiveness of an algorithm and the semantic clarity of its products. It is often difficult to extract meaningful contextual information from a machine learning algorithm, because the algorithms that provide high accuracy also tend to use representations less comprehensible to humans. On the other hand, algorithms that use a more human-accessible vocabulary can be less accurate: they produce more false alerts (false positives), which confuse analysts. A related tradeoff is between the internal semantics of the algorithm and the external semantics of its output. We illustrate this tradeoff with two case studies. Developers of cyber situational awareness (CSA) systems must be aware of such tradeoffs and seek ways to mitigate them.

Author information

Authors and Affiliations

United States Army Research Laboratory, Adelphi, MD, USA
Richard Harang

Corresponding author

Correspondence to Richard Harang.

Editor information

Editors and Affiliations

United States Army Research Laboratory, Adelphi, Maryland, USA
Alexander Kott
United States Army Research Office, Research Triangle Park, North Carolina, USA
Cliff Wang
United States Army Research Laboratory, Adelphi, Maryland, USA
Robert F. Erbacher

About this chapter

Cite this chapter

Harang, R. (2014). Learning and Semantics. In: Kott, A., Wang, C., Erbacher, R. (eds) Cyber Defense and Situational Awareness. Advances in Information Security, vol 62. Springer, Cham. https://doi.org/10.1007/978-3-319-11391-3_10

DOI: https://doi.org/10.1007/978-3-319-11391-3_10
Published: 02 October 2014
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11390-6
Online ISBN: 978-3-319-11391-3
eBook Packages: Computer Science, Computer Science (R0)
