Abstract
A reference set is a collection of network traffic data whose form and content allow an event or a group of events to be detected. Realistic and representative datasets based on real traffic can improve research in the fields of intrusion and anomaly detection. Creating reference sets raises a number of issues, such as the collection and storage of large volumes of data, the privacy of information, and the relevance of the collected events. Moreover, rare events are hard to analyse among background traffic and require specialised detection tools. Spam is one of the common problems that can be detected in network traffic. This paper presents a methodology for creating a network traffic reference set for spam detection. The methodology covers the selection of significant features, the collection and storage of data, the analysis of the collected data, the enrichment of the data with additional events, and the propagation of the set. In addition, a hybrid classifier that detects spam at a relatively high level is presented.
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Luckner, M., Filasiak, R. (2013). Reference Data Sets for Spam Detection: Creation, Analysis, Propagation. In: Pan, JS., Polycarpou, M.M., Woźniak, M., de Carvalho, A.C.P.L.F., Quintián, H., Corchado, E. (eds) Hybrid Artificial Intelligent Systems. HAIS 2013. Lecture Notes in Computer Science, vol 8073. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40846-5_22
DOI: https://doi.org/10.1007/978-3-642-40846-5_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-40845-8
Online ISBN: 978-3-642-40846-5
eBook Packages: Computer Science, Computer Science (R0)