Abstract
Big data refers to datasets that cannot be managed with standard tools and that contain valuable, previously hidden information. New data mining techniques are needed to cope with the growing size of such data, their complex structure, and their veracity, which concerns data imperfection and uncertainty. Although big data veracity is often overlooked, it is both challenging and essential for accurate and reliable mining and knowledge discovery. This paper proposes MapReduce-based belief decision trees as classifiers for uncertain large-scale datasets. The proposed averaging and conjunctive classification approaches are evaluated on intrusion detection using the massive KDD’99 intrusion dataset. Several levels of attack granularity are considered: treating every attack type as a separate class, grouping attacks into categories, or only distinguishing normal from abnormal connections.




Cite this article
Boukhris, I., Elouedi, Z. & Ajabi, M. Toward intrusion detection using belief decision trees for big data. Knowl Inf Syst 53, 671–698 (2017). https://doi.org/10.1007/s10115-017-1034-4