Experimenting and assessing machine learning tools for detecting and analyzing malicious behaviors in complex environments

  • Original Article
  • Journal of Reliable Intelligent Environments

Abstract

This paper proposes applying machine learning tools, and assessing them experimentally, to solve security issues in complex environments, specifically the identification and analysis of malicious behaviors. To evaluate how effectively machine learning algorithms detect anomalies, we consider three real-world case studies: (i) detecting and analyzing Tor traffic with a machine learning-based discrimination technique; (ii) identifying and analyzing CAN bus attacks via deep learning; (iii) detecting and analyzing mobile malware, in particular ransomware in Android environments, by means of structural entropy-based classification. The resulting observations confirm the effectiveness of machine learning in supporting the security of complex environments.
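
As a rough illustration of the idea behind entropy-based features in case study (iii), the following minimal sketch computes a block-wise Shannon entropy profile of a byte sequence. It is not the authors' pipeline: the function names, the non-overlapping blocks, and the 256-byte block size are assumptions made for this example, and a structural entropy classifier would need further steps (such as segmenting the profile and comparing files) that are not shown here.

```python
import math
from collections import Counter
from typing import List


def shannon_entropy(block: bytes) -> float:
    """Shannon entropy of a byte block, in bits per byte (0.0 to 8.0)."""
    if not block:
        return 0.0
    total = len(block)
    counts = Counter(block)  # frequency of each byte value in the block
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def entropy_profile(data: bytes, block_size: int = 256) -> List[float]:
    """Entropy of each consecutive, non-overlapping block of `data`.

    `block_size` is an illustrative assumption, not a value from the paper.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [
        shannon_entropy(data[i:i + block_size])
        for i in range(0, len(data), block_size)
    ]


if __name__ == "__main__":
    # A constant block has minimal entropy; a block containing every byte
    # value exactly once has maximal entropy.
    sample = b"\x00" * 256 + bytes(range(256))
    print(entropy_profile(sample))
```

Under these assumptions, compressed or encrypted regions of a file would tend to show entropy close to 8 bits per byte, while padding or repetitive regions would sit near 0, so the profile gives a coarse picture of a file's layout.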

Acknowledgements

This work has been partially supported by H2020 EU-funded projects NeCS and C3ISP and EIT-Digital Project HII.

Author information

Corresponding author

Correspondence to Alfredo Cuzzocrea.

About this article

Cite this article

Cuzzocrea, A., Martinelli, F., Mercaldo, F. et al. Experimenting and assessing machine learning tools for detecting and analyzing malicious behaviors in complex environments. J Reliable Intell Environ 4, 225–245 (2018). https://doi.org/10.1007/s40860-018-0072-3

  • DOI: https://doi.org/10.1007/s40860-018-0072-3
