ABSTRACT
Execution traces of an operating system are routinely used for software debugging, auditing, and compliance testing. These tasks, however, gather information about how the software behaves relative to its design aims. In this work, we instead analyze the execution traces of an embedded real-time operating system (RTOS) to model the behavior of the physical system that the software application manages through the embedded operating system. For an event-triggered embedded RTOS that controls a bespoke system such as an unmanned aerial vehicle (UAV), the events in the execution traces are directly linked to the operation of the controlled physical system. We therefore hypothesize that the frequency of events (method/function calls) per observation is a useful feature for modeling the behavior of the physical system controlled by the operating system.
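As a rough illustration of the event-frequency feature, the sketch below counts how often each event type occurs in one observation and arranges the counts in a fixed order. The event names, the `event_frequencies` helper, and the choice of vocabulary are hypothetical and are not taken from the paper.

```python
from collections import Counter


def event_frequencies(trace, vocabulary):
    """Return the count of each vocabulary event within one observation.

    trace: sequence of event names recorded in one observation window.
    vocabulary: fixed, ordered list of event names, so that every
    observation maps to a vector of the same length.
    """
    counts = Counter(trace)
    # Events absent from the trace get a count of zero.
    return [counts.get(event, 0) for event in vocabulary]


# Hypothetical event names, for illustration only.
vocabulary = ["MSG_RECEIVE", "MSG_SEND", "THREAD_READY", "INT_ENTRY"]
trace = ["MSG_SEND", "MSG_RECEIVE", "MSG_SEND", "THREAD_READY"]
features = event_frequencies(trace, vocabulary)
```

Fixing the vocabulary up front keeps feature vectors comparable across observations, which is what a downstream anomaly detector would need.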
Furthermore, we tackle the lack of data that sufficiently captures the degree of aberration that may occur in a system. We augment the available data by introducing artificial missingness and then imputing the missing values to generate new cases. We implement missingness using the missing completely at random (MCAR) strategy, and we use overall single mean imputation at the imputation stage: the method takes the average of the remaining values in the dataset and replaces each missing value with this average. This augmentation yields an imputation-based augmented anomaly detection model that lets us expand both the training and the validation/test data. Expanding the test data reduces the misclassification that results from the non-parametric nature of the anomalies that may occur in the physical system, while using injected data for training lets us stress-test the model.
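The augmentation step can be sketched as follows, assuming the data is a numeric matrix of event frequencies. The function names, the missing rate, and the use of NumPy are assumptions made for illustration; the paper's actual implementation may differ.

```python
import numpy as np


def inject_mcar(data, missing_rate, rng):
    """Mask entries completely at random (MCAR).

    Each cell is masked independently with probability `missing_rate`,
    regardless of its value or position.
    """
    data = np.asarray(data, dtype=float)
    mask = rng.random(data.shape) < missing_rate
    masked = data.copy()
    masked[mask] = np.nan
    return masked


def impute_overall_mean(data):
    """Replace every missing cell with the mean of all remaining values."""
    data = np.asarray(data, dtype=float)
    missing = np.isnan(data)
    if missing.all():
        raise ValueError("cannot impute: no observed values remain")
    overall_mean = np.nanmean(data)  # one mean over the whole dataset
    return np.where(missing, overall_mean, data)


# Hypothetical event-frequency matrix: rows are observations.
rng = np.random.default_rng(seed=0)
original = np.array([[4.0, 0.0, 2.0], [3.0, 1.0, 2.0], [5.0, 0.0, 1.0]])
augmented = impute_overall_mean(inject_mcar(original, 0.2, rng))
```

Each call with a different random seed produces a different augmented copy of the data, which is how one original dataset could be expanded into several new cases.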
We test our model on traces from the real-time operating system kernel of a UAV, and the results show that the model achieves improved accuracy in detecting anomalous traces, even under the induced missingness.
Index Terms
- An Imputation-based Augmented Anomaly Detection from Large Traces of Operating System Events