Abstract
To prevent internal data leakage, database activity monitoring uses software agents to analyze protocol traffic over networks and to observe local database activities. However, the large volume of data produced by database activity monitoring presents a significant barrier to effective monitoring and analysis of database activities. In this paper, we present an approach to database activity monitoring that combines a density-based outlier detection method with a commercial database activity monitoring solution. To compute outliers efficiently, we exploit a kd-tree index and an approximate k-nearest neighbors (ANN) search method, which significantly reduces the outlier computation time. The proposed methodology was successfully applied to a very large log dataset collected from the Korea Atomic Energy Research Institute (KAERI). The results show that the proposed method can effectively detect outliers in database activities with a shorter computation time.
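The combination described in the abstract can be illustrated with a minimal sketch. It is not the implementation used in the paper. It assumes that each database activity record has already been converted into a numeric feature vector, and it computes LOF-style scores (Breunig et al., 2000) on top of a small kd-tree whose search accepts a relaxation factor eps in the spirit of approximate nearest neighbor search. The names build_kdtree, knn and lof_scores, the parameter eps, and the toy data are all assumptions made for illustration. As a simplification, exactly k neighbours are used per point, ties are not expanded, and duplicate points are not treated specially.

```python
import heapq
import math

def build_kdtree(points, idxs=None, depth=0):
    """Build a kd-tree as nested tuples (point_index, axis, left, right)."""
    if idxs is None:
        idxs = list(range(len(points)))
    if not idxs:
        return None
    axis = depth % len(points[0])          # cycle through the dimensions
    idxs = sorted(idxs, key=lambda i: points[i][axis])
    mid = len(idxs) // 2                   # median point splits the set
    return (idxs[mid], axis,
            build_kdtree(points, idxs[:mid], depth + 1),
            build_kdtree(points, idxs[mid + 1:], depth + 1))

def knn(points, tree, q, k, eps=0.0):
    """Return [(distance, index)] of k neighbours of points[q], nearest first.

    eps = 0 gives an exact search. With eps > 0 a subtree is skipped unless
    it could contain a point closer than (current k-th distance) / (1 + eps),
    so each reported distance is at most (1 + eps) times the exact one.
    """
    heap = []                              # max-heap through negated distances

    def visit(node):
        if node is None:
            return
        i, axis, left, right = node
        if i != q:                         # a point is not its own neighbour
            d = math.dist(points[q], points[i])
            if len(heap) < k:
                heapq.heappush(heap, (-d, i))
            elif d < -heap[0][0]:
                heapq.heapreplace(heap, (-d, i))
        diff = points[q][axis] - points[i][axis]
        near, far = (left, right) if diff < 0 else (right, left)
        visit(near)
        # Cross the splitting plane only if the far side may still help.
        if len(heap) < k or abs(diff) * (1 + eps) < -heap[0][0]:
            visit(far)

    visit(tree)
    return sorted((-neg_d, i) for neg_d, i in heap)

def lof_scores(points, k, eps=0.0):
    """Compute a LOF-style score per point; values well above 1 suggest outliers."""
    tree = build_kdtree(points)
    nbrs = [knn(points, tree, q, k, eps) for q in range(len(points))]
    kdist = [n[-1][0] for n in nbrs]       # distance to the k-th neighbour
    lrd = []                               # local reachability density
    for n in nbrs:
        reach = sum(max(kdist[j], d) for d, j in n)
        lrd.append(len(n) / reach if reach > 0 else math.inf)
    return [sum(lrd[j] for _, j in n) / (len(n) * lrd[p])
            for p, n in enumerate(nbrs)]

# Toy data (hypothetical): a 5 x 5 grid of "normal" records plus one isolated record.
points = [(float(x), float(y)) for x in range(5) for y in range(5)] + [(10.0, 10.0)]
scores = lof_scores(points, k=3)
suspect = max(range(len(points)), key=scores.__getitem__)
print(suspect, round(scores[suspect], 2))
```

In this toy setting the isolated record receives the largest score, while the grid points stay close to 1. Raising eps trades neighbour accuracy for fewer visited subtrees, which is the kind of saving the abstract refers to; the actual speed-up depends on data size and dimensionality.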
Notes
It should be noted that the application of the LOF algorithm explained in the following section is not restricted to a specific solution; provided that database transaction logs are collected, any solution can be utilized.
References
Agyemang, M., & Ezeife, C. I. (2004). LSC-Mine: algorithm for mining local outliers. Proceedings of the 15th Information Resource Management Association (IRMA) International Conference, New Orleans, vol. 1 (pp. 5–8). Hershey: IRM Press.
Arya, S., Mount, D. M., Netanyahu, N. S., Silverman, R., & Wu, A. (1998). An optimal algorithm for approximate nearest neighbor searching. Journal of the ACM, 45(6), 891–923.
Barnett, V., & Lewis, T. (1994). Outliers in statistical data. John Wiley.
Bentley, J. L. (1990). K-d trees for semidynamic point sets. Proceedings of 6th Annual ACM Symposium Computational Geometry (pp. 187–197). New York: ACM Press.
Breunig, M. M., Kriegel, H. P., Ng, R. T., & Sander, J. (2000). LOF: identifying density based local outliers. Proceedings of the ACM SIGMOD Conference, Dallas, Texas (pp. 93–104). New York: ACM.
Chaudhary, A., Szalay, A. S., Szalay, E. S., & Moore, A. W. (2002). Very fast outlier detection in large multidimensional data sets. The ACM SIGMOD Workshop on Research Issues on Data Mining and Knowledge Discovery Madison, Wisconsin. New York: ACM Press.
Friedman, J. H., Bentley, J. L., & Finkel, R. A. (1977). An algorithm for finding best matches in logarithmic expected time. ACM Transactions on Mathematical Software, 3(3), 209–226.
Gartner (2007). Hype cycle for regulations and related standards, ID: G00141115. http://www.gartner.com/DisplayDocument?id=500178.
Gartner (2009). Hype cycle for governance, risk and compliance technologies, ID: G00168610. http://www.gartner.com/DisplayDocument?id=1080715.
Hawkins, D. (1980). Identification of outliers. Chapman and Hall.
Lazarevic, A., Ertoz, L., Kumar, V., Ozgur, A., & Srivastava, J. (2003). A comparative study of anomaly detection schemes in network intrusion detection. Proceedings of the Third SIAM International Conference on Data Mining, San Francisco, CA.
Pokrajac, D., Lazarevic, A., & Latecki, L. J. (2007). Incremental local outlier detection for data streams. IEEE Symposium on Computational Intelligence and Data Mining (CIDM), Honolulu, Hawaii (pp. 504–515). New York: IEEE Press.
Richardson, R. (2008). CSI/FBI Computer crime and security survey, available at http://www.goscsi.com.
Somansa (2009). Electronics data and management solution. http://www.somansatech.com.
Yuhanna, N., Heffner, R., & Schwaber, C. (2005). Comprehensive Database Security Requires Native DBMS Features and Third-Party Tools, Forrester. http://www.forrester.com/Research/Document/Excerpt/0,7211,36301,00.html.
Cite this article
Kim, S., Cho, N.W., Lee, Y.J. et al. Application of density-based outlier detection to database activity monitoring. Inf Syst Front 15, 55–65 (2013). https://doi.org/10.1007/s10796-010-9266-9
DOI: https://doi.org/10.1007/s10796-010-9266-9