Abstract
Ensemble methods have recently been applied in many areas of machine learning, and outlier detection is one area where they have received increasing attention. This paper deals with randomization in ensemble methods for outlier detection. We develop a novel algorithm that uses stochastic local search heuristics to induce diversity in an ensemble outlier detection algorithm. Specifically, we exploit the capability of the GRASP heuristic to diversify the search process and to maintain a good balance between exploitation and diversification while building the ensemble. The experiments show improvements over the greedy ensemble method and open the path for further research in this direction.
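The abstract does not spell out the algorithm, so the following is only a minimal sketch of how a GRASP-style randomized greedy construction could be applied to ensemble member selection. It is not the authors' implementation. The function names (`grasp_construct`, `grasp_select`), the `alpha` parameter controlling the restricted candidate list, the correlation-based objective, and the toy score vectors are all assumptions made for illustration. The local search phase that usually follows GRASP construction is omitted.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)

def grasp_construct(n_candidates, objective, alpha, rng):
    """One randomized greedy construction.

    At each step, every remaining candidate is scored by the objective of the
    ensemble after adding it. Candidates within `alpha` of the best score form
    the restricted candidate list (RCL), and one of them is picked at random.
    alpha = 0 reduces to plain greedy selection; alpha = 1 allows any
    improving candidate.
    """
    selected, current = [], float("-inf")
    remaining = list(range(n_candidates))
    while remaining:
        scored = [(objective(selected + [c]), c) for c in remaining]
        best = max(s for s, _ in scored)
        worst = min(s for s, _ in scored)
        if best <= current:
            break  # no remaining candidate improves the ensemble
        threshold = best - alpha * (best - worst)
        rcl = [(s, c) for s, c in scored if s >= threshold and s > current]
        current, choice = rng.choice(rcl)
        selected.append(choice)
        remaining.remove(choice)
    return selected, current

def grasp_select(n_candidates, objective, alpha=0.3, iterations=20, seed=0):
    """Repeat the randomized construction and keep the best ensemble found."""
    rng = random.Random(seed)
    best_sel, best_val = [], float("-inf")
    for _ in range(iterations):
        sel, val = grasp_construct(n_candidates, objective, alpha, rng)
        if val > best_val:
            best_sel, best_val = sel, val
    return best_sel, best_val

# Hypothetical toy data: three detectors scoring four points; point 2 is the outlier.
scores = [
    [0.1, 0.2, 0.9, 0.1],
    [0.2, 0.1, 0.8, 0.3],
    [0.9, 0.1, 0.2, 0.1],
]
target = [0.0, 0.0, 1.0, 0.0]

def objective(indices):
    """Correlation between the averaged scores of the chosen detectors and the target."""
    mean = [sum(scores[i][j] for i in indices) / len(indices) for j in range(len(target))]
    return pearson(mean, target)

selected, value = grasp_select(len(scores), objective, alpha=0.3, iterations=10, seed=42)
```

In this sketch, `alpha` is the knob that trades exploitation against diversification: smaller values stay close to the greedy choice, while larger values let weaker but still improving detectors enter the ensemble.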
Copyright information
© 2018 Springer International Publishing AG
About this paper
Cite this paper
Nishani, L., Biba, M. (2018). Randomizing Greedy Ensemble Outlier Detection with GRASP. In: Barolli, L., Terzo, O. (eds) Complex, Intelligent, and Software Intensive Systems. CISIS 2017. Advances in Intelligent Systems and Computing, vol 611. Springer, Cham. https://doi.org/10.1007/978-3-319-61566-0_92
DOI: https://doi.org/10.1007/978-3-319-61566-0_92
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-61565-3
Online ISBN: 978-3-319-61566-0
eBook Packages: Engineering, Engineering (R0)