Abstract
We study subspace outlier detection in high-dimensional data and propose a new genetic algorithm (GA)-based technique for identifying outliers embedded in subspaces. The existing technique, which relies mainly on a GA to carry out the subspace search, is generally slow because of its expensive fitness evaluation and long solution encoding. Our technique improves on the existing GA-based outlier detection method with a bit freezing approach that achieves faster convergence. By freezing converged bits in the solution encoding strings, the approach speeds up the crossover and mutation operations and lets the GA stop early, which leads to a more accurate approximation of the fitness function. This work contributes to the development of a more efficient search method for detecting subspace outliers. Experimental results demonstrate the improved efficiency of our technique compared with the existing method.
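The bit freezing idea can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the convergence test, the encoding, or the fitness function, so everything below is an assumption. Here a bit position counts as converged once a chosen fraction (`threshold`) of the population agrees on its value, `toy_fitness` is a hypothetical stand-in for the expensive subspace fitness, and the names `frozen_positions` and `run_ga` are invented for illustration.

```python
import random


def toy_fitness(bits):
    # Hypothetical stand-in for the costly subspace fitness: number of set bits.
    return sum(bits)


def frozen_positions(population, threshold):
    """Map each converged bit position to the value the population agrees on."""
    size = len(population)
    frozen = {}
    for pos in range(len(population[0])):
        ones = sum(ind[pos] for ind in population)
        if ones >= threshold * size:
            frozen[pos] = 1
        elif size - ones >= threshold * size:
            frozen[pos] = 0
    return frozen


def run_ga(n_bits=16, pop_size=30, generations=100, threshold=0.9,
           mutation_rate=0.02, seed=0):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_bits)]
                  for _ in range(pop_size)]
    frozen = {}  # assumed rule: once frozen, a position stays frozen
    used = 0     # number of generations actually evolved
    for _ in range(generations):
        for pos, val in frozen_positions(population, threshold).items():
            frozen.setdefault(pos, val)
        free = [p for p in range(n_bits) if p not in frozen]
        if not free:
            break  # every bit has converged: stop early

        def pick():
            # Binary tournament selection.
            a, b = rng.sample(population, 2)
            return a if toy_fitness(a) >= toy_fitness(b) else b

        next_population = []
        while len(next_population) < pop_size:
            p1, p2 = pick(), pick()
            child = list(p1)
            # Uniform crossover and mutation touch only the free positions.
            for pos in free:
                if rng.random() < 0.5:
                    child[pos] = p2[pos]
                if rng.random() < mutation_rate:
                    child[pos] ^= 1
            # Frozen positions are copied in without any random draws.
            for pos, val in frozen.items():
                child[pos] = val
            next_population.append(child)
        population = next_population
        used += 1

    best = list(max(population, key=toy_fitness))
    for pos, val in frozen.items():
        best[pos] = val  # treat frozen bits as decided
    return best, frozen, used


if __name__ == "__main__":
    best, frozen, used = run_ga()
    print("best:", best)
    print("frozen bits:", len(frozen), "generations used:", used)
```

As more positions freeze, the inner loop over `free` shrinks, so each crossover and mutation step does less work, and the run ends as soon as no free position remains. Whether a frozen bit should ever be released again is a design choice this sketch does not explore.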
Acknowledgment
This research was partially supported by the National Key Research and Development Program of China (No. 2017YFB0802300), the National Natural Science Foundation of China (No. 61602240), the Guangxi Key Laboratory of Trusted Software (No. kx201615), and the Capacity Building Project for Young University Staff in Guangxi Province, Department of Education, Guangxi Province (No. ky2016YB149).
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Zhu, X. et al. (2018). A Genetic Algorithm Based Technique for Outlier Detection with Fast Convergence. In: Gan, G., Li, B., Li, X., Wang, S. (eds) Advanced Data Mining and Applications. ADMA 2018. Lecture Notes in Computer Science, vol 11323. Springer, Cham. https://doi.org/10.1007/978-3-030-05090-0_8
DOI: https://doi.org/10.1007/978-3-030-05090-0_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-05089-4
Online ISBN: 978-3-030-05090-0
eBook Packages: Computer Science, Computer Science (R0)