A Genetic Algorithm Based Technique for Outlier Detection with Fast Convergence

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11323)

Abstract

In this paper, we study the problem of subspace outlier detection in high-dimensional data and propose a new genetic algorithm (GA)-based technique to identify outliers embedded in subspaces. The existing technique, which mainly uses a GA to search for outlying subspaces, is generally slow because of its expensive fitness evaluation and long solution encoding scheme. We improve the performance of this GA-based outlier detection method with a bit freezing approach that achieves faster convergence. By freezing converged bits in the solution encoding strings, the approach speeds up crossover and mutation and allows the GA to stop early, which leads to a more accurate approximation of the fitness function. This work can contribute to the development of more efficient search methods for detecting subspace outliers. Experimental results demonstrate that our technique is more efficient than the existing method.
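The abstract describes bit freezing only at a high level. The sketch below illustrates the general idea under our own assumptions: each candidate subspace is a bit string over the d dimensions (bit i = 1 means dimension i is included), a bit position is frozen once at least a fraction `threshold` of the population agrees on its value, crossover and mutation then operate only on unfrozen positions, and the GA stops early once every position is frozen. The freezing criterion, the parameter values, the helper names (`freeze_converged`, `apply_frozen`, `ga_with_bit_freezing`) and the toy fitness function are hypothetical and not taken from the paper; in an actual detector the fitness would be an outlying-degree measure of the query point in the encoded subspace.

```python
import random
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Bits = List[int]  # bit i == 1 means dimension i belongs to the candidate subspace


def freeze_converged(population: Sequence[Bits], frozen: List[Optional[int]],
                     threshold: float) -> None:
    """Freeze each still-free position on which at least `threshold` of the
    population agrees (an assumed convergence test, not the paper's)."""
    n = len(population)
    for i, state in enumerate(frozen):
        if state is not None:
            continue
        ones = sum(ind[i] for ind in population)
        if ones >= threshold * n:
            frozen[i] = 1
        elif n - ones >= threshold * n:
            frozen[i] = 0


def apply_frozen(ind: Bits, frozen: Sequence[Optional[int]]) -> Bits:
    """Overwrite frozen positions with their frozen value."""
    return [b if f is None else f for b, f in zip(ind, frozen)]


def ga_with_bit_freezing(fitness: Callable[[Bits], float], n_bits: int,
                         pop_size: int = 30, generations: int = 200,
                         p_mut: float = 0.05, threshold: float = 0.95,
                         seed: int = 0):
    """Hypothetical GA with bit freezing; returns (best, frozen, population, runs)."""
    rng = random.Random(seed)
    cache: Dict[Tuple[int, ...], float] = {}

    def fit(ind: Bits) -> float:
        # Memoise: fitness evaluation is assumed to be the expensive step.
        key = tuple(ind)
        if key not in cache:
            cache[key] = fitness(ind)
        return cache[key]

    def select(pop: List[Bits]) -> Bits:
        # Binary tournament selection.
        a, b = rng.sample(pop, 2)
        return a if fit(a) >= fit(b) else b

    population = [[rng.randint(0, 1) for _ in range(n_bits)]
                  for _ in range(pop_size)]
    frozen: List[Optional[int]] = [None] * n_bits
    best = max(population, key=fit)
    runs = 0
    for _ in range(generations):
        free = [i for i, f in enumerate(frozen) if f is None]
        if not free:  # every bit has converged: stop the GA early
            break
        children = [apply_frozen(best, frozen)]  # elitism, respecting frozen bits
        while len(children) < pop_size:
            p1, p2 = select(population), select(population)
            child = p1[:]
            for i in free:  # uniform crossover and mutation touch free bits only
                if rng.random() < 0.5:
                    child[i] = p2[i]
                if rng.random() < p_mut:
                    child[i] ^= 1
            children.append(child)
        freeze_converged(children, frozen, threshold)
        population = [apply_frozen(c, frozen) for c in children]
        best = max(population + [best], key=fit)  # best individual seen so far
        runs += 1
    return best, frozen, population, runs


if __name__ == "__main__":
    # Toy stand-in for a subspace outlying-degree score: reward matching a
    # hypothetical "true" outlying subspace over 10 dimensions.
    target = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
    best, frozen, _, runs = ga_with_bit_freezing(
        lambda bits: sum(b == t for b, t in zip(bits, target)), len(target))
    print(best, frozen, runs)
```

Raising `threshold` toward 1 delays freezing and preserves more diversity; lowering it shrinks the effective search space sooner and ends the run earlier, at the risk of locking in a bit value prematurely.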

Acknowledgment

This research was partially supported by the National Key Research and Development Program of China (No. 2017YFB0802300), the National Natural Science Foundation of China (No. 61602240), the Guangxi Key Laboratory of Trusted Software (No. kx201615) and the Capacity Building Project for Young University Staff in Guangxi Province, Department of Education, Guangxi Province (No. ky2016YB149).

Author information

Correspondence to Ji Zhang.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Zhu, X. et al. (2018). A Genetic Algorithm Based Technique for Outlier Detection with Fast Convergence. In: Gan, G., Li, B., Li, X., Wang, S. (eds) Advanced Data Mining and Applications. ADMA 2018. Lecture Notes in Computer Science, vol 11323. Springer, Cham. https://doi.org/10.1007/978-3-030-05090-0_8

  • DOI: https://doi.org/10.1007/978-3-030-05090-0_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-05089-4

  • Online ISBN: 978-3-030-05090-0

  • eBook Packages: Computer Science, Computer Science (R0)