Skip to main content

sGrid++: Revising Simple Grid Based Density Estimator for Mining Outlying Aspect

  • Conference paper
  • First Online:
Web Information Systems Engineering – WISE 2022 (WISE 2022)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 13724))

Included in the following conference series:

Abstract

In this paper, we address the problem of outlying aspect mining, which aims to identify a set of features (subspace(s) a.k.a aspect(s)) where a given data object stands out from the rest of the data. To detect the most outlying aspect of a given data object, outlying aspect mining algorithms need to compare and rank subspaces with different dimensionality. Thus, they require a fast and dimensionally unbias scoring measure. Existing measures use density or distance to compute the outlyingness of the query in each subspace. Density and distance are dimensionally bias, i.e. density decreases as the dimension of subspace increases. To make them comparable (dimensionally unbias), Z-score normalization is used in the previous works. However, to compute Z-score normalization, we need to compute the outlyingness of each data point in each subspace, which adds significant computational overhead on top of the already expensive density or distance computation.

Recently developed measure called sGrid is a simple and efficient density estimator which allows a fast systemic search. While it is efficient compared to other distance and density-based measures, it is also a dimensionally bias measure and it requires to use Z-score normalization to make it dimensionality unbiased, which makes it computationally expensive. In this paper, we propose a simpler version of sGrid called sGrid++ that is not only efficient and effective but also dimensionality unbiased. It does not require Z-score normalization. We demonstrate the effectiveness and efficiency of the proposed scoring measure in outlying aspect mining using synthetic and real-world datasets.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 84.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 109.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Notes

  1. 1.

    Data set is available at https://www.kaggle.com/benroshan/factors-affecting-campus-placement.

  2. 2.

    The synthetic datasets are from Keller et al. (2012) [6]. Available at https://www.ipd.kit.edu/~muellere/HiCS/.

  3. 3.

    We reported three queries only due to page limitation.

  4. 4.

    We used a state-of-the-art anomaly detection algorithm called LOF [1] to identify top k = 5 anomalies; and used them as queries.

  5. 5.

    Available at https://www.kaggle.com/benroshan/factors-affecting-campus-placement.

  6. 6.

    https://www.foxsports.com/nba/stats.

References

  1. Breunig, M.M., Kriegel, H.P., Ng, R.T., Sander, J.: LOF: identifying density-based local outliers. In: Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, SIGMOD 2000, pp. 93–104. Association for Computing Machinery, New York (2000). https://doi.org/10.1145/342009.335388

  2. Chandola, V., Banerjee, A., Kumar, V.: Anomaly detection: a survey. ACM Comput. Surv. 41(3), 1–58 (2009). https://doi.org/10.1145/1541880.1541882

    Article  Google Scholar 

  3. Duan, L., Tang, G., Pei, J., Bailey, J., Campbell, A., Tang, C.: Mining outlying aspects on numeric data. Data Min. Knowl. Disc. 29(5), 1116–1151 (2015). https://doi.org/10.1007/s10618-014-0398-2

    Article  MathSciNet  MATH  Google Scholar 

  4. Freedman, D., Diaconis, P.: On the histogram as a density estimator: L 2 theory. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete 57(4), 453–476 (1981)

    Article  MathSciNet  MATH  Google Scholar 

  5. Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., Witten, I.H.: The WEKA data mining software: An update. SIGKDD Explor. Newsl. 11(1), 10–18 (2009). https://doi.org/10.1145/1656274.1656278

    Article  Google Scholar 

  6. Keller, F., Muller, E., Bohm, K.: HiCS: high contrast subspaces for density-based outlier ranking. In: 2012 IEEE 28th International Conference on Data Engineering, pp. 1037–1048 (2012). https://doi.org/10.1109/ICDE.2012.88

  7. Pedregosa, F., et al.: Scikit-learn: machine learning in python. J. Mach. Learn. Res. 12, 2825–2830 (2011)

    MathSciNet  MATH  Google Scholar 

  8. Russell, S., Norvig, P.: Artificial Intelligence: A Modern Approach, 3rd edn. Prentice Hall Press, Upper Saddle River (2009)

    MATH  Google Scholar 

  9. Samariya, D., Aryal, S., Ting, K.M., Ma, J.: A new effective and efficient measure for outlying aspect mining. In: Huang, Z., Beek, W., Wang, H., Zhou, R., Zhang, Y. (eds.) WISE 2020. LNCS, vol. 12343, pp. 463–474. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-62008-0_32

    Chapter  Google Scholar 

  10. Samariya, D., Ma, J.: Mining outlying aspects on healthcare data. In: Siuly, S., Wang, H., Chen, L., Guo, Y., Xing, C. (eds.) HIS 2021. LNCS, vol. 13079, pp. 160–170. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-90885-0_15

    Chapter  Google Scholar 

  11. Samariya, D., Ma, J.: A new dimensionality-unbiased score for efficient and effective outlying aspect mining. Data Sci. Eng. 7, 1–16 (2022). https://doi.org/10.1007/s41019-022-00185-5

    Article  Google Scholar 

  12. Samariya, D., Ma, J., Aryal, S.: A comprehensive survey on outlying aspect mining methods. arXiv preprint arXiv:2005.02637 (2020)

  13. Samariya, D., Thakkar, A.: A comprehensive survey of anomaly detection algorithms. Ann. Data Sci. 1–22 (2021). https://doi.org/10.1007/s40745-021-00362-9

  14. Vinh, N.X., et al.: Discovering outlying aspects in large datasets. Data Min. Knowl. Disc. 30(6), 1520–1555 (2016). https://doi.org/10.1007/s10618-016-0453-2

    Article  MathSciNet  MATH  Google Scholar 

  15. Wells, J.R., Ting, K.M.: A new simple and efficient density estimator that enables fast systematic search. Pattern Recogn. Lett. 122, 92–98 (2019). https://doi.org/10.1016/j.patrec.2018.12.020, http://www.sciencedirect.com/science/article/pii/S0167865518309371

Download references

Acknowledgments

This work is supported by Federation University Research Priority Area (RPA) scholarship, awarded to Durgesh Samariya. Dr Sunil Aryal is supported by an Air Force Office of Scientific Research (AFOSR) research grant under award number FA2386-20-1-4005.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Durgesh Samariya .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Samariya, D., Ma, J., Aryal, S. (2022). sGrid++: Revising Simple Grid Based Density Estimator for Mining Outlying Aspect. In: Chbeir, R., Huang, H., Silvestri, F., Manolopoulos, Y., Zhang, Y. (eds) Web Information Systems Engineering – WISE 2022. WISE 2022. Lecture Notes in Computer Science, vol 13724. Springer, Cham. https://doi.org/10.1007/978-3-031-20891-1_15

Download citation

  • DOI: https://doi.org/10.1007/978-3-031-20891-1_15

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-20890-4

  • Online ISBN: 978-3-031-20891-1

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics