sGrid++: Revising Simple Grid Based Density Estimator for Mining Outlying Aspect

Samariya, Durgesh; Ma, Jiangang; Aryal, Sunil

doi:10.1007/978-3-031-20891-1_15

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13724)

Included in the following conference series:

International Conference on Web Information Systems Engineering

Abstract

In this paper, we address the problem of outlying aspect mining, which aims to identify a set of features (subspace(s), a.k.a. aspect(s)) in which a given data object stands out from the rest of the data. To detect the most outlying aspect of a given data object, outlying aspect mining algorithms need to compare and rank subspaces of different dimensionality. Thus, they require a fast and dimensionality-unbiased scoring measure. Existing measures use density or distance to compute the outlyingness of the query in each subspace. Density and distance are dimensionality-biased, i.e., density decreases as the dimensionality of the subspace increases. To make them comparable (dimensionality-unbiased), previous works use Z-score normalization. However, Z-score normalization requires computing the outlyingness of every data point in each subspace, which adds significant computational overhead on top of the already expensive density or distance computation.
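
The two claims above (raw density shrinks as features are added, and Z-score normalization needs the score of every data point) can be illustrated with a small sketch. It assumes a simple box-count density f_S(x), the number of data points within L-infinity distance r of x using only the features in subspace S; `box_density`, `z_score` and the radius r are illustrative choices, not the measures evaluated in the paper. The normalized score is Z(f_S(q)) = (f_S(q) - μ_S) / σ_S, where μ_S and σ_S are the mean and standard deviation of f_S over all n data points.

```python
import random
from statistics import mean, pstdev
from typing import Sequence

Point = Sequence[float]
Subspace = tuple[int, ...]

def box_density(x: Point, data: list[Point], subspace: Subspace, r: float) -> int:
    """Count data points within L-infinity distance r of x, using only the features in `subspace`."""
    return sum(all(abs(x[j] - y[j]) <= r for j in subspace) for y in data)

def z_score(query: Point, data: list[Point], subspace: Subspace, r: float) -> float:
    """Z-score of the query's density against the densities of all data points in `subspace`.

    Scoring every data point costs n density evaluations per subspace; with the
    O(n) box count above that is O(n^2) work per subspace, on top of scoring the query.
    """
    scores = [box_density(y, data, subspace, r) for y in data]
    mu, sigma = mean(scores), pstdev(scores)
    if sigma == 0:  # every point equally dense: nothing stands out
        return 0.0
    # A lower (more negative) value means the query is sparser than typical, i.e. more outlying.
    return (box_density(query, data, subspace, r) - mu) / sigma

# Demo on hypothetical uniform data: the raw count can only stay the same or shrink
# as features are added, so raw scores from subspaces of different sizes are not comparable.
rng = random.Random(0)
data = [tuple(rng.random() for _ in range(3)) for _ in range(60)]
query = data[0]
for s in [(0,), (0, 1), (0, 1, 2)]:
    print(s, box_density(query, data, s, r=0.1), round(z_score(query, data, s, r=0.1), 2))
```

Because a point matching in a larger subspace must also match in each of its sub-subspaces, the raw count is monotone non-increasing along nested subspaces, whereas the Z-scores are standardized per subspace, which is why earlier work applies this normalization despite its cost.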

A recently developed measure called sGrid is a simple and efficient density estimator that allows a fast systematic search. While it is efficient compared to other distance- and density-based measures, it is also dimensionality-biased and requires Z-score normalization to make it dimensionality-unbiased, which makes it computationally expensive. In this paper, we propose a simpler version of sGrid, called sGrid++, that is not only efficient and effective but also dimensionality-unbiased, so it does not require Z-score normalization. We demonstrate the effectiveness and efficiency of the proposed scoring measure in outlying aspect mining using synthetic and real-world datasets.
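
The abstract does not give the formulas of sGrid or sGrid++. As an illustrative assumption only, the sketch below implements a plain equal-width grid density estimator in the spirit of the simple grid estimator of Wells and Ting (2019): each feature's observed range is cut into a fixed number of bins, and a point's density in a subspace is the number of data points that share its grid cell there. `GridDensity`, the `bins` parameter and the min/max binning are hypothetical choices; this is neither sGrid++ nor a faithful reproduction of sGrid.

```python
import random
from collections import Counter
from typing import Sequence

Point = Sequence[float]
Subspace = tuple[int, ...]

class GridDensity:
    """Hypothetical equal-width grid density estimator (sGrid-style sketch)."""

    def __init__(self, data: list[Point], bins: int = 10):
        self.data = data
        self.bins = bins
        d = len(data[0])
        self.lo = [min(p[j] for p in data) for j in range(d)]
        self.hi = [max(p[j] for p in data) for j in range(d)]
        self._cache: dict[Subspace, Counter] = {}

    def _bin(self, value: float, j: int) -> int:
        width = (self.hi[j] - self.lo[j]) / self.bins or 1.0  # guard against a constant feature
        # Clamp so that the maximum value (and out-of-range queries) fall into the edge bins.
        return min(max(int((value - self.lo[j]) / width), 0), self.bins - 1)

    def _cell(self, x: Point, subspace: Subspace) -> tuple[int, ...]:
        return tuple(self._bin(x[j], j) for j in subspace)

    def counts(self, subspace: Subspace) -> Counter:
        # One O(n * |S|) pass per subspace builds a hash table of cell -> count.
        if subspace not in self._cache:
            self._cache[subspace] = Counter(self._cell(p, subspace) for p in self.data)
        return self._cache[subspace]

    def density(self, x: Point, subspace: Subspace) -> int:
        # O(|S|) lookup once the table exists; cells with no data points count as 0.
        return self.counts(subspace)[self._cell(x, subspace)]

# Demo on hypothetical data: densities of one point in nested subspaces.
rng = random.Random(1)
data = [tuple(rng.random() for _ in range(4)) for _ in range(200)]
grid = GridDensity(data, bins=5)
q = data[0]
print([grid.density(q, s) for s in [(0,), (0, 1), (0, 1, 2)]])
```

With the per-subspace table, scoring all n points for a Z-score costs O(n * |S|) rather than the O(n^2 * |S|) of pairwise counting, but the grid count is still dimensionality-biased: a cell in a larger subspace can never hold more points than the matching cell in a smaller one, so normalization (or a different measure) is still needed to compare subspaces.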

Notes

1. The dataset is available at https://www.kaggle.com/benroshan/factors-affecting-campus-placement.
2. The synthetic datasets are from Keller et al. (2012) [6], available at https://www.ipd.kit.edu/~muellere/HiCS/.
3. Due to page limitations, we report only three queries.
4. We used a state-of-the-art anomaly detection algorithm called LOF [1] to identify the top k = 5 anomalies and used them as queries.
5. Available at https://www.kaggle.com/benroshan/factors-affecting-campus-placement.
6. https://www.foxsports.com/nba/stats.
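
Note 4 selects the queries as the top five points ranked by LOF [1]. The implementation and neighbourhood size used in the paper are not stated here; the pure-Python sketch below assumes Euclidean distance and a hypothetical `n_neighbors=5`, and simplifies tie handling at the k-distance. In practice a library implementation such as scikit-learn's `LocalOutlierFactor` would normally be used.

```python
import math
from statistics import mean
from typing import Sequence

Point = Sequence[float]

def lof_scores(data: list[Point], n_neighbors: int = 5) -> list[float]:
    """Naive O(n^2) Local Outlier Factor for every point; larger means more outlying."""
    n = len(data)
    dist = [[math.dist(a, b) for b in data] for a in data]
    # Indices of each point's n_neighbors nearest neighbours, excluding the point itself
    # (ties at the k-distance are broken by index rather than all being included).
    knn = [sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])[:n_neighbors]
           for i in range(n)]
    k_dist = [dist[i][knn[i][-1]] for i in range(n)]  # distance to the k-th neighbour

    # Local reachability density: inverse mean reachability distance to the neighbours.
    # The small constant avoids division by zero when points are duplicated.
    lrd = [1.0 / (mean(max(k_dist[j], dist[i][j]) for j in knn[i]) + 1e-10) for i in range(n)]

    # LOF: average neighbour density relative to the point's own density.
    return [mean(lrd[j] for j in knn[i]) / lrd[i] for i in range(n)]

def top_anomalies(data: list[Point], n_neighbors: int = 5, top: int = 5) -> list[int]:
    """Indices of the `top` points with the highest LOF, most outlying first."""
    scores = lof_scores(data, n_neighbors)
    return sorted(range(len(data)), key=lambda i: scores[i], reverse=True)[:top]

# Hypothetical demo: a 5 x 5 grid of points plus one far-away point (index 25).
pts = [(x * 0.1, y * 0.1) for x in range(5) for y in range(5)] + [(5.0, 5.0)]
print(top_anomalies(pts))
```

The returned indices would then serve as the query objects whose outlying aspects are searched for.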

References

[1] Breunig, M.M., Kriegel, H.P., Ng, R.T., Sander, J.: LOF: identifying density-based local outliers. In: Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, SIGMOD 2000, pp. 93–104. Association for Computing Machinery, New York (2000). https://doi.org/10.1145/342009.335388
[2] Chandola, V., Banerjee, A., Kumar, V.: Anomaly detection: a survey. ACM Comput. Surv. 41(3), 1–58 (2009). https://doi.org/10.1145/1541880.1541882
[3] Duan, L., Tang, G., Pei, J., Bailey, J., Campbell, A., Tang, C.: Mining outlying aspects on numeric data. Data Min. Knowl. Disc. 29(5), 1116–1151 (2015). https://doi.org/10.1007/s10618-014-0398-2
[4] Freedman, D., Diaconis, P.: On the histogram as a density estimator: L2 theory. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete 57(4), 453–476 (1981)
[5] Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., Witten, I.H.: The WEKA data mining software: an update. SIGKDD Explor. Newsl. 11(1), 10–18 (2009). https://doi.org/10.1145/1656274.1656278
[6] Keller, F., Müller, E., Böhm, K.: HiCS: high contrast subspaces for density-based outlier ranking. In: 2012 IEEE 28th International Conference on Data Engineering, pp. 1037–1048 (2012). https://doi.org/10.1109/ICDE.2012.88
[7] Pedregosa, F., et al.: Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011)
[8] Russell, S., Norvig, P.: Artificial Intelligence: A Modern Approach, 3rd edn. Prentice Hall Press, Upper Saddle River (2009)
[9] Samariya, D., Aryal, S., Ting, K.M., Ma, J.: A new effective and efficient measure for outlying aspect mining. In: Huang, Z., Beek, W., Wang, H., Zhou, R., Zhang, Y. (eds.) WISE 2020. LNCS, vol. 12343, pp. 463–474. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-62008-0_32
[10] Samariya, D., Ma, J.: Mining outlying aspects on healthcare data. In: Siuly, S., Wang, H., Chen, L., Guo, Y., Xing, C. (eds.) HIS 2021. LNCS, vol. 13079, pp. 160–170. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-90885-0_15
[11] Samariya, D., Ma, J.: A new dimensionality-unbiased score for efficient and effective outlying aspect mining. Data Sci. Eng. 7, 1–16 (2022). https://doi.org/10.1007/s41019-022-00185-5
[12] Samariya, D., Ma, J., Aryal, S.: A comprehensive survey on outlying aspect mining methods. arXiv preprint arXiv:2005.02637 (2020)
[13] Samariya, D., Thakkar, A.: A comprehensive survey of anomaly detection algorithms. Ann. Data Sci. 1–22 (2021). https://doi.org/10.1007/s40745-021-00362-9
[14] Vinh, N.X., et al.: Discovering outlying aspects in large datasets. Data Min. Knowl. Disc. 30(6), 1520–1555 (2016). https://doi.org/10.1007/s10618-016-0453-2
[15] Wells, J.R., Ting, K.M.: A new simple and efficient density estimator that enables fast systematic search. Pattern Recogn. Lett. 122, 92–98 (2019). https://doi.org/10.1016/j.patrec.2018.12.020

Acknowledgments

This work is supported by Federation University Research Priority Area (RPA) scholarship, awarded to Durgesh Samariya. Dr Sunil Aryal is supported by an Air Force Office of Scientific Research (AFOSR) research grant under award number FA2386-20-1-4005.

Author information

Authors and Affiliations

School of Engineering, Information Technology and Physical Sciences, Federation University, Churchill, VIC, Australia
Durgesh Samariya & Jiangang Ma
School of Information Technology, Deakin University, Geelong, VIC, Australia
Sunil Aryal

Corresponding author

Correspondence to Durgesh Samariya.

Editor information

Editors and Affiliations

University of Pau and Pays de l'Adour, Anglet, France
Richard Chbeir
The University of Queensland, Brisbane, QLD, Australia
Helen Huang
Sapienza Università di Roma, Rome, Italy
Fabrizio Silvestri
Open University of Cyprus, Nicosia, Cyprus
Yannis Manolopoulos
The New Cyber Research Department, Peng Cheng Laboratory, Shenzhen, China
Yanchun Zhang

About this paper

Cite this paper

Samariya, D., Ma, J., Aryal, S. (2022). sGrid++: Revising Simple Grid Based Density Estimator for Mining Outlying Aspect. In: Chbeir, R., Huang, H., Silvestri, F., Manolopoulos, Y., Zhang, Y. (eds) Web Information Systems Engineering – WISE 2022. WISE 2022. Lecture Notes in Computer Science, vol 13724. Springer, Cham. https://doi.org/10.1007/978-3-031-20891-1_15

DOI: https://doi.org/10.1007/978-3-031-20891-1_15
Published: 07 November 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-20890-4
Online ISBN: 978-3-031-20891-1
eBook Packages: Computer Science, Computer Science (R0)