Abstract
Existing proposals on outlier detection do not take the semantic knowledge of the dataset into account. They search for outliers in the data alone, which prevents them from finding more meaningful outliers. In this paper, we consider the problem of outlier detection integrating semantic knowledge. We introduce a new definition of outlier, the semantic outlier: a data point that behaves differently from other data points in the same class. We present a measure of the degree to which each object is an outlier, called the semantic outlier factor (SOF), and propose an efficient algorithm for mining semantic outliers based on SOF. Experimental results show that meaningful and interesting outliers can be found with our method.
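To make the idea of a within-class outlier concrete, the following Python sketch scores categorical records by how rare their attribute values are among records that share the same class label. The scoring rule, the toy data, and the names `within_class_outlier_scores` and `top_k_outliers` are hypothetical illustrations chosen for this example; this is not the SOF measure or the mining algorithm defined in the paper, whose formula is not given in the abstract.

```python
from collections import Counter, defaultdict


def within_class_outlier_scores(records, labels):
    """Score each categorical record by how rare its values are within its own class.

    Hypothetical illustration only; this is NOT the SOF measure from the paper.
    score(t) = 1 - mean over attributes of (count of t's value in t's class / class size)
    Scores lie in [0, 1); higher means more unusual relative to same-class records.
    """
    # Group records by their class label.
    by_class = defaultdict(list)
    for rec, lab in zip(records, labels):
        by_class[lab].append(rec)

    # For each class, count how often each value occurs in each attribute position.
    value_counts = {
        lab: [Counter(rec[i] for rec in recs) for i in range(len(recs[0]))]
        for lab, recs in by_class.items()
    }

    scores = []
    for rec, lab in zip(records, labels):
        size = len(by_class[lab])
        # Relative frequency of each of this record's values inside its class.
        freqs = [value_counts[lab][i][v] / size for i, v in enumerate(rec)]
        scores.append(1.0 - sum(freqs) / len(freqs))
    return scores


def top_k_outliers(records, labels, k):
    """Return indices of the k highest-scoring records (ties keep input order)."""
    scores = within_class_outlier_scores(records, labels)
    ranked = sorted(range(len(records)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


# Toy data (hypothetical): one record labelled "apple" looks like a banana.
records = [
    ("red", "round"), ("red", "round"), ("red", "round"), ("yellow", "long"),
    ("yellow", "long"), ("yellow", "long"), ("yellow", "long"),
]
labels = ["apple", "apple", "apple", "apple", "banana", "banana", "banana"]

print(top_k_outliers(records, labels, 1))  # [3]
```

In this toy data, `("yellow", "long")` is the most common value combination in the dataset as a whole, so the same rule applied without class labels would rank record 3 below the red apples. Compared only against the other apples, it ranks first, which is the kind of outlier that label-blind methods miss.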
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
He, Z., Deng, S., Xu, X. (2002). Outlier Detection Integrating Semantic Knowledge. In: Meng, X., Su, J., Wang, Y. (eds) Advances in Web-Age Information Management. WAIM 2002. Lecture Notes in Computer Science, vol 2419. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45703-8_12
DOI: https://doi.org/10.1007/3-540-45703-8_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-44045-1
Online ISBN: 978-3-540-45703-9
eBook Packages: Springer Book Archive