Abstract
In this paper, we apply data squashing to speed up boosting-based outlier detection. One person's noise is another person's signal, and outlier detection is gaining increasing attention in data mining. To reduce the computational time of AdaBoost-based outlier detection, we compress the given data set beforehand using a simplified version of BIRCH. We investigate the effectiveness of our approach, in terms of detection accuracy and computational time, through experiments with two real-world data sets from drug stores in Japan and an artificial data set of unlawful access to a computer network.
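The abstract does not spell out the simplified BIRCH procedure, so the following is only a minimal sketch of the general idea of squashing a data set into weighted summaries before boosting. It assumes a single sequential scan, BIRCH-style clustering features (count, linear sum, squared sum), Euclidean distance, and a user-chosen `threshold`; the names `ClusteringFeature`, `squash`, and `initial_weights` are hypothetical and are not taken from the paper.

```python
import math
from dataclasses import dataclass


@dataclass
class ClusteringFeature:
    """BIRCH-style summary of a group of points: count, linear sum, squared sum."""
    n: int
    linear_sum: list
    square_sum: float

    def centroid(self):
        # Mean of the absorbed points, recovered from the linear sum.
        return [s / self.n for s in self.linear_sum]

    def add(self, point):
        # Absorb one point by updating the three sufficient statistics.
        self.n += 1
        self.linear_sum = [s + x for s, x in zip(self.linear_sum, point)]
        self.square_sum += sum(x * x for x in point)


def squash(points, threshold):
    """Scan the data once; merge each point into the nearest summary
    if its centroid lies within `threshold`, else start a new summary.
    (Assumed simplification: a flat list instead of BIRCH's CF-tree.)"""
    features = []
    for p in points:
        best, best_dist = None, math.inf
        for cf in features:
            d = math.dist(cf.centroid(), p)
            if d < best_dist:
                best, best_dist = cf, d
        if best is not None and best_dist <= threshold:
            best.add(p)
        else:
            features.append(ClusteringFeature(1, list(p), sum(x * x for x in p)))
    return features


def initial_weights(features):
    """Initial example weights for a boosting learner, proportional to
    how many original points each summary represents (an assumption)."""
    total = sum(cf.n for cf in features)
    return [cf.n / total for cf in features]


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (10.0, 0.0)]
    squashed = squash(data, threshold=0.5)
    for cf, w in zip(squashed, initial_weights(squashed)):
        print(cf.n, cf.centroid(), round(w, 3))
```

Under these assumptions, a boosting learner would be trained on the centroids with the returned weights in place of the uniform initial distribution, so each boosting round touches one example per summary rather than one per original record. The result of a sequential scan like this depends on the order of the input and on the choice of `threshold`.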
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Inatani, S., Suzuki, E. (2002). Data Squashing for Speeding Up Boosting-Based Outlier Detection. In: Hacid, MS., Raś, Z.W., Zighed, D.A., Kodratoff, Y. (eds) Foundations of Intelligent Systems. ISMIS 2002. Lecture Notes in Computer Science, vol 2366. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-48050-1_64
DOI: https://doi.org/10.1007/3-540-48050-1_64
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-43785-7
Online ISBN: 978-3-540-48050-1
eBook Packages: Springer Book Archive