Abstract
We consider a new problem of detecting members of a rare class of data, the needles, which have been hidden in a set of records, the haystack. The only information regarding the characterization of the rare class is a single instance of a needle. It is assumed that members of the needle class are similar to each other according to an unknown needle characterization. The goal is to find the needle records hidden in the haystack. This paper describes an algorithm for that task and applies it to several example cases.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
References
Abidi, S., Hoe, K.: Symbolic exposition of medical data-sets: A data mining workbench to inductively derive data-defining symbolic rules. In: Proceedings of the 15th IEEE Symposium on Computer-based Medical Systems (CBMS 2002) (2002)
Aggarwal, C., Yu, P.: Outlier detection for high dimensional data. In: Proceedings of the 2001 ACM SIGMOD International Conference on Management of Data (2001)
Agrawal, R., Imielinski, T., Swami, A.N.: Mining association rules between sets of items in large databases. In: Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data (1993)
An, A., Cercone, N.: Discretization of continuous attributes for learning classification rules. In: Zhong, N., Zhou, L. (eds.) PAKDD 1999. LNCS, vol. 1574, pp. 509–514. Springer, Heidelberg (1999)
Bay, S., Pazzani, M.: Detecting group differences: Mining contrast sets. Data Mining and Knowledge Discovery 5, 213–246 (2001)
Boros, E., Hammer, P., Ibaraki, T., Kogan, A.: A logical analysis of numerical data. Mathematical Programming 79, 163–190 (1997)
Boros, E., Hammer, P., Ibaraki, T., Kogan, A., Mayoraz, E., Muchnik, I.: An implementation of logical analysis of data. IEEE Transactions on Knowledge and Data Engineering 12, 292–306 (2000)
Chawla, N.V., Japkowicz, N., Kotcz, A.: Editorial: special issue on learning from imbalanced data sets. SIGKDD Explor. Newsl. 6(1), 1–6 (2004)
Clark, P., Boswell, R.: Rule induction with CN2: Some recent improvements. In: Kodratoff, Y. (ed.) EWSL 1991. LNCS, vol. 482. Springer, Heidelberg (1991)
Cohen, W.W.: Fast effective rule induction. In: Machine Learning: Proceedings of the Twelfth International Conference (1995)
Cohen, W.W., Singer, Y.: A simple, fast, and effective rule learner. In: Proceedings of the Sixteenth National Conference on Artificial Intelligence (1999)
Dokas, P., Ertoz, L., Kumar, V., Lazarevic, A., Srivastava, J., Tan, P.-N.: Data mining for network intrusion detection. In: Proc. 2002 NSF Workshop on Data Mining (2002)
Felici, G., Sun, F., Truemper, K.: Learning logic formulas and related error distributions. In: Data Mining and Knowledge Discovery Approaches Based on Rule Induction Techniques. Springer, Heidelberg (2006)
Felici, G., Truemper, K.: A MINSAT approach for learning in logic domain. INFORMS Journal of Computing 14, 20–36 (2002)
Joshi, M.V., Agarwal, R.C., Kumar, V.: Mining needle in a haystack: classifying rare classes via two-phase rule induction. In: SIGMOD 2001: Proceedings of the 2001 ACM SIGMOD international conference on Management of data, pp. 91–102 (2001)
Joshi, M.V., Kumar, V., Agarwal, R.: Evaluating boosting algorithms to classify rare classes: Comparison and improvements. In: IEEE International Conference on Data Mining, p. 257 (2001)
Lee, W., Stolfo, S.: Real time data mining-based intrusion detection. In: Proceedings of the 7th USENIX Security Symposium (1998)
Sequeira, K., Zaki, M.: Admit: Anomaly-based data mining for intrusions. In: Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2002)
Triantaphyllou, E.: Data Mining and Knowledge Discovery via a Novel Logic-based Approach. Springer, Heidelberg (2008)
Weiss, G.M.: Mining with rarity: a unifying framework. SIGKDD Explor. Newsl. 6, 7–19 (2004)
Yan, R., Liu, Y., Jin, R., Hauptmann, A.: On predicting rare classes with svm ensembles in scene classification. In: 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2003), April 2003, vol. 3, pp. III–21–III–24 (2003)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Moreland, K., Truemper, K. (2009). The Needles-in-Haystack Problem. In: Perner, P. (eds) Machine Learning and Data Mining in Pattern Recognition. MLDM 2009. Lecture Notes in Computer Science(), vol 5632. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03070-3_39
Download citation
DOI: https://doi.org/10.1007/978-3-642-03070-3_39
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-03069-7
Online ISBN: 978-3-642-03070-3
eBook Packages: Computer ScienceComputer Science (R0)