The Needles-in-Haystack Problem

  • Conference paper
Machine Learning and Data Mining in Pattern Recognition (MLDM 2009)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5632)

Abstract

We consider a new problem of detecting members of a rare class of data, the needles, which have been hidden in a set of records, the haystack. The only information regarding the characterization of the rare class is a single instance of a needle. It is assumed that members of the needle class are similar to each other according to an unknown needle characterization. The goal is to find the needle records hidden in the haystack. This paper describes an algorithm for that task and applies it to several example cases.

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Moreland, K., Truemper, K. (2009). The Needles-in-Haystack Problem. In: Perner, P. (ed.) Machine Learning and Data Mining in Pattern Recognition. MLDM 2009. Lecture Notes in Computer Science, vol 5632. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03070-3_39

  • DOI: https://doi.org/10.1007/978-3-642-03070-3_39

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-03069-7

  • Online ISBN: 978-3-642-03070-3

  • eBook Packages: Computer Science, Computer Science (R0)
