On Issues of Instance Selection

Data Mining and Knowledge Discovery

References

  • Aha, D. (Ed.). 1997. Lazy Learning. Dordrecht: Kluwer Academic Publishers.

  • Aha, D.W., Kibler, D., and Albert, M.K. 1991. Instance-based learning algorithms. Machine Learning 6:37–66.

  • Baeza-Yates, R. and Ribeiro-Neto, B. 1999. Modern Information Retrieval. New York: Addison Wesley and ACM Press.

  • Bloedorn, E. and Michalski, R. 1998. Data-Driven Constructive Induction: A Methodology and Its Applications. In Feature Extraction, Construction and Selection: A Data Mining Perspective. Boston: Kluwer Academic Publishers, pp. 51–68.

  • Blum, A. and Langley, P. 1997. Selection of relevant features and examples in machine learning. Artificial Intelligence, 97:245–271.

  • Bradley, P., Fayyad, U., and Reina, C. 1998. Scaling clustering algorithms to large databases. In Proceedings of the Fourth International Conference on Knowledge Discovery & Data Mining, pp. 9–15.

  • Breiman, L. and Friedman, J. 1984. Tool for large data set analysis. In Statistical Signal Processing, E. Wegman and J. Smith (Eds.). New York: M. Dekker, pp. 191–197.

  • Breiman, L., Friedman, J., Olshen, R., and Stone, C. 1984. Classification and Regression Trees. Monterey, CA: Wadsworth & Brooks/Cole Advanced Books & Software.

  • Brighton, H. and Mellish, C. 2002. Advances in instance selection for instance-based learning. Data Mining and Knowledge Discovery, 6(2):153–172.

  • Brodley, C.E. 1995. Recursive automatic bias selection for classifier construction. Machine Learning, 20(1/2): 63–94.

  • Burges, C. 1998. A tutorial on support vector machines. Data Mining and Knowledge Discovery, 2:121–167.

  • Chang, C. 1974. Finding prototypes for nearest neighbor classifiers. IEEE Transactions on Computers, C-23.

  • Chaudhuri, S., Motwani, R., and Narasayya, V. 1998. Random sampling for histogram construction: How much is enough? In Proceedings of ACM SIGMOD, International Conference on Management of Data, L. Haas and A. Tiwary (Eds.). New York: ACM, pp. 436–447.

  • Cochran, W. 1977. Sampling Techniques. New York: John Wiley & Sons.

  • Cohn, D., Atlas, L., and Ladner, R. 1994. Improving generalization with active learning. Machine Learning, 15:201–221.

  • Cohn, D., Ghahramani, Z., and Jordan, M. 1996. Active learning with statistical models. Journal of Artificial Intelligence Research, 4:129–145.

  • Cover, T. and Hart, P. 1967. Nearest neighbor pattern classification. IEEE Transactions on Information Theory, IT-13:21–27.

  • Cover, T.M. and Thomas, J.A. 1991. Elements of Information Theory. New York: Wiley.

  • Devlin, B. 1997. Data Warehouse: From Architecture to Implementation. Reading, MA: Addison Wesley Longman, Inc.

  • Domingo, C., Gavaldà, R., and Watanabe, O. 2002. Adaptive sampling methods for scaling up knowledge discovery algorithms. Data Mining and Knowledge Discovery, 6(2):131–152.

  • DuMouchel, W., Volinsky, C., Johnson, T., Cortes, C., and Pregibon, D. 1999. Squashing flat files flatter. In Proceedings of the 5th ACM Conference on Knowledge Discovery and Data Mining.

  • Everitt, B. 1974. Cluster Analysis. London: Heinemann.

  • Fayyad, U. and Irani, K. 1993. Multi-interval discretization of continuous-valued attributes for classification learning. In Proceedings of the Thirteenth International Joint Conference on Artificial Intelligence, pp. 1022–1027.

  • Fayyad, U., Piatetsky-Shapiro, G., and Smyth, P. 1996. From data mining to knowledge discovery: An overview. In Advances in Knowledge Discovery and Data Mining, U. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy (Eds.). Menlo Park, CA: AAAI Press/The MIT Press, pp. 495–515.

  • Fisher, D. 1987. Knowledge acquisition via incremental conceptual clustering. Machine Learning, 2:139–172.

  • Freund, Y. 1994. Sifting informative examples from a random source. In Advances in Neural Information Processing Systems, pp. 85–89.

  • Freund, Y. 1995. Boosting a weak learning algorithm by majority. Information and Computation, 121(2):256–285.

  • Freund, Y. and Schapire, R. 1997. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1):119–139.

  • Harris-Jones, C. and Haines, T.L. 1997. Sample size and misclassification: Is more always better? Working Paper AMSCAT-WP-97-118, AMS Center for Advanced Technologies.

  • Hussain, F., Liu, H., Tan, C., and Dash, M. 1999. Discretization: An enabling technique. Technical Report: TRC6/99, School of Computing, National University of Singapore.

  • Joachims, T. 1998. Text categorization with support vector machines: Learning with many relevant features. In Proceedings of 10th European Conference on Machine Learning, C. Nedellec and C. Rouveirol (Eds.). Chemnitz, Germany, pp. 137–142.

  • Kivinen, J. and Mannila, H. 1994. The power of sampling in knowledge discovery. In SIGMOD/PODS '94, pp. 77–85.

  • Langley, P. 1996. Elements of Machine Learning. San Mateo, CA: Morgan Kaufmann.

  • Lewis, D. and Catlett, J. 1994. Heterogeneous uncertainty sampling for supervised learning. In Proceedings of the Eleventh International Conference on Machine Learning, pp. 148–156.

  • Lewis, D. and Gale, W. 1994. A sequential algorithm for training text classifiers. In Proceedings of the Seventeenth Annual ACM-SIGIR Conference on Research and Development in Information Retrieval, pp. 3–12.

  • Liu, H. and Motoda, H. (Eds.). 1998a. Feature Extraction, Construction and Selection: A Data Mining Perspective. Boston: Kluwer Academic Publishers.

  • Liu, H. and Motoda, H. 1998b. Feature Selection for Knowledge Discovery and Data Mining. Boston: Kluwer Academic Publishers.

  • Madigan, D., Raghavan, N., DuMouchel, W., Nason, M., Posse, C., and Ridgeway, G. 2002. Likelihood-based data squashing: A modeling approach to instance construction. Data Mining and Knowledge Discovery, 6(2):173–190.

  • McCallum, A. and Nigam, K. 1998. Employing EM in pool-based active learning for text classification. In Proceedings of the Fifteenth International Conference on Machine Learning, pp. 350–358.

  • Mitchell, T. 1997. Machine Learning. New York: McGraw-Hill.

  • Piatetsky-Shapiro, G. and Connell, C. 1984. Accurate estimate of the number of tuples satisfying a condition. In ACM SIGMOD Conference, pp. 256–276.

  • Provost, F., Jensen, D., and Oates, T. 1999. Efficient progressive sampling. In Proceedings of the 5th ACM Conference on Knowledge Discovery and Data Mining.

  • Provost, F. and Kolluri, V. 1999. A survey of methods for scaling up inductive algorithms. Data Mining and Knowledge Discovery, 3:131–169.

  • Quinlan, J. 1993. C4.5: Programs for Machine Learning. San Mateo, CA: Morgan Kaufmann.

  • Reinartz, T. 1999. Focusing Solutions for Data Mining. New York: Springer. LNAI 1623.

  • Reinartz, T. 2002. A unifying view on instance selection. Data Mining and Knowledge Discovery, 6(2):191–210.

  • Schapire, R. 1990. The strength of weak learnability. Machine Learning, 5(2):197–227.

  • Scholkopf, B., Burges, C., and Vapnik, V. 1995. Extracting support data for a given task. In Proceedings of the First International Conference on Knowledge Discovery and Data Mining, U. Fayyad and R. Uthurusamy (Eds.). pp. 252–257.

  • Seung, H., Opper, M., and Sompolinsky, H. 1992. Query by committee. In Proceedings of the Fifth Annual Workshop on Computational Learning Theory, Pittsburgh, PA, pp. 287–294.

  • Smith, P. 1998. Into Statistics. Singapore: Springer-Verlag.

  • Syed, N., Liu, H., and Sung, K. 1999a. Handling concept drifts in incremental learning with support vector machines. In Proceedings of ACM SIGKDD, International Conference on Knowledge Discovery and Data Mining, New York, S. Chaudhuri and D. Madigan (Eds.). pp. 317–321.

  • Syed, N., Liu, H., and Sung, K. 1999b. A study of support vectors on model independent example selection. In Proceedings of ACM SIGKDD, International Conference on Knowledge Discovery and Data Mining, New York, S. Chaudhuri and D. Madigan (Eds.). pp. 272–276.

  • Szalay, A. and Gray, J. 1999. Drowning in data. Scientific American www.sciam.com/explorations/1999/.

  • Utgoff, P. 1989. Incremental induction of decision trees. Machine Learning, 4:161–186.

  • Valiant, L. 1984. A theory of the learnable. Communications of the Association for Computing Machinery, 27:1134–1142.

  • Vapnik, V. 1995. The Nature of Statistical Learning Theory. New York: Springer-Verlag.

  • Weiss, S. and Indurkhya, N. 1998. Predictive Data Mining. San Francisco, California: Morgan Kaufmann.

  • Weiss, S. and Kulikowski, C. 1991. Computer Systems That Learn. San Mateo, California: Morgan Kaufmann.

Cite this article

Liu, H., Motoda, H. On Issues of Instance Selection. Data Mining and Knowledge Discovery 6, 115–130 (2002). https://doi.org/10.1023/A:1014056429969

  • DOI: https://doi.org/10.1023/A:1014056429969
