On Issues of Instance Selection

Liu, Huan; Motoda, Hiroshi

doi:10.1023/A:1014056429969

Published: April 2002

Data Mining and Knowledge Discovery, Volume 6, pages 115–130 (2002)

References

Aha, D. (Ed.). 1997. Lazy Learning. Dordrecht: Kluwer Academic Publishers.
Aha, D.W., Kibler, D., and Albert, M.K. 1991. Instance-based learning algorithms. Machine Learning, 6:37–66.
Baeza-Yates, R. and Ribeiro-Neto, B. 1999. Modern Information Retrieval. New York: Addison Wesley and ACM Press.
Bloedorn, E. and Michalski, R. 1998. Data-Driven Constructive Induction: A Methodology and Its Applications. In Feature Extraction, Construction and Selection: A Data Mining Perspective. Boston: Kluwer Academic Publishers, pp. 51–68.
Blum, A. and Langley, P. 1997. Selection of relevant features and examples in machine learning. Artificial Intelligence, 97:245–271.
Bradley, P., Fayyad, U., and Reina, C. 1998. Scaling clustering algorithms to large databases. In Proceedings of the Fourth International Conference on Knowledge Discovery & Data Mining, pp. 9–15.
Breiman, L. and Friedman, J. 1984. Tool for large data set analysis. In Statistical Signal Processing, E. Wegman and J. Smith (Eds.). New York: M. Dekker, pp. 191–197.
Breiman, L., Friedman, J., Olshen, R., and Stone, C. 1984. Classification and Regression Trees. Monterey, CA: Wadsworth & Brooks/Cole Advanced Books & Software.
Brighton, H. and Mellish, C. 2002. Advances in instance selection for instance-based learning. Data Mining and Knowledge Discovery, An International Journal, 6(2):153–172.
Brodley, C.E. 1995. Recursive automatic bias selection for classifier construction. Machine Learning, 20(1/2):63–94.
Burges, C. 1998. A tutorial on support vector machines. Data Mining and Knowledge Discovery, 2:121–167.
Chang, C. 1974. Finding prototypes for nearest neighbor classifiers. IEEE Transactions on Computers, C-23.
Chaudhuri, S., Motwani, R., and Narasayya, V. 1998. Random sampling for histogram construction: How much is enough? In Proceedings of ACM SIGMOD, International Conference on Management of Data, L. Haas and A. Tiwary (Eds.). New York: ACM, pp. 436–447.
Cochran, W. 1977. Sampling Techniques. New York: John Wiley & Sons.
Cohn, D., Atlas, L., and Ladner, R. 1994. Improving generalization with active learning. Machine Learning, 15:201–221.
Cohn, D., Ghahramani, Z., and Jordan, M. 1996. Active learning with statistical models. Journal of Artificial Intelligence Research, 4:129–145.
Cover, T. and Hart, P. 1967. Nearest neighbor pattern classification. IEEE Transactions on Information Theory, IT-13:21–27.
Cover, T.M. and Thomas, J.A. 1991. Elements of Information Theory. New York: Wiley.
Devlin, B. 1997. Data Warehouse from Architecture to Implementations. Reading, MA: Addison Wesley Longman, Inc.
Domingo, C., Gavaldà, R., and Watanabe, O. 2002. Adaptive sampling methods for scaling up knowledge discovery algorithms. Data Mining and Knowledge Discovery, An International Journal, 6(2):131–152.
DuMouchel, W., Volinsky, C., Johnson, T., Cortes, C., and Pregibon, D. 1999. Squashing flat files flatter. In Proceedings of the 5th ACM Conference on Knowledge Discovery and Data Mining.
Everitt, B. 1974. Cluster Analysis. London: Heinemann.
Fayyad, U. and Irani, K. 1993. Multi-interval discretization of continuous-valued attributes for classification learning. In Proceedings of the Thirteenth International Joint Conference on Artificial Intelligence, pp. 1022–1027.
Fayyad, U., Piatetsky-Shapiro, G., and Smyth, P. 1996. From data mining to knowledge discovery: An overview. In Advances in Knowledge Discovery and Data Mining, U. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy (Eds.). Menlo Park, CA: AAAI Press/The MIT Press, pp. 495–515.
Fisher, D. 1987. Knowledge acquisition via incremental conceptual clustering. Machine Learning, 2:139–172.
Freund, Y. 1994. Sifting informative examples from a random source. In Advances in Neural Information Processing Systems, pp. 85–89.
Freund, Y. 1995. Boosting a weak learning algorithm by majority. Information and Computation, 121(2):256–285.
Freund, Y. and Schapire, R. 1997. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1):119–139.
Harris-Jones, C. and Haines, T.L. 1997. Sample size and misclassification: Is more always better? Working Paper AMSCAT-WP-97-118, AMS Center for Advanced Technologies.
Hussain, F., Liu, H., Tan, C., and Dash, M. 1999. Discretization: An enabling technique. Technical Report: TRC6/99, School of Computing, National University of Singapore.
Joachims, T. 1998. Text categorization with support vector machines: Learning with many relevant features. In Proceedings of the 10th European Conference on Machine Learning, C. Nedellec and C. Rouveirol (Eds.). Chemnitz, Germany, pp. 137–142.
Kivinen, J. and Mannila, H. 1994. The power of sampling in knowledge discovery. In SIGMOD/PODS '94, pp. 77–85.
Langley, P. 1996. Elements of Machine Learning. San Mateo, CA: Morgan Kaufmann.
Lewis, D. and Catlett, J. 1994. Heterogeneous uncertainty sampling for supervised learning. In Proceedings of the Eleventh Conference on Machine Learning, pp. 148–156.
Lewis, D. and Gale, W. 1994. A sequential algorithm for training text classifiers. In Proceedings of the Seventeenth Annual ACM-SIGIR Conference on Research and Development in Information Retrieval, pp. 3–12.
Liu, H. and Motoda, H. (Eds.). 1998a. Feature Extraction, Construction and Selection: A Data Mining Perspective. Boston: Kluwer Academic Publishers.
Liu, H. and Motoda, H. 1998b. Feature Selection for Knowledge Discovery and Data Mining. Boston: Kluwer Academic Publishers.
Madigan, D., Raghavan, N., DuMouchel, W., Nason, M., Posse, C., and Ridgeway, G. 2002. Likelihood-based data squashing: A modeling approach to instance construction. Data Mining and Knowledge Discovery, An International Journal, 6(2):173–190.
McCallum, A. and Nigam, K. 1998. Employing EM in pool-based active learning for text classification. In Proceedings of the Fifteenth International Conference on Machine Learning, pp. 350–358.
Mitchell, T. 1997. Machine Learning. New York: McGraw-Hill.
Piatetsky-Shapiro, G. and Connell, C. 1984. Accurate estimate of the number of tuples satisfying a condition. In ACM SIGMOD Conference, pp. 256–276.
Provost, F., Jensen, D., and Oates, T. 1999. Efficient progressive sampling. In Proceedings of the 5th ACM Conference on Knowledge Discovery and Data Mining.
Provost, F. and Kolluri, V. 1999. A survey of methods for scaling up inductive algorithms. Data Mining and Knowledge Discovery, 3:131–169.
Quinlan, J. 1993. C4.5: Programs for Machine Learning. San Mateo, CA: Morgan Kaufmann.
Reinartz, T. 1999. Focusing Solutions for Data Mining. New York: Springer. LNAI 1623.
Reinartz, T. 2002. A unifying view on instance selection. Data Mining and Knowledge Discovery, An International Journal, 6(2):191–210.
Schapire, R. 1990. The strength of weak learnability. Machine Learning, 5(2):197–227.
Schölkopf, B., Burges, C., and Vapnik, V. 1995. Extracting support data for a given task. In Proceedings of the First International Conference on Knowledge Discovery and Data Mining, U. Fayyad and R. Uthurusamy (Eds.). pp. 252–257.
Seung, H., Opper, M., and Sompolinsky, H. 1992. Query by committee. In Proceedings of the Fifth Annual Workshop on Computational Learning Theory, Pittsburgh, PA, pp. 287–294.
Smith, P. 1998. Into Statistics. Singapore: Springer-Verlag.
Syed, N., Liu, H., and Sung, K. 1999a. Handling concept drifts in incremental learning with support vector machines. In Proceedings of ACM SIGKDD, International Conference on Knowledge Discovery and Data Mining, New York, S. Chaudhuri and D. Madigan (Eds.). pp. 317–321.
Syed, N., Liu, H., and Sung, K. 1999b. A study of support vectors on model independent example selection. In Proceedings of ACM SIGKDD, International Conference on Knowledge Discovery and Data Mining, New York, S. Chaudhuri and D. Madigan (Eds.). pp. 272–276.
Szalay, A. and Gray, J. 1999. Drowning in data. Scientific American www.sciam.com/explorations/1999/.
Utgoff, P. 1989. Incremental induction of decision trees. Machine Learning, 4:161–186.
Valiant, L. 1984. A theory of the learnable. Communications of the Association for Computing Machinery, 27:1134–1142.
Vapnik, V. 1995. The Nature of Statistical Learning Theory. New York: Springer-Verlag.
Weiss, S. and Indurkhya, N. 1998. Predictive Data Mining. San Francisco, California: Morgan Kaufmann.
Weiss, S. and Kulikowski, C. 1991. Computer Systems That Learn. San Mateo, California: Morgan Kaufmann.

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Arizona State University, Tempe, Arizona, USA
Huan Liu
Institute of Scientific and Industrial Research, Osaka University, Osaka, Japan
Hiroshi Motoda

About this article

Cite this article

Liu, H., Motoda, H. On Issues of Instance Selection. Data Mining and Knowledge Discovery 6, 115–130 (2002). https://doi.org/10.1023/A:1014056429969
