Meta-Learning of Instance Selection for Data Summarization

  • Chapter
Meta-Learning in Computational Intelligence

Part of the book series: Studies in Computational Intelligence (SCI, volume 358)

Abstract

The goal of instance selection is to identify which instances (examples, patterns) in a large dataset should be selected as representatives of the entire dataset, without significant loss of information. When a machine learning method is applied to the reduced dataset, the accuracy of the resulting model should not be significantly worse than if the same method were applied to the entire dataset. The reducibility of any dataset, and hence the success of instance selection methods, depends on the characteristics of the dataset. However, the relationship between data characteristics and the reducibility achieved by instance selection methods has not been extensively tested. This chapter adopts a meta-learning approach, via an empirical study of 112 classification datasets, to explore the relationship between data characteristics and the success of a naïve instance selection method. The approach can be readily extended to explore how data characteristics influence the performance of many more sophisticated instance selection methods.
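
The comparison described in the abstract, accuracy on a reduced dataset versus accuracy on the full dataset, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not say which naïve method the chapter uses, so uniform random sampling stands in for it here, a 1-nearest-neighbour rule stands in for the learning method, and the data are synthetic. The function names (`make_blobs`, `select_random`, and so on) are hypothetical.

```python
import random


def make_blobs(n_per_class, seed=0):
    """Synthetic two-class 2-D data: one Gaussian blob per class (illustrative only)."""
    rng = random.Random(seed)
    data = []
    for label, centre in enumerate([(0.0, 0.0), (4.0, 4.0)]):
        for _ in range(n_per_class):
            point = (rng.gauss(centre[0], 1.0), rng.gauss(centre[1], 1.0))
            data.append((point, label))
    rng.shuffle(data)
    return data


def predict_1nn(train, point):
    """Return the label of the training instance closest to `point`."""
    nearest = min(
        train,
        key=lambda inst: (inst[0][0] - point[0]) ** 2 + (inst[0][1] - point[1]) ** 2,
    )
    return nearest[1]


def accuracy(train, test):
    """Fraction of test instances that 1-NN classifies correctly using `train`."""
    hits = sum(predict_1nn(train, point) == label for point, label in test)
    return hits / len(test)


def select_random(train, fraction, seed=0):
    """Assumed naive instance selection: keep a uniform random fraction of instances."""
    rng = random.Random(seed)
    k = max(1, round(fraction * len(train)))
    return rng.sample(train, k)


data = make_blobs(200)                  # 400 instances in total
train, test = data[:300], data[300:]    # simple hold-out split
reduced = select_random(train, 0.1)     # keep 10% of the training instances

full_acc = accuracy(train, test)
reduced_acc = accuracy(reduced, test)
print(f"full: {len(train)} instances, accuracy {full_acc:.3f}")
print(f"reduced: {len(reduced)} instances, accuracy {reduced_acc:.3f}")
```

The gap between `full_acc` and `reduced_acc` is one way to quantify how reducible a dataset is for a given selection method. A meta-learning study in the spirit of the abstract would record such a measure across many datasets, together with their characteristics.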

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Smith-Miles, K.A., Islam, R.M.D. (2011). Meta-Learning of Instance Selection for Data Summarization. In: Jankowski, N., Duch, W., Grąbczewski, K. (eds) Meta-Learning in Computational Intelligence. Studies in Computational Intelligence, vol 358. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20980-2_2

  • DOI: https://doi.org/10.1007/978-3-642-20980-2_2

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-20979-6

  • Online ISBN: 978-3-642-20980-2

  • eBook Packages: Engineering (R0)
