Meta-Learning of Instance Selection for Data Summarization

Smith-Miles, Kate A.; Islam, Rafiqul M. D.

doi:10.1007/978-3-642-20980-2_2

Part of the book series: Studies in Computational Intelligence (SCI, volume 358)

Abstract

The goal of instance selection is to identify which instances (examples, patterns) in a large dataset should be selected as representatives of the entire dataset, without significant loss of information. When a machine learning method is applied to the reduced dataset, the accuracy of the model should not be significantly worse than if the same method were applied to the entire dataset. The reducibility of any dataset, and hence the success of instance selection methods, surely depends on the characteristics of the dataset. However, the relationship between data characteristics and the reducibility achieved by instance selection methods has not been extensively tested. This chapter adopts a meta-learning approach, via an empirical study of 112 classification datasets, to explore the relationship between data characteristics and the success of a naïve instance selection method. The approach can be readily extended to explore how the data characteristics influence the performance of many more sophisticated instance selection methods.
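
The evaluation criterion described in the abstract (train the same learner on the reduced set and on the full set, then compare accuracy) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not say which naïve selection method or which learner the chapter uses, so the sketch substitutes uniform random sampling, a 1-nearest-neighbour classifier, and synthetic two-class data. The function names are hypothetical.

```python
import random


def make_blobs(n_per_class, rng):
    """Two 2-D Gaussian classes (synthetic data, for illustration only)."""
    data = []
    for label, centre in enumerate([(0.0, 0.0), (4.0, 4.0)]):
        for _ in range(n_per_class):
            point = (rng.gauss(centre[0], 1.0), rng.gauss(centre[1], 1.0))
            data.append((point, label))
    return data


def predict_1nn(train, point):
    """Return the label of the training instance nearest to `point`."""
    nearest = min(
        train,
        key=lambda inst: (inst[0][0] - point[0]) ** 2 + (inst[0][1] - point[1]) ** 2,
    )
    return nearest[1]


def accuracy(train, test):
    """Fraction of test instances that 1-NN on `train` labels correctly."""
    hits = sum(predict_1nn(train, point) == label for point, label in test)
    return hits / len(test)


def select_random(train, fraction, rng):
    """Assumed naive selector: keep a uniform random fraction of the instances."""
    k = max(1, round(fraction * len(train)))
    return rng.sample(train, k)


rng = random.Random(0)                       # fixed seed so runs are repeatable
train = make_blobs(200, rng)                 # 400 training instances
test = make_blobs(100, rng)                  # 200 held-out instances
reduced = select_random(train, 0.1, rng)     # keep 10% of the training set

acc_full = accuracy(train, test)
acc_reduced = accuracy(reduced, test)
print(f"full set:    {len(train)} instances, accuracy {acc_full:.3f}")
print(f"reduced set: {len(reduced)} instances, accuracy {acc_reduced:.3f}")
print(f"accuracy loss from reduction: {acc_full - acc_reduced:.3f}")
```

In a meta-learning study of the kind the abstract outlines, a measurement like the accuracy loss above would be recorded per dataset and then related to that dataset's characteristics; the sketch covers only the per-dataset measurement.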

Author information

Authors and Affiliations

School of Mathematical Sciences, Monash University, VIC, 3800, Australia
Kate A. Smith-Miles
School of Information Technology, Deakin University, Burwood, VIC, 3125, Australia
Rafiqul M. D. Islam

Editor information

Editors and Affiliations

Department of Informatics, Nicolaus Copernicus University, ul. Grudziądzka 5, 87-100, Toruń, Poland
Norbert Jankowski, Włodzisław Duch & Krzysztof Grąbczewski

About this chapter

Cite this chapter

Smith-Miles, K.A., Islam, R.M.D. (2011). Meta-Learning of Instance Selection for Data Summarization. In: Jankowski, N., Duch, W., Grąbczewski, K. (eds) Meta-Learning in Computational Intelligence. Studies in Computational Intelligence, vol 358. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20980-2_2

DOI: https://doi.org/10.1007/978-3-642-20980-2_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-20979-6
Online ISBN: 978-3-642-20980-2
eBook Packages: Engineering, Engineering (R0)