Controlled Permutations for Testing Adaptive Classifiers

  • Conference paper
Discovery Science (DS 2011)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6926)

Abstract

We study the evaluation of online classifiers that are designed to adapt to changes in data distribution over time (concept drift). A standard procedure for evaluating such classifiers is test-then-train, which iteratively uses each incoming instance first for testing and then for updating the classifier. Comparing classifiers based on such a test risks giving biased results, since the dataset is processed only once in a fixed sequential order. Such a test shows how well classifiers adapt when changes happen at fixed time points, while the ultimate goal is to assess how well they would adapt when changes of a similar type happen unexpectedly. To reduce the risk of biased evaluation, we propose running multiple tests with permuted data. A random permutation is not suitable, as it makes the data distribution uniform over time and destroys the adaptive learning problem. We develop three permutation techniques with theoretical control mechanisms that ensure that the different distributions in the data are preserved while the data order is perturbed. The idea is to manipulate blocks of data, keeping individual instances close together. Our permutations reduce the risk of biased evaluation by making it possible to analyze the sensitivity of classifiers to variations in the data order.
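The following is a minimal Python sketch of the setting the abstract describes, not the paper's own method. It assumes a toy stream with a single abrupt drift, a hypothetical sliding-window majority classifier (`WindowMajority`), and a generic block shuffle (`block_permutation`) that stands in for the idea of moving blocks while keeping neighbouring instances together. The three permutation techniques and their control mechanisms are not specified in the abstract and are not reproduced here. All names, the block size, and the window length are illustrative assumptions.

```python
import random
from collections import Counter


class WindowMajority:
    """Hypothetical adaptive classifier: predicts the majority label
    among the most recent `window` labels it has seen."""

    def __init__(self, window=50):
        self.window = window
        self.recent = []

    def predict(self, x):
        if not self.recent:
            return 0  # arbitrary default before any label is seen
        return Counter(self.recent).most_common(1)[0][0]

    def update(self, x, y):
        self.recent.append(y)
        if len(self.recent) > self.window:
            self.recent.pop(0)


def prequential_accuracy(stream, model):
    """Test-then-train: predict each instance first, then learn from it."""
    correct = 0
    for x, y in stream:
        correct += int(model.predict(x) == y)
        model.update(x, y)
    return correct / len(stream)


def block_permutation(n, block_size, rng):
    """Permutation of range(n) that shuffles contiguous blocks but keeps
    the original order inside each block (illustrative assumption)."""
    blocks = [list(range(s, min(s + block_size, n)))
              for s in range(0, n, block_size)]
    rng.shuffle(blocks)
    return [i for block in blocks for i in block]


# Toy stream: label 0 for the first half, label 1 afterwards (one drift).
stream = [(i, 0 if i < 500 else 1) for i in range(1000)]
rng = random.Random(0)

# Original order: a single test with the drift at a fixed time point.
acc_original = prequential_accuracy(stream, WindowMajority())

# Block-permuted order: drifts move, but local structure is preserved.
perm = block_permutation(len(stream), 100, rng)
acc_block = prequential_accuracy([stream[i] for i in perm], WindowMajority())

# Fully random order: the distribution becomes uniform over time.
shuffled = stream[:]
rng.shuffle(shuffled)
acc_shuffled = prequential_accuracy(shuffled, WindowMajority())

print(acc_original, acc_block, acc_shuffled)
```

Repeating the block-permuted run with different seeds gives a spread of accuracies rather than a single number, which is one way to examine sensitivity to the data order. On this toy stream the fully shuffled run leaves nothing to adapt to, so the windowed classifier does little better than guessing.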

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Žliobaitė, I. (2011). Controlled Permutations for Testing Adaptive Classifiers. In: Elomaa, T., Hollmén, J., Mannila, H. (eds) Discovery Science. DS 2011. Lecture Notes in Computer Science, vol 6926. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24477-3_29

  • DOI: https://doi.org/10.1007/978-3-642-24477-3_29

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-24476-6

  • Online ISBN: 978-3-642-24477-3

  • eBook Packages: Computer Science (R0)
