Controlled permutations for testing adaptive learning models

  • Regular Paper
  • Knowledge and Information Systems

Abstract

We study the evaluation of supervised learning models that adapt to a data distribution that changes over time (concept drift). The standard testing procedure, which simulates the online arrival of data (test-then-train), may not be sufficient to generalize about performance: a single test shows only how well a model adapts to one fixed configuration of changes, whereas the ultimate goal is to assess adaptation to changes that happen unexpectedly. We propose a methodology for obtaining datasets for multiple tests by permuting the order of the original data. A random permutation is not suitable, as it makes the data distribution uniform over time and destroys the adaptive learning task. Therefore, we propose three controlled permutation techniques that make it possible to acquire new datasets by introducing restricted variations in the order of examples. The control mechanisms, which come with theoretical guarantees of preserving distributions, ensure that the new sets represent close variations of the original learning task. Complementary tests on such sets make it possible to analyze the sensitivity of performance to variations in how changes happen, and in this way enrich the assessment of adaptive supervised learning models.

Notes

  1. Available at https://sites.google.com/site/zliobaite/permutations

Acknowledgments

The research leading to these results has received funding from the European Commission within the Marie Curie Industry and Academia Partnerships and Pathways (IAPP) programme under grant agreement no. 251617.

Author information

Corresponding author

Correspondence to Indrė Žliobaitė.

About this article

Cite this article

Žliobaitė, I. Controlled permutations for testing adaptive learning models. Knowl Inf Syst 39, 565–578 (2014). https://doi.org/10.1007/s10115-013-0629-7

  • DOI: https://doi.org/10.1007/s10115-013-0629-7
