Abstract
We derive conditions under which unintended feedback loops occur in supervised machine learning systems. In this paper, we study the important problem of discovering and measuring hidden feedback loops. Such loops arise in web search, recommender systems, healthcare, predictive policing, and other systems. As a possible cause of echo chambers and filter bubbles, these feedback loops tend to produce concept drift in user behavior. We study systems in their context of use, because both the learning algorithms and user interactions matter. We then decompose the automation bias arising from use of the system into users' adherence to predictions and their usage rate, and derive conditions for a feedback loop to occur. We also estimate the size of the concept drift caused by the loop. A series of controlled simulation experiments with real-world and synthetic data supports our findings. This paper builds on our prior results: it elaborates the analytical model of feedback loops, extends the experiments, and provides practical application guidelines.
Data availability
The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.
References
Cheng L, Varshney KR, Liu H (2021) Socially responsible ai algorithms: issues, purposes, and challenges. J Artif Int Res 71:1137–1181. https://doi.org/10.1613/jair.1.12814
Liu LT, Dean S, Rolf E, Simchowitz M, Hardt M (2019) Delayed impact of fair machine learning. In: Kraus S (ed) Proceedings of the twenty-eighth international joint conference on artificial intelligence, IJCAI-19, pp 6196–6200. International Joint Conferences on Artificial Intelligence Organization
Hu L, Chen Y (2018) A short-term intervention for long-term fairness in the labor market. In: Champin P-A, Gandon F, Médini L (eds) Proceedings of the 2018 world wide web conference, WWW ’18, pp 1389–1398. International World Wide Web Conferences Steering Committee, Republic and Canton of Geneva, CHE
Khritankov A (2021) Hidden feedback loops in machine learning systems: a simulation model and preliminary results. In: Winkler D, Biffl S, Mendez D, Wimmer M, Bergsmann J (eds) Software quality: future perspectives on software engineering quality. 13th international conference, SWQD 2021, Vienna, Austria, January 19–21, 2021, vol 404, pp 54–65. Springer International Publishing
Lu J et al (2019) Learning under concept drift: a review. IEEE Trans Knowl Data Eng 31(12):2346–2363. https://doi.org/10.1109/TKDE.2018.2876857
Ensign D, Friedler SA, Neville S, Scheidegger C, Venkatasubramanian S (2018) Runaway feedback loops in predictive policing. In: Friedler SA, Wilson C (eds) Proceedings of the 1st conference on fairness, accountability and transparency, Proceedings of machine learning research, vol 81, pp 160–171. PMLR. https://proceedings.mlr.press/v81/ensign18a.html
Studer S et al (2021) Towards CRISP-ML(Q): a machine learning process model with quality assurance methodology. Mach Learn Knowl Extr 3(2):392–413
Sculley D et al (2015) Hidden technical debt in machine learning systems. In: Cortes C, Lawrence N, Lee D, Sugiyama M, Garnett R (eds) Advances in neural information processing systems, vol 28. Curran Associates, Inc. https://proceedings.neurips.cc/paper/2015/file/86df7dcfd896fcaf2674f757a2463eba-Paper.pdf
Bosch J, Olsson HH, Crnkovic I (2021) Engineering AI systems: a research agenda, pp 1–19. Artificial intelligence paradigms for smart cyber-physical systems (IGI Global, Hershey, PA, USA)
Khritankov A (2021) Hidden loop experiments repository. https://github.com/prog-autom/hidden-demo. [Online; accessed 03-February-2021]
Lwakatare LE, Raj A, Crnkovic I, Bosch J, Olsson HH (2020) Large-scale machine learning systems in real-world industrial settings: a review of challenges and solutions. Inform Softw Technol 127:106368. https://doi.org/10.1016/j.infsof.2020.106368
Davies HC (2018) Redefining filter bubbles as (escapable) socio-technical recursion. Socio Res Online 23(3):637–654. https://doi.org/10.1177/1360780418763824
Spohr D (2017) Fake news and ideological polarization. Bus Inf Rev 34(3):150–160
Michiels L, Leysen J, Smets A, Goethals B (2022) What are filter bubbles really? A review of the conceptual and empirical work. In: Bellogin A, Boratto L, Santos OC, Ardissono L, Knijnenburg B (eds) Adjunct proceedings of the 30th ACM conference on user modeling, adaptation and personalization, UMAP ’22 Adjunct, pp 274–279. Association for Computing Machinery, New York, NY, USA
Kitchens B, Johnson SL, Gray P (2020) Understanding echo chambers and filter bubbles: the impact of social media on diversification and partisan shifts in news consumption. MIS Q 44(4):1619–1649
Chouldechova A, Roth A (2020) A snapshot of the frontiers of fairness in machine learning. Commun ACM 63(5):82–89. https://doi.org/10.1145/3376898
Jiang R, Chiappa S, Lattimore T, György A, Kohli P (2019) Degenerate feedback loops in recommender systems. In: Conitzer V, Hadfield G, Vallor S (eds) AIES ’19, pp 383–390. Association for Computing Machinery, New York, NY, USA
Riquelme C, Tucker G, Snoek J (2018) Deep Bayesian bandits showdown: an empirical comparison of Bayesian deep networks for Thompson sampling. In: Bengio Y, LeCun Y (eds) International conference on learning representations. https://openreview.net/forum?id=SyYe6k-CW
Lughofer E (2017) On-line active learning: a new paradigm to improve practical useability of data stream modeling methods. Inform Sci 415–416:356–376. https://doi.org/10.1016/j.ins.2017.06.038
Shan J, Zhang H, Liu W, Liu Q (2019) Online active learning ensemble framework for drifted data streams. IEEE Trans Neural Netw Learning Syst 30(2):486–498. https://doi.org/10.1109/TNNLS.2018.2844332
Gemaque RN, Costa AFJ, Giusti R, Santos EM (2020) An overview of unsupervised drift detection methods. WIREs Data Mining Knowl Discovery 10(6). https://doi.org/10.1002/widm.1381
Arpteg A, Brinne B, Crnkovic-Friis L, Bosch J (2018) Software engineering challenges of deep learning. In: Ozkan B (ed) 2018 44th Euromicro conference on software engineering and advanced applications (SEAA), pp 50–59
Friedman JH (2001) Greedy function approximation: a gradient boosting machine. Annals Stat 29(5):1189–1232. http://www.jstor.org/stable/2699986
Harrison D, Rubinfeld DL (1978) Hedonic housing prices and the demand for clean air. J Environ Econ Manage 5(1):81–102. https://doi.org/10.1016/0095-0696(78)90006-2
Hazelwood K et al (2018) Applied machine learning at Facebook: a datacenter infrastructure perspective. In: O’Conner L (ed) 2018 IEEE international symposium on high performance computer architecture (HPCA), pp 620–629. IEEE
Shalev-Shwartz S, Shamir O, Srebro N, Sridharan K (2010) Learnability, stability and uniform convergence. J Mach Learning Res 11(90):2635–2670. http://jmlr.org/papers/v11/shalev-shwartz10a.html
Pedregosa F et al (2011) Scikit-learn: machine learning in Python. J Mach Learning Res 12:2825–2830
Waskom ML (2021) Seaborn: statistical data visualization. J Open Source Softw 6(60):3021. https://doi.org/10.21105/joss.03021
Raza S, Ding C (2021) News recommender system: a review of recent progress, challenges, and opportunities. Artif Intell Rev 55(1):749–800. https://doi.org/10.1007/s10462-021-10043-x
Siebert J et al (2022) Construction of a quality model for machine learning systems. Softw Qual J 30(2):307–335
Kuwajima H, Yasuoka H, Nakae T (2020) Engineering problems in machine learning systems. Mach Learning 109(5):1103–1126. https://doi.org/10.1007/s10994-020-05872-w
ISO/IEC JTC 1/SC 42 Artificial intelligence. ISO/IEC TR 24027. Information technology – Artificial intelligence (AI) – Bias in AI systems and AI aided decision making (2021). https://www.iso.org/standard/77607.html
ISO/IEC JTC 1 Information technology Subcommittee SC 7, Software and systems engineering. ISO/IEC 25010:2011, systems and software engineering – systems and software quality requirements and evaluation (square) – system and software quality models (2011). https://www.iso.org/obp/ui/#iso:std:iso-iec:25010:ed-1:v1:en
Dietrich J, Pearce D, Stringer J, Tahir A, Blincoe K (2019) Dependency versioning in the wild. In: O’Conner L (ed) 2019 IEEE/ACM 16th international conference on mining software repositories (MSR), pp 349–359
Funding
The authors did not receive support from any organization for the submitted work. The authors have no relevant financial or non-financial interests to disclose.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
Appendix A: Proofs of existence conditions and size of effect
1.1 A.1 Proof of Proposition 1
Proof
By the definition of a positive feedback loop, we need to show that the loss decreases even when the predictive model is not updated.
Let us consider the sample mean loss \(L_{1} = \frac{1}{|\text {G}_1 |} \sum _{(x_1, y_1) \in \text {G}_1} l(y_1; f(x_1; \theta (\text {D}_0)))\) on the test set \(\text {G}_1\) after a single iteration of T, and write down a recurrence relation for \(L_k\):
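Under the assumption that a fraction p of the items in \(\text {G}_1\) are user-accepted (p being the usage rate) while the remaining items keep the baseline loss, one consistent form of this recurrence is:

\[ L_1 = (1 - p)\, L_0 + p\, V_1 \]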
where \(V_{1} = s_0 + s_1 L_0\) is the expected loss for a user-accepted item, and \(L_0\) is the initial loss on \(\text {D}_0\).
Since the loss function is additive and bounded, and the learning algorithm is symmetric and \(\beta \)-stable (3) with constant A, for the loop to exist we require at the first step \(k = 1\)
which on average must exceed the maximum expected change in the loss due to imperfect learning stability, bounded by the constant A:
The expectation is taken over starting training sets \(\text {D}_0 \subset X\).
Dividing by \(L_0 > 0\) and rearranging, we get
\(\square \)
1.2 A.2 Proof of Proposition 2
Proof
If there is a feedback loop, \(p \ge p_0\) (4), then according to Conjecture 1 [4] there exists a steady state at which the feedback loop no longer proceeds and the loss stays constant at \(L_{inf} \ge 0\).
At steady state, the composition of the test set \(G_{inf}\) is such that a fraction p of the items come from user decisions and the remaining items are the unaffected \(G_0\). Therefore, the expected loss at this state is
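Writing the mixture explicitly, with \(V_{inf}\) denoting the expected loss on user-accepted items at steady state (a form consistent with the composition just described):

\[ L_{inf} = (1 - p)\, L_0 + p\, V_{inf} \]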
where \(L_0\) is the expected loss on \(G_0\). Taking \(V_{inf} = s_0 + s_1 L_{inf}\) and setting \(s_0 = 0\), we get
Therefore, the size of the feedback loop is
Note that if \(p \ge p_0 = A / (1 - s_1)\), then \(L_0 \ge L_{inf}\) provided \(0 \le s_1 \le 1 / (1 + A)\). By induction backwards in time from the step just before the steady state, it can be shown that for any preceding round r the expected loss satisfies \(L_r \ge L_{inf}\). \(\square \)
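As a numerical illustration of Proposition 2, the minimal sketch below iterates the mean-loss relation and compares it with the closed-form steady state. It assumes the recurrence \(L_{k+1} = (1-p)L_0 + p(s_0 + s_1 L_k)\) implied by the proof, with p the usage rate and \(s_0, s_1\) the adherence parameters; the function name and the specific parameter values are illustrative, not taken from the experiments.

```python
def iterate_loss(L0, p, s0, s1, steps=200):
    """Iterate the assumed mean-loss recurrence L <- (1-p)*L0 + p*(s0 + s1*L)."""
    L = L0
    for _ in range(steps):
        L = (1 - p) * L0 + p * (s0 + s1 * L)
    return L

# Illustrative parameters with s0 = 0, as in the proof.
L0, p, s1 = 1.0, 0.8, 0.5
L_inf = iterate_loss(L0, p, 0.0, s1)

# Closed-form fixed point with s0 = 0: L_inf = (1 - p) * L0 / (1 - p * s1)
closed_form = (1 - p) * L0 / (1 - p * s1)
assert abs(L_inf - closed_form) < 1e-9
assert L_inf < L0  # the positive feedback loop drives the observed loss down
```

Since \(p\,s_1 < 1\), the recurrence is a contraction, so the iteration converges to the same steady-state loss regardless of the starting value.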
Appendix B: Examples with \(R^2\) metric
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Khritankov, A. Positive feedback loops lead to concept drift in machine learning systems. Appl Intell 53, 22648–22666 (2023). https://doi.org/10.1007/s10489-023-04615-3