
A Fair Post-processing Method Based on the MADD Metric for Predictive Student Models

  • Conference paper
Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD 2023)

Abstract

Predictive student models are increasingly used in learning environments. However, due to the rising social impact of their usage, it is now all the more important for these models to be both sufficiently accurate and fair in their predictions. To evaluate algorithmic fairness, a new metric has been developed in education, namely the Model Absolute Density Distance (MADD). This metric measures how differently a predictive model behaves with respect to two groups of students, in order to quantify its algorithmic unfairness. In this paper, we thus develop a post-processing method based on this metric that aims to improve fairness while preserving the accuracy of the predictive models' results. We experiment with our approach on the task of predicting student success in an online course, using both simulated and real-world educational data, and obtain successful results. Our source code and data are in open access at https://github.com/melinaverger/MADD.

Notes

  1. Here, not a mathematical function, but a programming function. See details at https://github.com/melinaverger/MADD.

  2. \(\overline{y}_i\) corresponds to the new predictions (1 or 0) obtained by thresholding the new \(\overline{p}_{i}^{(\lambda )}\) with the classification threshold parameter t, which we initially set to 0.5 (i.e. \(\overline{y}_i = 0\) when \(\overline{p}_{i}^{(\lambda )} < 0.5\), and \(\overline{y}_i = 1\) otherwise), as illustrated in the sketch below.
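
As an illustration of this thresholding step, a minimal sketch in Python follows; the values and variable names are ours and purely illustrative, not taken from the released code.

```python
import numpy as np

# Hypothetical adjusted probabilities (the paper's new p̄_i^(λ)) for four students.
p_bar = np.array([0.12, 0.47, 0.50, 0.83])

t = 0.5  # classification threshold parameter t, set to 0.5 as in the paper

# ȳ_i = 0 when p̄_i^(λ) < t, and ȳ_i = 1 otherwise.
y_bar = (p_bar >= t).astype(int)
print(y_bar)  # -> [0 0 1 1]
```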

References

  1. Castelnovo, A., Crupi, R., Greco, G., Regoli, D., Penco, I.G., Cosentini, A.C.: A clarification of the nuances in the fairness metrics landscape. Sci. Rep. 12(1), 4209 (2022). https://doi.org/10.1038/s41598-022-07939-1

  2. Devroye, L.: Non-Uniform Random Variate Generation. Springer, New York, NY (1986). https://doi.org/10.1007/978-1-4613-8643-8

  3. Freedman, D., Diaconis, P.: On the histogram as a density estimator: L2 theory. Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete 57(4), 453–476 (1981). https://doi.org/10.1007/BF01025868

  4. Holstein, K., Doroudi, S.: Equity and artificial intelligence in education: Will "AIEd" amplify or alleviate inequities in education? arXiv:2104.12920 (2021)

  5. Kizilcec, R.F., Lee, H.: Algorithmic fairness in education. In: Ethics in Artificial Intelligence in Education (forthcoming). arXiv:2007.05443

  6. Kuzilek, J., Hlosta, M., Zdrahal, Z.: Open university learning analytics dataset. Sci. Data 4(1), 170171 (2017). https://doi.org/10.1038/sdata.2017.171

  7. Mehrabi, N., Morstatter, F., Saxena, N., Lerman, K., Galstyan, A.: A survey on bias and fairness in machine learning. ACM Comput. Surv. 54(6) (2021). https://doi.org/10.1145/3457607

  8. Romero, C., Ventura, S.: Educational data mining and learning analytics: an updated survey. WIREs Data Min. Knowl. Discovery 10(3), e1355 (2020). https://doi.org/10.1002/widm.1355

  9. Verger, M., Lallé, S., Bouchet, F., Luengo, V.: Is your model "madd"? a novel metric to evaluate algorithmic fairness for predictive student models. In: Proceedings of the 16th International Conference on Educational Data Mining, pp. 91–102. International Educational Data Mining Society (2023). https://doi.org/10.5281/zenodo.8115786

  10. Verma, S., Rubin, J.: Fairness definitions explained. In: Proceedings of the International Workshop on Software Fairness, pp. 1–7. FairWare ’18, Association for Computing Machinery, New York, NY, USA (2018). https://doi.org/10.1145/3194770.3194776

Author information

Corresponding author

Correspondence to Mélina Verger.

Appendices

Proofs of 3.2

Let C be a random variable with probability density function \(\mathcal {D}\), representing the predicted probability output by the model \(\mathcal {C}\), and let S be a Bernoulli random variable representing the value of the sensitive parameter a. Thus, \(\mathcal {D}^{G_0}\) and \(\mathcal {D}^{G_1}\) are the probability density functions of the conditional distributions \(C \mid S=0\) and \(C \mid S=1\), respectively. By the law of total probability, we have:

$$ \begin{alignedat}{2} &\quad &\mathbb {P}\left( C \le t \right) &= \mathbb {P}\left( C \le t \mid S=0 \right) \ \mathbb {P}\left( S=0 \right) + \mathbb {P}\left( C \le t \mid S=1 \right) \ \mathbb {P}\left( S=1 \right) \\ &\Longleftrightarrow & F(t) &= F^{G_0}(t) \ \mathbb {P}\left( S=0 \right) + F^{G_1}(t) \ \mathbb {P}\left( S=1 \right) \\ &\Longleftrightarrow & \mathcal {D}(t) &= \mathcal {D}^{G_0}(t) \ \mathbb {P}\left( S=0 \right) + \mathcal {D}^{G_1}(t) \ \mathbb {P}\left( S=1 \right) \end{alignedat} $$

where \(F, F^{G_0}, F^{G_1}\) are the cumulative distribution functions (CDFs) of \(\mathcal {D}\), \(\mathcal {D}^{G_0}\), \(\mathcal {D}^{G_1}\), respectively. Since \(\mathbb {P}\left( S=0 \right) + \mathbb {P}\left( S=1 \right) = 1\), \(\mathcal {D}\) is a convex combination of \(\mathcal {D}^{G_0}\) and \(\mathcal {D}^{G_1}\), and thus lies between them in the function space (i.e., \(\mathcal {D}^{G_0}, \mathcal {D}, \mathcal {D}^{G_1}\) are collinear).
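
As a numerical illustration of this mixture relationship, the sketch below checks that a convex combination of two hypothetical group densities is itself a valid density; the Beta densities and the group proportions are arbitrary choices of ours, not the paper's data.

```python
import numpy as np
from scipy.stats import beta

# Hypothetical densities of C | S=0 and C | S=1 (arbitrary illustrative choices).
D_G0 = beta(2, 5)
D_G1 = beta(5, 2)
p0, p1 = 0.6, 0.4  # P(S=0) and P(S=1), summing to 1

t = np.linspace(0, 1, 201)
# Law of total probability: the overall density D is the convex combination of the two.
D = p0 * D_G0.pdf(t) + p1 * D_G1.pdf(t)

print(np.trapz(D, t))  # ≈ 1.0, i.e. D is itself a density lying "between" D_G0 and D_G1
```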

This property also holds for the estimators obtained from the observed values. Indeed, the heights of the histograms are defined as follows: for the m equal sub-intervals \(I_k = \left[ \frac{k-1}{m}, \frac{k}{m} \right] \), \(k \in \{1, \ldots , m\}\), partitioning [0, 1],

$$\begin{aligned} &D_{G_0}^a &:= \left\{ d_{G_0, k} \mid \forall k \in \{ 1,\ldots , m \} \right\} , &\text { with } d_{G_0, k} &:= &\frac{N_{G_0, k}}{n_0} &:= &\frac{1}{n_0}\sum _{i \in G_0} \textbf{1}_{\hat{p}_i \in I_k} \\ &D_{G_1}^a &:= \left\{ d_{G_1, k} \mid \forall k \in \{ 1,\ldots , m \} \right\} , &\text { with } d_{G_1, k} &:= &\frac{N_{G_1, k}}{n_1} &:= &\frac{1}{n_1}\sum _{i \in G_1} \textbf{1}_{\hat{p}_i \in I_k} \\ &D_{G} &:= \left\{ d_{G, k} \mid \forall k \in \{ 1,\ldots , m \} \right\} , &\text { with } d_{G, k} &:= &\frac{N_{G_0, k} + N_{G_1, k}}{n_0 + n_1} &:= &\frac{1}{n_0 + n_1}\sum _{i \in G} \textbf{1}_{\hat{p}_i \in I_k} \end{aligned}$$

Moreover, for all \(k \in \{1,\ldots , m\}\), we have:

$$\begin{aligned} d_{G, k} &= \frac{N_{G_0, k} + N_{G_1, k}}{n_0 + n_1} = \frac{n_0 \ d_{G_0,k} + n_1 \ d_{G_1,k}}{n_0 + n_1} \\ &= \frac{n_0}{n_0 + n_1} d_{G_0,k} + \frac{n_1}{n_0 + n_1} d_{G_1,k} \end{aligned}$$

Furthermore, \(f^{(G_0)}, f, f^{(G_1)}\) are defined from \(D_{G_0}^a, D_{G}, D_{G_1}^a\), respectively, as:

$$\begin{aligned} f^{(G_0)}(x) &:= \sum _{k=1}^m d_{G_0, k} \textbf{1}_{x \in I_k} \\ f^{(G_1)}(x) &:= \sum _{k=1}^m d_{G_1, k} \textbf{1}_{x \in I_k} \\ f(x) &:= \sum _{k=1}^m d_{G, k} \textbf{1}_{x \in I_k} \end{aligned}$$

Therefore, \(f(x) = \frac{n_0}{n_0 + n_1} f^{(G_0)}(x) + \frac{n_1}{n_0 + n_1} f^{(G_1)}(x)\), so \(f^{(G_0)}, f, f^{(G_1)}\) are also collinear (see Fig. 3c). This is not a coincidence: as histogram estimators, when \((n_0, n_1) \rightarrow +\infty \), \(\left( f^{(G_0)}, f, f^{(G_1)} \right) \rightarrow \left( \mathcal {D}^{G_0}, \mathcal {D}, \mathcal {D}^{G_1} \right) \).
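
The following sketch reproduces this convex-combination property of the histogram estimators on simulated predicted probabilities; all data, group sizes and variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
m = 10                                  # number of equal sub-intervals I_k of [0, 1]
p_G0 = rng.beta(2, 5, size=300)         # hypothetical predictions for group G0 (n0 = 300)
p_G1 = rng.beta(5, 2, size=200)         # hypothetical predictions for group G1 (n1 = 200)
n0, n1 = len(p_G0), len(p_G1)

edges = np.linspace(0, 1, m + 1)
# Histogram heights d_{G0,k}, d_{G1,k}, d_{G,k}: bin counts divided by the group size.
d_G0 = np.histogram(p_G0, bins=edges)[0] / n0
d_G1 = np.histogram(p_G1, bins=edges)[0] / n1
d_G = np.histogram(np.concatenate([p_G0, p_G1]), bins=edges)[0] / (n0 + n1)

# d_{G,k} = (n0 d_{G0,k} + n1 d_{G1,k}) / (n0 + n1) for every bin k, hence the collinearity.
assert np.allclose(d_G, (n0 * d_G0 + n1 * d_G1) / (n0 + n1))
```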

Proof of 3.3

According to inverse transform sampling [2], we have the following two theorems:

Theorem 1

Let \(\mathcal {A}\) be a distribution with continuous cumulative distribution function \(F_{\mathcal {A}}\). If X obeys the distribution \(\mathcal {A}\), i.e. \(X \sim \mathcal {A}\), then \(F_{\mathcal {A}}(X) \sim \mathcal {U}_{[0, 1]}\), where \(\mathcal {U}_{[0,1]}\) is the uniform distribution over [0, 1].

Theorem 2

Let \(U \sim \mathcal {U}_{[0,1]}\) and \(F^{-1}_{\mathcal {A}}\) be the generalised inverse function of \(F_{\mathcal {A}}\), then \(F^{-1}_{\mathcal {A}}(U) \sim \mathcal {A}\).

Take \(G_0\) as an example. By definition, the newly generated prediction \(\overline{p}_{i}^{(\lambda )}\) is \(\overline{\operatorname {CDF}}_{(G_0)}^{-1(\lambda )}(\operatorname {CDF}_{(G_0)}(\hat{p}_i))\), and by Theorem 1, we have \(\operatorname {CDF}_{(G_0)}(\hat{p}_i) \sim \mathcal {U}_{[0,1]}\), so \(\overline{\operatorname {CDF}}_{(G_0)}^{-1(\lambda )}(\operatorname {CDF}_{(G_0)}(\hat{p}_i))\) obeys the newly generated distribution according to Theorem 2. Furthermore, since the CDF is monotone increasing and taking the inverse does not change the monotonicity, \(\overline{\operatorname {CDF}}_{(G_0)}^{-1(\lambda )}\) is also monotone increasing, which means that \(\forall i,j, \ \hat{p}_i \ge \hat{p}_j \Longrightarrow \overline{p}_{i}^{(\lambda )} \ge \overline{p}_{j}^{(\lambda )}\). The conclusion on \(G_1\) follows in the same way.
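
To make this argument concrete, here is a small sketch of such a CDF-based transformation on one group, using an empirical CDF and an empirical quantile function as stand-ins for \(\operatorname{CDF}_{(G_0)}\) and \(\overline{\operatorname{CDF}}_{(G_0)}^{-1(\lambda)}\); the target distribution and all variable names below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(1)
p_hat = rng.beta(2, 5, size=500)       # original predictions for group G0 (illustrative)
p_target = rng.beta(3, 3, size=500)    # samples from a hypothetical target distribution

# Empirical CDF of the original predictions: u_i ≈ CDF_{G0}(p_hat_i) ~ U[0, 1] (Theorem 1).
u = np.searchsorted(np.sort(p_hat), p_hat, side="right") / len(p_hat)

# Generalised inverse of the target CDF, approximated by the empirical quantile function (Theorem 2).
p_bar = np.quantile(p_target, u)

# The composed mapping is monotone, so the ranking of the students' predictions is preserved.
order = np.argsort(p_hat)
assert np.all(np.diff(p_bar[order]) >= 0)
```

In the paper's method, the target corresponds to the newly generated distribution parameterised by \(\lambda\); the Beta distribution above is only a placeholder for illustration.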

Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Verger, M., Fan, C., Lallé, S., Bouchet, F., Luengo, V. (2025). A Fair Post-processing Method Based on the MADD Metric for Predictive Student Models. In: Meo, R., Silvestri, F. (eds) Machine Learning and Principles and Practice of Knowledge Discovery in Databases. ECML PKDD 2023. Communications in Computer and Information Science, vol 2134. Springer, Cham. https://doi.org/10.1007/978-3-031-74627-7_3

  • DOI: https://doi.org/10.1007/978-3-031-74627-7_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-74626-0

  • Online ISBN: 978-3-031-74627-7

  • eBook Packages: Artificial Intelligence (R0)
