
A Fair Post-processing Method Based on the MADD Metric for Predictive Student Models

  • Conference paper
Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD 2023)

Abstract

Predictive student models are increasingly used in learning environments. However, due to the rising social impact of their usage, it is now all the more important for these models to be both sufficiently accurate and fair in their predictions. To evaluate algorithmic fairness, a new metric has been developed in education, namely the Model Absolute Density Distance (MADD). This metric measures how differently a predictive model behaves with respect to two groups of students, in order to quantify its algorithmic unfairness. In this paper, we thus develop a post-processing method based on this metric that aims to improve fairness while preserving the accuracy of the predictive models' results. We experiment with our approach on the task of predicting student success in an online course, using both simulated and real-world educational data, and obtain successful results. Our source code and data are in open access at https://github.com/melinaverger/MADD.

Notes

  1. Here, not a mathematical function, but a programming function. See details at https://github.com/melinaverger/MADD.

  2. \(\overline{y}_i\) corresponds to the new predictions (1 or 0) obtained by thresholding the new \(\overline{p}_{i}^{(\lambda )}\) with the classification threshold parameter t, which we initially set to 0.5 (i.e. \(\overline{y}_i = 0\) when \(\overline{p}_{i}^{(\lambda )} < 0.5\), and \(\overline{y}_i = 1\) otherwise), as illustrated in the sketch below.
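
As an illustration of this thresholding step, a minimal sketch in Python follows; the values and variable names are ours and purely illustrative, not taken from the released code.

```python
import numpy as np

# Hypothetical adjusted probabilities (the paper's new p̄_i^(λ)) for four students.
p_bar = np.array([0.12, 0.47, 0.50, 0.83])

t = 0.5  # classification threshold parameter t, set to 0.5 as in the paper

# ȳ_i = 0 when p̄_i^(λ) < t, and ȳ_i = 1 otherwise.
y_bar = (p_bar >= t).astype(int)
print(y_bar)  # -> [0 0 1 1]
```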

References

  1. Castelnovo, A., Crupi, R., Greco, G., Regoli, D., Penco, I.G., Cosentini, A.C.: A clarification of the nuances in the fairness metrics landscape. Sci. Rep. 12(1), 4209 (2022). https://doi.org/10.1038/s41598-022-07939-1

  2. Devroye, L.: Non-Uniform Random Variate Generation. Springer, New York, NY (1986). https://doi.org/10.1007/978-1-4613-8643-8

  3. Freedman, D., Diaconis, P.: On the histogram as a density estimator: L2 theory. Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete 57(4), 453–476 (1981). https://doi.org/10.1007/BF01025868

  4. Holstein, K., Doroudi, S.: Equity and artificial intelligence in education: Will "AIEd" amplify or alleviate inequities in education? arXiv:2104.12920 (2021)

  5. Kizilcec, R.F., Lee, H.: Algorithmic fairness in education. In: Ethics in Artificial Intelligence in Education (forthcoming). arXiv:2007.05443

  6. Kuzilek, J., Hlosta, M., Zdrahal, Z.: Open university learning analytics dataset. Sci. Data 4(1), 170171 (2017). https://doi.org/10.1038/sdata.2017.171

  7. Mehrabi, N., Morstatter, F., Saxena, N., Lerman, K., Galstyan, A.: A survey on bias and fairness in machine learning. ACM Comput. Surv. 54(6) (2021). https://doi.org/10.1145/3457607

  8. Romero, C., Ventura, S.: Educational data mining and learning analytics: an updated survey. WIREs Data Min. Knowl. Discovery 10(3), e1355 (2020). https://doi.org/10.1002/widm.1355

  9. Verger, M., Lallé, S., Bouchet, F., Luengo, V.: Is your model "madd"? a novel metric to evaluate algorithmic fairness for predictive student models. In: Proceedings of the 16th International Conference on Educational Data Mining, pp. 91–102. International Educational Data Mining Society (2023). https://doi.org/10.5281/zenodo.8115786

  10. Verma, S., Rubin, J.: Fairness definitions explained. In: Proceedings of the International Workshop on Software Fairness, pp. 1–7. FairWare ’18, Association for Computing Machinery, New York, NY, USA (2018). https://doi.org/10.1145/3194770.3194776

Author information

Corresponding author

Correspondence to Mélina Verger.

Appendices

Proofs of 3.2

Let C be a random variable with probability density function \(\mathcal {D}\), representing the predicted probability output by the model \(\mathcal {C}\), and let S be a Bernoulli random variable representing the value of the sensitive parameter a. Thus, \(\mathcal {D}^{G_0}\) and \(\mathcal {D}^{G_1}\) are the probability density functions of the conditional distributions \(C \mid S=0\) and \(C \mid S=1\), respectively. By the law of total probability, we have:

$$ \begin{alignedat}{2} &\quad &\mathbb {P}\left( C \le t \right) &= \mathbb {P}\left( C \le t \mid S=0 \right) \ \mathbb {P}\left( S=0 \right) + \mathbb {P}\left( C \le t \mid S=1 \right) \ \mathbb {P}\left( S=1 \right) \\ &\Longleftrightarrow & F(t) &= F^{G_0}(t) \ \mathbb {P}\left( S=0 \right) + F^{G_1}(t) \ \mathbb {P}\left( S=1 \right) \\ &\Longleftrightarrow & \mathcal {D}(t) &= \mathcal {D}^{G_0}(t) \ \mathbb {P}\left( S=0 \right) + \mathcal {D}^{G_1}(t) \ \mathbb {P}\left( S=1 \right) \end{alignedat} $$

where \(F, F^{G_0}, F^{G_1}\) are the cumulative distribution functions (CDFs) of \(\mathcal {D}\), \(\mathcal {D}^{G_0}\), \(\mathcal {D}^{G_1}\), respectively. Since \(\mathbb {P}\left( S=0 \right) + \mathbb {P}\left( S=1 \right) = 1\), \(\mathcal {D}\) is a convex combination of \(\mathcal {D}^{G_0}\) and \(\mathcal {D}^{G_1}\), and thus lies between them in the function space (i.e., \(\mathcal {D}^{G_0}, \mathcal {D}, \mathcal {D}^{G_1}\) are collinear).
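
As a numerical illustration of this mixture relationship, the sketch below checks that a convex combination of two hypothetical group densities is itself a valid density; the Beta densities and the group proportions are arbitrary choices of ours, not the paper's data.

```python
import numpy as np
from scipy.stats import beta

# Hypothetical densities of C | S=0 and C | S=1 (arbitrary illustrative choices).
D_G0 = beta(2, 5)
D_G1 = beta(5, 2)
p0, p1 = 0.6, 0.4  # P(S=0) and P(S=1), summing to 1

t = np.linspace(0, 1, 201)
# Law of total probability: the overall density D is the convex combination of the two.
D = p0 * D_G0.pdf(t) + p1 * D_G1.pdf(t)

print(np.trapz(D, t))  # ≈ 1.0, i.e. D is itself a density lying "between" D_G0 and D_G1
```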

This property also holds for the estimators obtained from the observed values. Indeed, the heights of the histograms are defined as follows: for the m equal sub-intervals \(I_k = \left[ \frac{k-1}{m}, \frac{k}{m} \right] \), \(k \in \{1, \ldots , m\}\), partitioning [0, 1],

$$\begin{aligned} &D_{G_0}^a &:= \left\{ d_{G_0, k} \mid \forall k \in \{ 1,\ldots , m \} \right\} , &\text { with } d_{G_0, k} &:= &\frac{N_{G_0, k}}{n_0} &:= &\frac{1}{n_0}\sum _{i \in G_0} \textbf{1}_{\hat{p}_i \in I_k} \\ &D_{G_1}^a &:= \left\{ d_{G_1, k} \mid \forall k \in \{ 1,\ldots , m \} \right\} , &\text { with } d_{G_1, k} &:= &\frac{N_{G_1, k}}{n_1} &:= &\frac{1}{n_1}\sum _{i \in G_1} \textbf{1}_{\hat{p}_i \in I_k} \\ &D_{G} &:= \left\{ d_{G, k} \mid \forall k \in \{ 1,\ldots , m \} \right\} , &\text { with } d_{G, k} &:= &\frac{N_{G_0, k} + N_{G_1, k}}{n_0 + n_1} &:= &\frac{1}{n_0 + n_1}\sum _{i \in G} \textbf{1}_{\hat{p}_i \in I_k} \end{aligned}$$

Moreover, for all \(k \in \{1,\ldots , m\}\), we have:

$$\begin{aligned} d_{G, k} &= \frac{N_{G_0, k} + N_{G_1, k}}{n_0 + n_1} = \frac{n_0 \ d_{G_0,k} + n_1 \ d_{G_1,k}}{n_0 + n_1} \\ &= \frac{n_0}{n_0 + n_1} d_{G_0,k} + \frac{n_1}{n_0 + n_1} d_{G_1,k} \end{aligned}$$

Furthermore, \(f^{(G_0)}, f, f^{(G_1)}\) are defined from \(D_{G_0}^a, D_{G}, D_{G_1}^a\), respectively, as:

$$\begin{aligned} f^{(G_0)}(x) &:= \sum _{k=1}^m d_{G_0, k} \textbf{1}_{x \in I_k} \\ f^{(G_1)}(x) &:= \sum _{k=1}^m d_{G_1, k} \textbf{1}_{x \in I_k} \\ f(x) &:= \sum _{k=1}^m d_{G, k} \textbf{1}_{x \in I_k} \end{aligned}$$

Therefore, \(f(x) = \frac{n_0}{n_0 + n_1} f^{(G_0)}(x) + \frac{n_1}{n_0 + n_1} f^{(G_1)}(x)\), so \(f^{(G_0)}, f, f^{(G_1)}\) are also collinear (see Fig. 3c). This is not a coincidence: as histogram estimators, when \((n_0, n_1) \rightarrow +\infty \), \(\left( f^{(G_0)}, f, f^{(G_1)} \right) \rightarrow \left( \mathcal {D}^{G_0}, \mathcal {D}, \mathcal {D}^{G_1} \right) \).
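
The following sketch reproduces this convex-combination property of the histogram estimators on simulated predicted probabilities; all data, group sizes and variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
m = 10                                  # number of equal sub-intervals I_k of [0, 1]
p_G0 = rng.beta(2, 5, size=300)         # hypothetical predictions for group G0 (n0 = 300)
p_G1 = rng.beta(5, 2, size=200)         # hypothetical predictions for group G1 (n1 = 200)
n0, n1 = len(p_G0), len(p_G1)

edges = np.linspace(0, 1, m + 1)
# Histogram heights d_{G0,k}, d_{G1,k}, d_{G,k}: bin counts divided by the group size.
d_G0 = np.histogram(p_G0, bins=edges)[0] / n0
d_G1 = np.histogram(p_G1, bins=edges)[0] / n1
d_G = np.histogram(np.concatenate([p_G0, p_G1]), bins=edges)[0] / (n0 + n1)

# d_{G,k} = (n0 d_{G0,k} + n1 d_{G1,k}) / (n0 + n1) for every bin k, hence the collinearity.
assert np.allclose(d_G, (n0 * d_G0 + n1 * d_G1) / (n0 + n1))
```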

Proof of 3.3

According to inverse transform sampling [2], we have the following two theorems:

Theorem 1

Let \(\mathcal {A}\) be a distribution with continuous cumulative distribution function \(F_{\mathcal {A}}\). If X obeys the distribution \(\mathcal {A}\), i.e. \(X \sim \mathcal {A}\), then \(F_{\mathcal {A}}(X) \sim \mathcal {U}_{[0, 1]}\), where \(\mathcal {U}_{[0,1]}\) is the uniform distribution over [0, 1].

Theorem 2

Let \(U \sim \mathcal {U}_{[0,1]}\) and \(F^{-1}_{\mathcal {A}}\) be the generalised inverse function of \(F_{\mathcal {A}}\), then \(F^{-1}_{\mathcal {A}}(U) \sim \mathcal {A}\).

Take \(G_0\) as an example. By definition, the newly generated prediction \(\overline{p}_{i}^{(\lambda )}\) is \(\overline{\operatorname {CDF}}_{(G_0)}^{-1(\lambda )}(\operatorname {CDF}_{(G_0)}(\hat{p}_i))\), and by Theorem 1, we have \(\operatorname {CDF}_{(G_0)}(\hat{p}_i) \sim \mathcal {U}_{[0,1]}\), so \(\overline{\operatorname {CDF}}_{(G_0)}^{-1(\lambda )}(\operatorname {CDF}_{(G_0)}(\hat{p}_i))\) obeys the newly generated distribution according to Theorem 2. Furthermore, since the CDF is monotone increasing and taking the inverse does not change the monotonicity, \(\overline{\operatorname {CDF}}_{(G_0)}^{-1(\lambda )}\) is also monotone increasing, which means that \(\forall i,j, \ \hat{p}_i \ge \hat{p}_j \Longrightarrow \overline{p}_{i}^{(\lambda )} \ge \overline{p}_{j}^{(\lambda )}\). The conclusion on \(G_1\) follows in the same way.
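
To make this argument concrete, here is a small sketch of such a CDF-based transformation on one group, using an empirical CDF and an empirical quantile function as stand-ins for \(\operatorname{CDF}_{(G_0)}\) and \(\overline{\operatorname{CDF}}_{(G_0)}^{-1(\lambda)}\); the target distribution and all variable names below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(1)
p_hat = rng.beta(2, 5, size=500)       # original predictions for group G0 (illustrative)
p_target = rng.beta(3, 3, size=500)    # samples from a hypothetical target distribution

# Empirical CDF of the original predictions: u_i ≈ CDF_{G0}(p_hat_i) ~ U[0, 1] (Theorem 1).
u = np.searchsorted(np.sort(p_hat), p_hat, side="right") / len(p_hat)

# Generalised inverse of the target CDF, approximated by the empirical quantile function (Theorem 2).
p_bar = np.quantile(p_target, u)

# The composed mapping is monotone, so the ranking of the students' predictions is preserved.
order = np.argsort(p_hat)
assert np.all(np.diff(p_bar[order]) >= 0)
```

In the paper's method, the target corresponds to the newly generated distribution parameterised by \(\lambda\); the Beta distribution above is only a placeholder for illustration.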

Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Verger, M., Fan, C., Lallé, S., Bouchet, F., Luengo, V. (2025). A Fair Post-processing Method Based on the MADD Metric for Predictive Student Models. In: Meo, R., Silvestri, F. (eds) Machine Learning and Principles and Practice of Knowledge Discovery in Databases. ECML PKDD 2023. Communications in Computer and Information Science, vol 2134. Springer, Cham. https://doi.org/10.1007/978-3-031-74627-7_3

  • DOI: https://doi.org/10.1007/978-3-031-74627-7_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-74626-0

  • Online ISBN: 978-3-031-74627-7

  • eBook Packages: Artificial Intelligence (R0)
