Abstract
Collaborative filtering relies on a sparse rating matrix, where each user rates a few products, to propose recommendations. The approach consists of approximating the sparse rating matrix with a simple model whose regularities make it possible to fill in the missing entries. The latent block model is a generative co-clustering model that can provide such an approximation. In this paper, we show that exogenous sensitive attributes can be incorporated in this model to make fair recommendations. Since users are only characterized by their ratings and their sensitive attribute, fairness is measured here by a parity criterion. We propose a definition of fairness specific to recommender systems, requiring item rankings to be independent of the users’ sensitive attribute. We show that our model ensures approximately fair recommendations provided that the classification of users approximately respects statistical parity.
Notes
- 1.
\(\gamma = (\boldsymbol{\tau }^{\left( U\right) }, \boldsymbol{\tau }^{\left( V\right) },\boldsymbol{\nu ^{\left( A\right) }},\boldsymbol{\rho ^{\left( A\right) }}, \boldsymbol{\nu ^{\left( B\right) }}, \boldsymbol{\rho ^{\left( B\right) }}, \boldsymbol{\nu ^{\left( C\right) }}, \boldsymbol{\rho ^{\left( C\right) }})\).
- 2.
\(\gamma = (\boldsymbol{\tau }^{\left( U\right) }, \boldsymbol{\tau }^{\left( V\right) },\boldsymbol{\nu ^{\left( A\right) }},\boldsymbol{\rho ^{\left( A\right) }}, \boldsymbol{\nu ^{\left( B\right) }}, \boldsymbol{\rho ^{\left( B\right) }}, \boldsymbol{\nu ^{\left( C\right) }}, \boldsymbol{\rho ^{\left( C\right) }})\).
Appendices
Co-clustering for Fair Recommendation. Supplementary Material
A Computation of the Variational Log-Likelihood Criterion
The criterion we want to optimize is:
We chose to restrict the space of the variational distribution \(q_{\gamma }\) in order to get a fully factorized form:
where \(\gamma \) denotes the concatenation of the parameters of the variational distributionFootnote 2 \(q_{\gamma }\). The entropy is additive across independent variables, so we get:
with the following terms:
The independence of the latent variables allows us to rewrite the expectation of the complete log-likelihood as:
with the following terms:
and as the entries of the data matrix \(\boldsymbol{R}\) are independent and identically distributed:
where \(\boldsymbol{R}^{(\text {o})}\) denotes the set of observed ratings and \(\boldsymbol{R}^{{\left( \lnot o\right) }}\) the set of non-observed ratings, for which \(R_{ij}=\text {NA}\). From Eq. S3, it becomes clear that maximizing \(\mathbb {E}_{q_{\gamma }} \mathcal {L}(\boldsymbol{R}^{(\lnot o)})\) is not necessary to infer the model parameters used for prediction, so that ignoring the non-observed data is legitimate. The expectation of the conditional log-likelihood (the first term on the right-hand side of Eq. S3) is numerically estimated by sampling from \(q_{\gamma }\).
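This sampling-based estimate can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function and variable names, and the generic per-block log-density `loglik_entry`, are our own assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def mc_conditional_loglik(R_obs, tau_U, tau_V, loglik_entry, n_samples=16):
    """Monte Carlo estimate of E_q[ log p(R^(o) | U, V) ].

    R_obs:  list of observed ratings as (i, j, r) triplets.
    tau_U:  (n1, k1) variational row-class probabilities.
    tau_V:  (n2, k2) variational column-class probabilities.
    loglik_entry(q, l, r): log-density of rating r in block (q, l)
        (an assumed interface standing in for the model's emission law).
    """
    n1, k1 = tau_U.shape
    n2, k2 = tau_V.shape
    total = 0.0
    for _ in range(n_samples):
        # Sample hard class assignments from the factorized variational law.
        u = np.array([rng.choice(k1, p=tau_U[i]) for i in range(n1)])
        v = np.array([rng.choice(k2, p=tau_V[j]) for j in range(n2)])
        # Sum the conditional log-likelihood over observed entries only.
        total += sum(loglik_entry(u[i], v[j], r) for i, j, r in R_obs)
    return total / n_samples
```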
Stochastic Gradient Optimization. To optimize the criterion with stochastic gradient descent, we express the variational log-likelihood criterion on a single rating:
A batch of data, \(\boldsymbol{R}_{(i:i+n),(j:j+n)}\), consists of a \((n\times n)\) sub-matrix randomly sampled from the original matrix \(\boldsymbol{R}\).
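Such a contiguous sub-matrix batch could be drawn as follows; this is an illustrative sketch under our own naming conventions, assuming the corner \((i,j)\) is sampled uniformly and that unobserved ratings are encoded as NaN in \(\boldsymbol{R}\).

```python
import numpy as np

def sample_batch(R, n, rng=None):
    """Draw a random (n x n) contiguous sub-matrix R[i:i+n, j:j+n]
    by sampling its top-left corner (i, j) uniformly at random."""
    rng = rng or np.random.default_rng()
    i = int(rng.integers(0, R.shape[0] - n + 1))
    j = int(rng.integers(0, R.shape[1] - n + 1))
    return i, j, R[i:i + n, j:j + n]
```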
B Clustering \(\varepsilon \)-parity and \(\varepsilon \)-fair Recommendation for Arbitrary Discrete Sensitive Attribute
Definition S1
(Clustering \(\varepsilon \)-parity, arbitrary discrete sensitive attribute). The clustering of users is said to respect \(\varepsilon \)-parity with respect to the discrete attribute \(s\in {\mathcal S}\) iff:
where \(\varepsilon \in \mathbb {R}_+\) measures the gap to exact parity, \(u_{iq}\) denotes the (hard) membership of user \(i\) in cluster \(q\), and \(\#\left\{ i|\varOmega \right\} \) denotes the number of users in the set defined by condition \(\varOmega \).
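The smallest \(\varepsilon \) for which a given hard clustering satisfies this definition can be computed directly from the cluster assignments and the sensitive attribute. A minimal sketch (function and variable names are our own):

```python
import numpy as np

def parity_gap(clusters, s):
    """Smallest epsilon for which a hard clustering respects epsilon-parity:
    the largest gap, over clusters and pairs of attribute values, between
    the within-attribute proportions of users assigned to the cluster."""
    clusters, s = np.asarray(clusters), np.asarray(s)
    eps = 0.0
    for q in np.unique(clusters):
        # Proportion of users assigned to cluster q, within each attribute value.
        props = [float(np.mean(clusters[s == v] == q)) for v in np.unique(s)]
        eps = max(eps, max(props) - min(props))
    return eps
```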
Definition S2
(\(\varepsilon \)-fair recommendation, arbitrary discrete sensitive attribute). A recommender system is said to be \(\varepsilon \)-fair with respect to the discrete attribute \(s\in {\mathcal S}\) if for any two items \(j\) and \(j'\):
where \(\varepsilon \in \mathbb {R}_+\) measures the gap to exact fairness.
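Given per-group estimates of the probability that item \(j\) is ranked above \(j'\), the fairness gap for that item pair is the largest pairwise difference across sensitive values. A hedged sketch (the dictionary interface mapping sensitive values to preference probabilities is our assumption):

```python
import numpy as np

def fairness_gap(pref_prob):
    """pref_prob maps each sensitive value s to an estimate of
    P(item j preferred to item j' | s), for one fixed pair (j, j').
    Returns the smallest epsilon for which this pair is epsilon-fair."""
    p = np.asarray(list(pref_prob.values()), dtype=float)
    return float(p.max() - p.min())
```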
C Proof of Theorem 1
Theorem 1
(Fair recommendation from clustering parity). If the clustering of users in \({k_1}\) groups respects \(\varepsilon \)-parity (Definition 3 or Definition S1) then the recommender system relying on the relevance score defined in Eq. (7) is \(({k_1}\varepsilon )\)-fair (Definition 1 or Definition S2).
Proof
Suppose that \(\boldsymbol{\tau }^{\left( U\right) }\), the maximum a posteriori of \(\boldsymbol{U}\), is a binary matrix; \(\boldsymbol{\tau }^{\left( U\right) }\) is thus a \({n_1}\times {k_1}\) indicator matrix of row-class memberships. Then, given user \(i\), item \(j\) is said to be preferred to item \(j'\) if \(\hat{R}_{ij} > \hat{R}_{ij'}\), that is:
with \(\boldsymbol{a} \in \mathbb {R}^{{k_1}}\) defined by \(\boldsymbol{a}=\hat{\boldsymbol{\mu }} {\left( \boldsymbol{\tau }^{\left( V\right) }_{j} - \boldsymbol{\tau }^{\left( V\right) }_{j'}\right) }^T\), \(b \in \mathbb {R}\) defined by \(b = \nu ^{\left( B\right) }_{j'} - \nu ^{\left( B\right) }_{j}\) and \(d_{i}\in \{1,\cdots ,{k_1}\}\) being the group indicator of user \(i\): \(\tau ^{\left( U\right) }_{i, d_{i}} = 1\).
Suppose \(\varepsilon \)-parity, from Definition S1 (Definition 3 is a particular case of Definition S1), we have
therefore,
By summing over all groups, we get:
and from the triangle inequality,
And, applying (S6), the result is obtained:
\(\square \)
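In compact form, with \(a\), \(b\) and \(d\) as defined above and \(j \succ j'\) denoting preference, the chain of inequalities underlying the proof can be sketched as follows (a reconstruction from the surrounding definitions, not the paper's verbatim display):

```latex
\begin{align*}
\bigl|\,\mathbb{P}(j \succ j' \mid s) - \mathbb{P}(j \succ j' \mid s')\,\bigr|
&= \Bigl|\,\sum_{q=1}^{k_1}
   \bigl(\mathbb{P}(d=q \mid s) - \mathbb{P}(d=q \mid s')\bigr)\,
   \mathbb{1}\!\left[a_q + b > 0\right]\Bigr| \\
&\le \sum_{q=1}^{k_1}
   \bigl|\mathbb{P}(d=q \mid s) - \mathbb{P}(d=q \mid s')\bigr|
 \;\le\; k_1\,\varepsilon .
\end{align*}
```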
D Supplemental Results for MovieLens 1M
1.1 D.1 Gender as Sensitive Attribute
Supplemental Analysis of the Model. We list in Tables 2 and 3 the most extreme movies according to the inferred value of their latent variable \(C_j\). Variable \(C_j\) encodes the difference in opinion between the sensitive groups, not the overall opinion. For example, a movie may well be liked by most people but liked even more by males. Table 2 lists movies for which females have a better opinion than males and Table 3 lists movies for which males have a better opinion than females.
Higher Number of Groups. We did not optimize the hyper-parameters of the compared models. We present here additional experiments to illustrate that the conclusions of Sect. 4 apply to different hyper-parameter settings. Using a substantially larger number of groups (\({k_1}=50\) user groups and \({k_2}=50\) item groups) or a larger dimension of latent factors for SVD (also 50), the statistical gender parity measures given in Table 4 and the recommendation performance given in Fig. 7 are qualitatively similar to the ones given in Table 1 and Fig. 5.
1.2 D.2 Age as Sensitive Attribute
The age of the users is reported within the following intervals: ‘Under 18’, ‘18–24’, ‘25–34’, ‘35–44’, ‘45–49’, ‘50–55’ and ‘56+’. The counts of users in each age category are displayed in Fig. 8.
User age is treated as sensitive: we use a one-hot encoding of the seven age categories, introducing seven binary sensitive attributes \(s^{1}_i, \cdots , s^{7}_i\) and their associated item latent variables \(C^{1}_j, \cdots , C^{7}_j\). We follow the protocol described in Sect. 4, except that our Parity-LBM is initialized from estimates obtained with the Standard-LBM. Table 5 presents the \(\chi ^2\) statistics computed from the contingency table of user age counts in each group. The methods that do not consider the sensitive variable in the modelling create groups that depend on age, whereas the statistical parity assumption is reasonable for our Parity-LBM model.
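The \(\chi ^2\) statistic from such a contingency table can be computed as sketched below; larger values indicate stronger dependence between group assignment and the age attribute. The function name is our own.

```python
import numpy as np

def chi2_statistic(table):
    """Pearson chi-squared statistic for a (groups x age-categories)
    contingency table of user counts, comparing observed counts to the
    counts expected under independence of rows and columns."""
    table = np.asarray(table, dtype=float)
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / table.sum()
    return float(((table - expected) ** 2 / expected).sum())
```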
Finally, we illustrate the interpretability of the estimates of the latent variables \(C^{1}_j, \cdots ,C^{7}_j\) related to movies. For each age category k, we select the thirty movies with the largest value of the latent variables \(C^{k}_j\). These movies have the largest positive opinion bias for users in the given age category. Figure 9 displays a boxplot of the release years of these films for all user age categories. The greater variability in the distribution for older users means that they have a comparatively higher opinion of older movies than younger users. If user age were the sensitive attribute, the recommendations would not account for these differences.
Copyright information
© 2021 Springer Nature Switzerland AG
Cite this paper
Frisch, G., Leger, JB., Grandvalet, Y. (2021). Co-clustering for Fair Recommendation. In: Kamp, M., et al. Machine Learning and Principles and Practice of Knowledge Discovery in Databases. ECML PKDD 2021. Communications in Computer and Information Science, vol 1524. Springer, Cham. https://doi.org/10.1007/978-3-030-93736-2_44
Print ISBN: 978-3-030-93735-5
Online ISBN: 978-3-030-93736-2