Abstract
Robust group lasso regression plays an important role in high-dimensional regression modeling, such as biological data analysis for disease diagnosis and gene expression. However, most existing methods are optimized with prior knowledge of the variable structure under the traditional empirical risk minimization (ERM) framework, so the estimators depend excessively on prior structure information and are sensitive to outliers. To address this issue, we propose a new robust variable structure discovery method for group lasso based on a convergent bilevel optimization framework. We adopt tilted empirical risk minimization (TERM) as the target function to improve the robustness of the estimator by assigning less weight to outliers. Moreover, we modify the TERM objective function so that its Fenchel conjugate can be computed while its robustness is maintained, which we prove theoretically and verify empirically. Experimental results on both synthetic and real-world datasets show that the proposed method improves robust performance in prediction and variable structure discovery compared with existing techniques.
Notes
The code can be downloaded from https://github.com/zhangxuelincode/demoTERMGL
The data can be downloaded from https://adni.loni.usc.edu/
References
Yu X, Sun Y, Zhou H-J (2021) An adaptive shortest-solution guided decimation approach to sparse high-dimensional linear regression. Sci Rep 11(1):1–13
Wilms I, Croux C (2016) Robust sparse canonical correlation analysis. BMC Syst Biol 10(1):1–13
Wang Y, Li X, Ruiz R (2018) Weighted general group lasso for gene selection in cancer classification. IEEE Trans Cybern 49(8):2860–2873
He H, Guo X, Yu J, Ai C, Shi S (2022) Overcoming the inadaptability of sparse group lasso for data with various group structures by stacking. Bioinformatics 38(6):1542–1549
Liu X, Goncalves AR, Cao P, Zhao D, Banerjee A (2018) Modeling Alzheimer's disease cognitive scores using multi-task sparse group lasso. Comput Med Imaging Graph 66:100–114
Cao P, Shan X, Zhao D, Huang M, Zaiane O (2017) Sparse shared structure based multi-task learning for MRI based cognitive performance prediction of Alzheimer's disease. Pattern Recogn 72:219–235
Liu X, Cao P, Wang J, Kong J, Zhao D (2019) Fused group lasso regularized multi-task feature learning and its application to the cognitive performance prediction of Alzheimer's disease. Neuroinformatics 17(2):271–294
Oliveira SHG, Gonçalves AR, Von Zuben FJ (2019) Group lasso with asymmetric structure estimation for multi-task learning. In: Proceedings of the 28th international joint conference on artificial intelligence, pp 3202–3208
Yuan M, Lin Y (2006) Model selection and estimation in regression with grouped variables. J R Stat Soc Ser B (Stat Methodol) 68(1):49–67
Ma S, Song X, Huang J (2007) Supervised group lasso with applications to microarray data analysis. BMC Bioinforma 8(1):1–17
Zhang HH, Cheng G, Liu Y (2011) Linear or nonlinear? automatic structure discovery for partially linear models. J Am Stat Assoc 106(495):1099–1112
Yang G, Sun Y, Cui X (2017) Automatic structure discovery for varying-coefficient partially linear models. Commun Stat-Theory Methods 46(15):7703–7716
Hernández-Lobato D, Hernández-Lobato JM (2013) Learning feature selection dependencies in multi-task learning. Adv Neural Inf Process Syst 26
Pan C, Zhu M (2017) Group additive structure identification for kernel nonparametric regression. Adv Neural Inf Process Syst 30
Frecon J, Salzo S, Pontil M (2018) Bilevel learning of the group lasso structure. Adv Neural Inf Process Syst 31:8301–8311
Franceschi L, Frasconi P, Salzo S, Grazzi R, Pontil M (2018) Bilevel programming for hyperparameter optimization and meta-learning. In: International conference on machine learning, pp 1568–1577
Shaban A, Cheng C-A, Hatch N, Boots B (2019) Truncated back-propagation for bilevel optimization. In: The 22nd international conference on artificial intelligence and statistics, pp 1723–1732
Feurer M, Hutter F (2019) Hyperparameter optimization. Springer, Cham, pp 3–33
Ji K, Yang J, Liang Y (2021) Bilevel optimization: convergence analysis and enhanced design. In: International conference on machine learning, pp 4882–4892
Sun H, Pu W, Fu X, Chang T-H, Hong M (2022) Learning to continuously optimize wireless resource in a dynamic environment: a bilevel optimization perspective. IEEE Trans Signal Process 70:1900–1917
Liu H, Simonyan K, Yang Y (2019) DARTS: differentiable architecture search. In: International conference on learning representations, pp 1–13
Bertinetto L, Henriques JF, Torr P, Vedaldi A (2018) Meta-learning with differentiable closed-form solvers. In: International conference on learning representations, pp 1–15
Rajeswaran A, Finn C, Kakade SM, Levine S (2019) Meta-learning with implicit gradients. Adv Neural Inf Process Syst 32
Ji K, Lee JD, Liang Y, Poor HV (2020) Convergence of meta-learning with task-specific adaptation over partial parameters. Adv Neural Inf Process Syst 33:11490–11500
Tibshirani R (1996) Regression shrinkage and selection via the lasso. J R Stat Soc Ser B (Methodological) 58(1):267–288
Feng Y, Huang X, Shi L, Yang Y, Suykens JA (2015) Learning with the maximum correntropy criterion induced losses for regression. J Mach Learn Res 16(1):993–1034
Li Y, Liang M, Mao L, Wang S (2021) Robust estimation and variable selection for the accelerated failure time model. Stat Med 40(20):4473–4491
Li T, Beirami A, Sanjabi M, Smith V (2020) Tilted empirical risk minimization. In: International conference on learning representations, pp 1–44
Huber PJ (1973) Robust regression: asymptotics, conjectures and Monte Carlo. Ann Stat 1(5):799–821
Sun Q, Zhou W-X, Fan J (2020) Adaptive Huber regression. J Am Stat Assoc 115(529):254–265
Feng Y, Wu Q (2022) A statistical learning assessment of Huber regression. J Approx Theory 273:105660
Chen S, Gong C, Yang J, Li X, Wei Y, Li J (2018) Adversarial metric learning. In: Proceedings of the 27th international joint conference on artificial intelligence, pp 2021–2027
Sinha A, Malo P, Deb K (2017) A review on bilevel optimization: from classical to evolutionary approaches and applications. IEEE Trans Evol Comput 22(2):276–295
Sharma A (2020) Optimistic variants of single-objective bilevel optimization for evolutionary algorithms. Int J Comput Intell Appl 19(03):2050020
Hong T, Zhao D, Zhang Y, Wang Z (2021) A bilevel voltage regulation operation for distribution systems with self-operated microgrids. IEEE Trans Smart Grid 13(2):1238–1248
Condat L (2016) Fast projection onto the simplex and the ℓ1-ball. Math Prog 158(1):575–585
Reddi SJ, Sra S, Poczos B, Smola AJ (2016) Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Adv Neural Inf Process Syst 29:1145–1153
Van Nguyen Q (2017) Forward-backward splitting with Bregman distances. Vietnam J Math 45(3):519–539
Nikolova M, Chan RH (2007) The equivalence of half-quadratic minimization and the gradient linearization iteration. IEEE Trans Image Process 16(6):1623–1627
He R, Zheng W-S, Tan T, Sun Z (2013) Half-quadratic-based iterative minimization for robust sparse representation. IEEE Trans Pattern Anal Mach Intell 36(2):261–275
Wang X, Chen H, Cai W, Shen D, Huang H (2017) Regularized modal regression with applications in cognitive impairment prediction. Adv Neural Inf Process Syst 30:1448–1458
Huber PJ (1981) Robust statistics. Wiley series in probability and mathematical statistics. Wiley, New York
Yu Y-l, Aslan Ö, Schuurmans D (2012) A polynomial-time form of robust regression. Adv Neural Inf Process Syst 25
Yao W, Li L (2014) A new regression model: modal linear regression. Scand J Stat 41(3):656–671
Weiner MW, Aisen PS, Jack Jr CR, Jagust WJ, Trojanowski JQ, Shaw L, Saykin AJ, Morris JC, Cairns N, Beckett LA et al (2010) The Alzheimer's disease neuroimaging initiative: progress report and future plans. Alzheimers Dement 6(3):202–211
Yang J, Wang S, Wu T (2022) Maximum mutual information for feature extraction from graph-structured data: application to Alzheimer's disease classification. Appl Intell 1–17
Acknowledgements
This paper is supported by the Fundamental Research Funds for the Central Universities of China (No.2662022XXYJ005) and National Natural Science Foundation of China (No.12071166).
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
Appendix A: Robustness analysis
In this section, we analyze the robustness of the modified TERM objective function via its sensitivity curve with respect to the prediction error and via finite sample breakdown point theory.
1.1 A.1: Sensitivity curve and gradient analysis
The modified TERM objective function is defined as \(\phi _{t}(e)=\frac {1}{t} \exp \{t e^{2}\}\) with negative t, where e represents the prediction error \(y_{i} - {x_{i}^{T}} \beta \). Inspired by [26], we draw the sensitivity curve by analyzing its derivative \(2e \cdot \exp \{t e^{2}\}\), as shown in Fig. 12.
Compared with the squared loss, under which samples with larger prediction errors have a greater impact on the weight updates, the modified TERM formulation assigns small weights to atypical samples, making the method more robust to outliers. Moreover, the behavior of the modified TERM depends on t: the function converges slowly when t is a sufficiently large negative value (e.g., \(t \rightarrow -\infty \)), while outliers regain influence on the model when t is a small negative value (e.g., \(t \rightarrow 0^{-}\)). The value of t should therefore be tuned for the specific task, as discussed in the simulation experiments in Section 4.1.3. More generally, t can be used to magnify (t > 0) or suppress (t < 0) the influence of outliers. The theoretical analysis is given in Lemma 3 by examining the gradient.
The gradient of the modified objective function is a weighted average of the original individual loss gradients \(\nabla _{\theta } f\left (x_{i} ; \theta \right )\). In addition, the weight w_i(t; 𝜃) is closely related to the value of the hyperparameter t and to the loss \(f\left (x_{i} ; \theta \right )\) of each sample. Specifically, when t < 0, samples with larger loss values are suppressed by assigning them less weight.
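As an illustration (not the paper's implementation), the tilted per-sample weights induced by this gradient, proportional to exp{t f(x_i; θ)}, can be computed directly; the error values here are hypothetical:

```python
import numpy as np

def term_weights(errors, t):
    """Per-sample weights induced by the modified TERM gradient
    (a sketch; t < 0 down-weights samples with large errors)."""
    losses = errors ** 2
    w = np.exp(t * losses)   # unnormalized tilted weights
    return w / w.sum()       # normalize to compare with the plain average

errors = np.array([0.1, -0.2, 0.15, 5.0])   # last sample is an outlier
w_robust = term_weights(errors, t=-1.0)      # t < 0: outlier suppressed
w_plain  = np.full(4, 0.25)                  # ERM weights every sample equally

assert w_robust[-1] < w_plain[-1]            # outlier gets far less weight
assert w_robust[-1] < w_robust[:3].min()
```

With t = −1 the outlier's weight is driven essentially to zero, while the three clean samples share the mass almost equally.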
Lemma 3
For a smooth loss function \(f\left (x_{i} ; \beta \right )\), we have
Proof
Here, we provide the proof:
□
When t is positive, TERM focuses on the samples with relatively large losses (rare instances). Thus, a positive t can be adopted when rare instances are important for estimation, which aims to minimize variance, as proven in [28]. In summary, the modified TERM function without the \(\log \) operator can be viewed as an alternative loss that magnifies samples with large losses when t > 0 and suppresses outliers when t < 0. In this paper, we focus on robust performance in nonoverlapping group lasso and variable structure discovery applications, where t < 0.
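A minimal robust-regression sketch illustrates the suppression effect for t < 0. The data, step size, and plain gradient-descent solver below are illustrative assumptions, not the paper's HQ-DFBB algorithm:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
y = 2.0 * x + 0.1 * rng.normal(size=n)   # true slope is 2
x[:10], y[:10] = 5.0, -20.0              # plant 5% high-leverage outliers

def fit(t=None, lr=0.05, steps=3000):
    """Gradient descent on the modified TERM loss (1/t)exp(t e^2);
    t=None falls back to the ordinary squared loss."""
    beta = 0.0
    for _ in range(steps):
        e = y - beta * x
        if t is None:
            g = -2.0 * np.mean(e * x)                      # squared-loss gradient
        else:
            g = -2.0 * np.mean(e * np.exp(t * e**2) * x)   # tilted gradient
        beta -= lr * g
    return beta

beta_ls   = fit()          # pulled far from 2 by the outliers
beta_term = fit(t=-0.1)    # t < 0: outliers carry negligible weight

assert abs(beta_term - 2.0) < abs(beta_ls - 2.0)
```

The squared-loss estimate is dragged toward the contaminated points, while the tilted gradient assigns them weight exp(t e²) ≈ 0 and recovers a slope close to 2.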
1.2 A.2: Finite sample breakdown point analysis
We are interested in studying the behavior of the modified TERM estimator when the dataset is contaminated with outliers. To further discuss the robustness of our estimator, we analyze its finite sample breakdown point.
Various definitions of the sample breakdown point have been proposed. In this work, we analyze the finite sample contamination breakdown point [42], which is used to quantify the proportion of bad data that an estimator can tolerate before returning arbitrary values.
Given original observations \(\boldsymbol {S}=\left (s_{1}, \ldots , s_{n}\right )\), where \(s_{i}=\left (\boldsymbol {x}_{i}, y_{i}\right )\), denote T(S) as the minimizer of the following objective function:
and
The original samples S are then corrupted by adding extra m arbitrary points \(\boldsymbol {S}^{\prime }=\left (s_{n+1}, \ldots , s_{n+m}\right )\). Thus, the new dataset \(\boldsymbol {S} \cup \boldsymbol {S}^{\prime }\) contains ε = m/(m + n) bad samples. The finite sample contamination breakdown point ε∗ is defined as
where ∥⋅∥ is the Euclidean norm.
Theorem 2
Given original observations \(\boldsymbol {S}=\left (s_{1}, \ldots , s_{n}\right )\), suppose \(T(S)=\hat {\beta }\), which is the minimizer of (13). Let \(\phi _{t}(v)=\frac {1}{t} e^{t v^{2}}\), and t < 0. Denote \(M= t {\sum }_{i=1}^{n} \phi _{t}\) \(\left (y_{i}-\boldsymbol {x}_{i}^{T} \hat {\beta }\right )\). Then, the finite sample contamination breakdown point of the modified TERM framework is
where m∗ represents the ideal maximum number of abnormal samples and satisfies
where ⌊M⌋ denotes the largest integer that is not greater than M, and ⌈M⌉ denotes the smallest integer that is not less than M.
Proof
Given the modified TERM objective ϕt(⋅), the n original samples \(\boldsymbol {S}=\left (s_{1}, \ldots , s_{n}\right )\) and m extra arbitrary samples \(\boldsymbol {S}^{\prime }=\left (s_{n+1}, \ldots , s_{n+m}\right )\), denote T as the estimator. Let ϕ∗(⋅) = tϕt(⋅) and \(M={\sum }_{i=1}^{n} \phi ^{*}\left (y_{i}-\boldsymbol {x}_{i}^{T} \hat {{\beta }}\right )\), where \(\hat {{\beta }}= T(\boldsymbol {S})\). Note that ϕ∗(⋅) is symmetric about 0, satisfies \(\max \limits _{v} \phi ^{*}(v)=\phi ^{*}(0)=1\), decreases monotonically toward both sides, and \(\lim _{\vert v\vert \rightarrow \infty } \phi ^{*}(v)=0\).
First, we prove that \(T\left (\boldsymbol {S} \cup \boldsymbol {S}^{\prime }\right )\) remains bounded if m < M. Let ξ > 0 be such that m + nξ < M. Let C be such that ϕ∗(v) ≤ ξ for |v|≥ C. Let β be any real vector such that |y − xTβ|≥ C for all samples in S. Then, we have
and
From (16) and (17), we find that \(T\left (\boldsymbol {S} \cup \boldsymbol {S}^{\prime }\right )\) must satisfy \( \vert y-\boldsymbol {x}^{T} T\left (\boldsymbol {S} \cup \boldsymbol {S}^{\prime }\right ) \vert <C\) for at least one point in S because \({\sum }_{i=1}^{m+n} \phi ^{*}\left (y_{i}-\boldsymbol {x}_{i}^{T} T\left (\boldsymbol {S} \cup \boldsymbol {S}^{\prime }\right ) \right ) \geq M\). Thus, \(T\left (\boldsymbol {S} \cup \boldsymbol {S}^{\prime }\right )\) is bounded when m < M.
Second, we prove that the estimator may break down if m > M. Similarly, let ξ > 0 be such that m > M + mξ, and let C be such that ϕ∗(v) ≤ ξ for |v|≥ C. Assume that all m arbitrary points \(\left \{\left (\boldsymbol {x}_{n+1}, y_{n+1}\right ), \ldots ,\left (\boldsymbol {x}_{n+m}, y_{n+m}\right )\right \}\) in \(\boldsymbol {S}^{\prime }\) coincide and satisfy a linear relationship y = xTβ∗. Let β be any vector such that \(\vert y_{n+1}-\boldsymbol {x}_{n+1}^{T} \beta \vert >C\). Then, we have
and
From (18) and (19), we find that \(T\left (\boldsymbol {S} \cup \boldsymbol {S}^{\prime }\right )\) must satisfy \( \vert y_{n+1}-\boldsymbol {x}_{n+1}^{T} T\left (S \cup S^{\prime }\right ) \vert \leq C\) because \({\sum }_{i=1}^{m+n} \phi ^{*}\) \(\left (y_{i}-\boldsymbol {x}_{i}^{T} T\left (\boldsymbol {S} \cup \boldsymbol {S}^{\prime }\right ) \right ) \geq m\). \(\vert T\left (\boldsymbol {S} \cup \boldsymbol {S}^{\prime }\right )\vert \) will go to infinity when we let \(y_{n+1} \rightarrow \infty \) and fix xn+ 1. Thus, we obtain the breakdown point. □
From the above theorem, we can see that the influencing factors of the breakdown point include not only the objective function ϕt(⋅) with the tuning parameter t, but also the sample configuration. However, as pointed out in [42, 44], the breakdown point is empirically quite high if the scale (contained in the hyperparameter t of TERMGL) is determined from the dataset itself.
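The properties of ϕ∗(⋅) used in the proof — symmetry, maximum value 1 at the origin, monotone decrease toward both sides, and vanishing tails — are easy to verify numerically (a sanity check with an arbitrarily chosen t = −0.5):

```python
import numpy as np

t = -0.5
phi_star = lambda v: np.exp(t * v**2)   # phi*(v) = t * phi_t(v), with t < 0

v = np.linspace(0.0, 10.0, 1001)
vals = phi_star(v)

assert np.allclose(phi_star(v), phi_star(-v))   # symmetric about 0
assert phi_star(0.0) == 1.0                      # maximum value is 1, attained at 0
assert np.all(np.diff(vals) <= 0)                # monotonically decreasing on [0, inf)
assert phi_star(100.0) < 1e-12                   # vanishes as |v| -> infinity
```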
Appendix B: Optimization details
1.1 B.1: Details for updating w
Denote the dual object \({w}=\left ({w}_{1}, \ldots , {w}_{{L}}\right ) \in \mathbb {R}^{{P} \times {L}}\), where \({w}_{l}=\left ({w}_{1 l}, {w}_{2 l}, \ldots , {w}_{P l}\right )^{{T}} \in \mathbb {R}^{{P}}\). According to the Fenchel-Rockafellar duality theorem, we can formulate the dual problem of (9) as
Moreover, this is a smooth constrained convex optimization problem. Here \({\mathscr{L}}^{*}\left (-\mathcal {T}_{\theta }^{*} w\right )\) is the Fenchel conjugate of \({\mathscr{L}}(\beta )\), and \(\mathcal {T}_{\theta }^{*} w={\sum }_{l=1}^{L} \mathcal {T}_{\theta _{l}} w_{l} \in \mathbb {R}^{P}\). \({\Omega }^{*}(w)={\sum }_{l=1}^{L} \delta \left (w_{l}\right )\) is the Fenchel conjugate of Ω(β), where the indicator function \(\delta \left (w_{l}\right )\) satisfies \(\delta \left (w_{l}\right )=0\) if \(\left \|w_{l}\right \|_{2} \leq \eta \), and \(+\infty \) otherwise.
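Since Ω∗(w) is the indicator of the group-wise constraint ‖w_l‖₂ ≤ η, its Euclidean proximity operator reduces to a block-wise projection onto ℓ₂ balls. The sketch below is illustrative only: it uses the Euclidean prox, whereas the update in this appendix uses a Bregman proximity operator; the shapes and η are hypothetical.

```python
import numpy as np

def project_group_ball(w, eta):
    """Euclidean projection of each dual block w_l onto {||w_l||_2 <= eta},
    i.e., the (Euclidean) proximity operator of the indicator delta(w_l)."""
    out = w.copy()
    for l in range(w.shape[1]):                 # columns are the L groups
        norm = np.linalg.norm(w[:, l])
        if norm > eta:
            out[:, l] = (eta / norm) * w[:, l]  # rescale the block onto the ball
    return out

rng = np.random.default_rng(1)
w = rng.normal(size=(6, 3))        # P = 6 features, L = 3 groups (toy sizes)
w_proj = project_group_ball(w, eta=0.5)
assert np.all(np.linalg.norm(w_proj, axis=0) <= 0.5 + 1e-12)
```

Blocks already inside the ball are left untouched; blocks outside are rescaled radially, which preserves each block's direction.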
According to the properties of strongly convex functions, \({\mathscr{L}}^{*}\) is differentiable everywhere with an ε− 1-Lipschitz continuous gradient, and \(\nabla _{{w}}\left [{\mathscr{L}}^{*}\left (-\mathcal {T}_{\theta }^{*} {~w}\right )\right ]=-\left (\mathcal {T}_{\theta _{l}} \nabla {\mathscr{L}}^{*}\left (-\mathcal {T}_{\theta }^{*} {~w}\right )\right )_{1 \leq l \leq {L}} \in \mathbb {R}^{{P} \times {L}}\) is \(\left \|\left (\mathcal {T}_{\theta _{l}} \cdot \right )_{1 \leq l \leq {L}}\right \|_{2}^{2} \varepsilon ^{-1}\)-Lipschitz continuous. Using the properties of conjugate functions, we obtain the following primal-dual link after Q iterations:
We next apply the forward-backward splitting method with the Bregman proximity operator \(\text {prox}_{{\Omega }^{*}({w})}^{\varphi }\) to the dual problem in (20), where φ is set to be the following separable Hellinger-like function:
Its Fenchel conjugate function is as follows:
By computing directly, we obtain
and
Denote q as the q-th iteration and γ as the step-size. To address the dual problem in (20), we update w, which is the dual variable of coefficient β, by the following iterative steps:
The last step above for updating the dual variable w is also employed in HQ-DFBB, as summarized in Algorithm 1. In the following, we state the details of updating 𝜃 to solve the upper level problem.
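To illustrate the forward-backward pattern (a gradient step on the smooth term followed by a prox step on Ω∗), the following minimal sketch solves a toy constrained quadratic, substituting a Euclidean prox for the Bregman proximity operator used above; all quantities are hypothetical:

```python
import numpy as np

def forward_backward(grad, prox, w0, gamma, iters=200):
    """Generic forward-backward splitting: gradient step on the smooth
    term, then a prox step. Euclidean prox here, for illustration only."""
    w = w0
    for _ in range(iters):
        w = prox(w - gamma * grad(w))
    return w

# Toy instance: minimize 0.5*||w - c||^2 subject to ||w||_2 <= eta
c = np.array([3.0, 4.0])            # unconstrained minimizer, norm 5
eta = 1.0
grad = lambda w: w - c
prox = lambda w: w if np.linalg.norm(w) <= eta else eta * w / np.linalg.norm(w)

w_star = forward_backward(grad, prox, np.zeros(2), gamma=0.5)
assert np.allclose(w_star, c / 5.0, atol=1e-6)   # projection of c onto the ball
```

The step size γ must respect the Lipschitz constant of the smooth gradient (here 1, so γ = 0.5 is safe), mirroring the role of the ε⁻¹-Lipschitz bound in the dual update above.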
1.2 B.2: Details for updating 𝜃
Recalling the definitions given in Sections 3 and 4, we summarize the simplified bilevel problems.

Upper level problem:
Lower level problem:
To update 𝜃, we then obtain the partial derivative calculator defined in Section 3.1 with the following backward gradient descent method,
where
We next specify the implementation details of the partial derivative computation. For m = 0,…,M − 1 and q = 0,…,Q − 1, we denote the equations in Algorithm 2 as
Thus, \(\mathcal {A}, {\mathscr{B}}, \mathcal {C}\) represent the updating functions for the HQ parameter b, the dual object w and the coefficient β, respectively. Unlike the procedure in BiGL [15], a key step for updating the HQ parameter b is added to the computation. We then summarize Algorithm 1 of HQ-DFBB as follows:
We focus on the computation of \(\left (\frac {d \hat {\beta }(\theta )}{d \theta }\right )^{T}\) to illustrate the relation between 𝜃m+ 1 and 𝜃m. Then we analyze the backward gradient details.
For simplicity, similar to [15], we define the partial derivatives by using the following abbreviations with an additional computation for b:
From the simplified algorithms in (27), we can easily find that
D contains the partial derivative with respect to 𝜃, and E is related to b.
We can further obtain the following equations:
and
According to the above results, we expand the analysis with (28).
Therefore, the hypergradient in (28) can be computed by the above steps.
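The backward accumulation of the hypergradient can be sketched on a scalar toy inner map (this is not the HQ-DFBB recursion itself; the map C, step size α, and Q below are illustrative assumptions):

```python
import numpy as np

# Reverse-mode hypergradient through Q unrolled lower-level steps.
# Inner map: beta_{q+1} = C(beta_q, theta); we accumulate d beta_Q / d theta
# by the chain rule, as in truncated back-propagation through the inner loop.

alpha, Q = 0.3, 20
C      = lambda beta, theta: beta - alpha * (beta - theta)  # one inner GD step
dC_db  = lambda beta, theta: 1.0 - alpha                    # dC/dbeta
dC_dth = lambda beta, theta: alpha                          # dC/dtheta

def unrolled_hypergrad(beta0, theta):
    betas = [beta0]
    for _ in range(Q):                      # forward pass: unroll the inner loop
        betas.append(C(betas[-1], theta))
    g, p = 0.0, 1.0                         # backward pass: chain-rule products
    for q in reversed(range(Q)):
        g += p * dC_dth(betas[q], theta)    # contribution of step q
        p *= dC_db(betas[q], theta)         # propagate through earlier steps
    return betas[-1], g

beta_Q, hg = unrolled_hypergrad(beta0=5.0, theta=1.0)
assert np.isclose(hg, 1.0 - (1.0 - alpha) ** Q)   # closed form for this toy map
```

For this linear map the closed form d β_Q/dθ = 1 − (1 − α)^Q is available, which makes the reverse-mode accumulation easy to check; the same forward-store/backward-multiply pattern underlies the hypergradient computation above.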
Appendix C: Convergence analysis
For completeness, we succinctly provide the proof for Theorem 1.
Proof
(a) Since the function J(β,b) is strongly convex w.r.t. β, we have
According to the continuity of J(β,b) and of b with respect to β, the sequence \(\left \{J\left (\beta ^{(m)}, b^{(m)}\right ), m=1,2, \ldots \right \}\) is decreasing (analyzing the case \(Q\rightarrow +\infty \)). Since J(β,b) is bounded below, \(\left \{J\left (\beta ^{(m)}, b^{(m)}\right ), m=1,2, {\ldots } M\right \}\) converges. Since \(J(\beta )=\min \limits _{b} J(\beta , b)\), we have \(J\left (\beta ^{(m)}\right )=n^{-1} J\left (\beta ^{(m)}, b^{(m+1)}\right )\) from HQ optimization. Hence, the sequence \(\left \{J\left (\beta ^{(m)}\right ), m=1,2, \ldots \right \}\) also converges.
(b) Since J(β,b) is ε-strongly convex w.r.t. β, we have
where G is the gradient of \(J\left (\bar {\beta }^{(m)}, \bar {b}^{(m)}\right )\) with respect to β, which is zero since \(\bar {\beta }^{(m)}=\arg \min \limits _{\beta } J\left (\beta , \bar {b}^{(m)}\right )\). Considering the case \(m \rightarrow +\infty \), we have \(J\left (\bar {\beta }^{(m+1)}, \bar {b}^{(m)}\right )-J\left (\bar {\beta }^{(m)}, \bar {b}^{(m)}\right ) \rightarrow 0\) and \(\left \|\bar {\beta }^{(m+1)}-\bar {\beta }^{(m)}\right \|^{2} \rightarrow 0\). From Section 3, we conclude that
and
Finally, we have \(\left \|\beta ^{(M)}-\beta ^{(M-1)}\right \|_{2}^{2} \rightarrow 0\) as \(m, Q \rightarrow +\infty \).
(c) Based on Theorem 3.1 in [38], we know that the result \(\left \{w^{(m, Q)}\right \}_{Q \in \mathbb {N}}\) generated in the HQ-DFBB algorithm is convergent to \(\hat {w}^{(m)}\) as \(Q \rightarrow +\infty \).
We denote b∗, \(\hat {w}^{*}\), β∗, \(\widetilde {X}^{*}=\sqrt {b^{*}} \odot X\) and \(\widetilde {y}^{*}=\sqrt {b^{*}} \odot y\) as the limits of the sequences generated by Algorithm 1 as \(m, Q \rightarrow +\infty \). When \(Q, m \rightarrow +\infty \), b(m) converges because the weight b is continuously differentiable with respect to β. We then analyze the primal-dual link function in the case \(Q, m \rightarrow +\infty \),
We can rewrite the above formula as
From the definitions of Ω(β) and its dual function Ω∗(w), we have \(\left \|w_{l}^{(m, q)}\right \| \leq \eta \) for m ∈{1,…,M}, l ∈ {1,…,L}, q ∈{1,…,Q}. Therefore, as \(Q, m \rightarrow +\infty \), we have \(\left \|w_{l}^{*}\right \| \leq \eta \). Let \(\hat {w}_{l}^{*}=\frac {1}{\eta } w_{l}^{*}\); then \(\left \|\hat {w}_{l}^{*}\right \| \leq 1\). Equation (29) can be rewritten as
By deriving Ω(β), we have
where δl satisfies \(\left \|\delta _{l}\right \|=1\) if \(\mathcal {T}_{\theta _{l}} \beta \neq 0\), and \(\left \|\delta _{l}\right \|<1\) otherwise (1 ≤ l ≤ L). Hence,
In summary, we obtain \(0 \in \partial _{\beta } J\left (\beta ^{*}, b^{*}\right )\). Finally, β∗ is the unique minimizer of \(J\left (\beta , b^{*}\right )\) since J(β,b) is strongly convex with respect to β. □
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Zhang, X., Wang, Y., Zhu, L. et al. Robust variable structure discovery based on tilted empirical risk minimization. Appl Intell 53, 17865–17886 (2023). https://doi.org/10.1007/s10489-022-04409-z