
Historical discrimination and optimal remediation

  • Original Paper
  • Published in: Social Choice and Welfare

Abstract

I consider a society which is jointly committed to ensuring equal opportunity and to increasing aggregate wealth but is faced with the vestiges of past discrimination in the form of a historically skewed distribution of social resources. Focusing on the problem of allocating the existing (fixed) quantity of social inputs, I consider two policy instruments: directly transferring resources from the advantaged to the disadvantaged or affording preferential treatment in employment to the disadvantaged group (affirmative action). After describing the general procedure for determining an optimal policy, I demonstrate by means of an example that either of the instruments might constitute an optimal remediation policy and I identify conditions which favor each.


Notes

  1. Of course, the nature of the resource asymmetries would vary. For example, in some cases, this might include access to social or political networks (such as male only clubs in the case of gender bias), or it might pertain to (other) economic or environmental resources (such as access to the global commons in the case of climate change).

  2. Such was the remedy sought by the Campaign for Fiscal Equity regarding New York City public schools and ordered by the State Supreme Court of New York in 2001. It is also the impetus for the U.S. House bill H.R. 40, Commission to Study Reparation Proposals for African Americans Act, which has been introduced in every Congress since 1989.

  3. The role of affirmative action here is to counteract the effects of historical discrimination in skewing the distribution of employment opportunities. There may be other reasons for AA, some of which are mentioned below. The present analysis should not be seen as competing with other explanations, but rather complementary to them. That is, it offers an additional dimension to the overall picture.

  4. Generally, there are two separate areas where incentive issues might arise: after agents have been matched to jobs, in their work effort; and before matching, where they might invest in human capital or exert effort in school in order to influence their job assignment. The paper abstracts from both of these.

    Concerning work effort, since job matching occurs after agents have been subject to the historically skewed distribution of educational resources and to any remedial action, such policies cannot, prima facie, affect on-the-job effort (although they may affect job placement and thus whose preferences prevail at each wage). Moreover, empirical estimates of labor supply elasticities are quite small (see footnote 13). However, there are variations of the model involving incomplete information in which remedial policies might alter the link between observable and unobservable variables and thus affect the evaluation of the expected outcome. That would be the case, for example, if work effort is unobservable but correlated with school effort or the willingness to invest in human capital. In any case, such variations lie beyond the scope of the present paper.

    On the other hand, concerning school effort or investment, any policy which affects the likelihood of job assignments might influence investment incentives. In “Appendix 2”, I sketch a possible modification of the present model to include such incentive effects, and I discuss the implications. In particular, I find that both resource transfers and affirmative action increase aggregate effort.

  5. In support of this approach, I would echo Atkinson (1970) who notes, “Dalton (1920) argued that we should approach the question [of income inequality measurement] by considering directly the form of the social welfare function to be employed.” He adds, “I hope that these conventional measures [of income inequality] will be rejected in favour of direct consideration of the properties that we should like the social welfare function to display”.

  6. In light of legislative efforts to eliminate discrimination and fiscal and monetary efforts to stimulate growth, it is obvious that these factors affect social welfare. Moreover, the voluminous economic literature on these topics attests to their significance.

  7. It is inevitable, however, that some individuals will be helped and some will be hurt when seeking to redress past injustices.

  8. To clarify the terminology, I use the word “type” to refer to members of one of the two historical groups, that is, those in the disadvantaged group versus those in the advantaged group. I use the terms “innate ability” and “innate talent” interchangeably. In contrast, “effective” or “ex post” ability refers to the agents’ abilities subsequent to the influence of social factors.

  9. Otherwise, although discrimination might affect the levels of the agents’ productivities, it does not affect the order. Hence, the agents are assigned their “correct” positions.

  10. A word on the timing of the model is in order. Since the objective of policy analysis is to design and implement an institutional framework within which agents then engage, I assume that a past or present policy is in place prior to the realization of innate talents. Therefore, the focus of the policy maker is on expected wealth and the likelihood of inversions.

  11. I discuss the restrictiveness of the assumptions in Sect. 5.

  12. There are several other papers which focus exclusively on the incentive effects of AA rather than on the welfare implications. These include Fu (2006), Calsamiglia et al. (2013) and Franke (2012).

  13. In fact, (microeconometric) estimates of individual labor supply elasticity tend to be quite small. (See Heckman 1993.) Also, given the work that has been done on incentives over the past several decades, economists are now sufficiently knowledgeable on the topic that if there were adverse incentive effects associated with AA, it might be possible to devise an appropriate supplemental scheme to address this issue directly.

  14. While I have in mind such (economic) factors as schooling and health, this might incorporate institutional or cultural impediments as well such as legal strictures or social attitudes. In any case, it is the historical difference in social resources that I refer to as the “vestiges of past discrimination”.

  15. I assume that abilities, both innate and effective, are observable or can be measured, say, by examination.

  16. Since \(s_{A}\le s_{B}\), A could not obtain job h over B undeservedly. That is, inversions always entail a less deserving type B agent obtaining h over a more deserving type A agent.

  17. For example, transferring school funding from wealthier school districts to poorer ones.

  18. If feasible policies were limited to those that lessen the degree of inequality, then \(\varepsilon \) would be restricted to the set \( [0,\varepsilon ^{\circ }]\). Conversely, if \(\varepsilon >\varepsilon ^{\circ }\), then the policy would entail increasing the gap between \(s_{A}\) and \(s_{B}\), for example, by creating elite schools or subsidizing private education.

  19. The figure includes a hypothetical social welfare contour.

  20. To clarify, I use the term inversion to refer to the case in which an individual with lower innate ability has higher effective ability as a result of differences in s. In contrast, a mismatch occurs when agents are assigned to the “wrong” jobs, that is, when the individual with lower effective ability is assigned to position h, either by mistake or as a result of affirmative action.

  21. Otherwise, for some \(a^{\circ }\) it would be advantageous to be of type A and receive \(s_{A}\) rather than to be of type B and receive \(s_{B}\).

  22. If \(\sigma \) is concave and \(\sigma _{as}>0\), then the minimum occurs at \( \underline{a}^{\circ }\).

  23. Again this is a slight abuse of notation but should not cause confusion.

  24. For example, under an AA program in which \(\delta >0\), it would be excessive to equalize resources (\(\varepsilon =0\)), for then \(\sigma (a^{\circ }, \widehat{s}-\varepsilon )>\sigma (a^{\circ },\widehat{s}+\varepsilon )-\delta \), i.e., a type A agent would have an unambiguous advantage.

  25. Clearly, such a minimum value exists since \(\sigma \) is continuous, \(\delta >\sigma (a^{\circ },\widehat{s}+\varepsilon )-\sigma (a^{\circ },\widehat{s} -\varepsilon )\) when \(\varepsilon =0\), and by assumption \(\delta \le \overline{\delta }\le \sigma (a^{\circ },\widehat{s}+\varepsilon ^{\circ })-\sigma (a^{\circ },\widehat{s}-\varepsilon ^{\circ })\).

  26. The solution to this two-step procedure would be the same if the order of optimization were reversed or if \(\delta \) and \(\varepsilon \) were determined simultaneously. Effectively, this amounts to solving one first order condition for W followed by the other.

  27. While there is some evidence that the effect of school resources on achievement is positive and concave, generally, this issue is controversial and fraught with methodological difficulties, including endogeneity of expenditures and intraschool correlations. (See Gibbons and McNally 2013a, b for recent surveys.) However, for the present purpose of demonstrating the procedure for determining an optimal policy and the possible role of AA this simple example will suffice.

  28. That is, for those values of \(\tau \) which do not reverse the order of the agents’ productivities.

  29. It is also here where the distinction between opportunity inequality and income inequality is most pronounced. If \(\alpha =0\), then the agent assigned to job \(\ell \) will have no earnings. However, if the two types are equally likely to be assigned to h, then there is no opportunity inequality and \(I(\varepsilon )=0\).

  30. Specifically, for \(\widehat{a}^{\circ }=3\) and \(D=2\).

  31. Recall \(\underline{\varepsilon }_{\delta }\) denotes the minimum value of \( \varepsilon \) for which \(\sigma (a^{\circ },\widehat{s}+\varepsilon )-\sigma (a^{\circ },\widehat{s}-\varepsilon )\ge \delta \), for all \(a^{\circ }\). Here, \(\underline{\varepsilon }_{\delta }=\frac{\delta \sqrt{4\widehat{s} -\delta ^{2}}}{2}\). Also, \(\bar{\delta }=D\).

  32. In general, \(\Gamma (\delta )\) is neither convex nor concave for all \(\delta \in [0,D]\).

  33. In this example, it appears that whenever the optimal AA bias \(\delta ^{*}>0\), the associated optimal transfer is \(\varepsilon _{2}(\delta ^{*})= \widehat{s}\), which is maximally regressive. (Although that cannot be established formally, since \(\Gamma (\delta )\) need not be convex—therefore, while each point \((EV,Q)\in \Theta \) is achievable, it has not been shown to be undominated within \({\cup }_{\delta \in \Delta } \Gamma (\delta )\).) However, as seen in Fig. 3 and discussed in Sect. 5, the behavior of \(\Gamma \) (and \(\Gamma (\delta )\)) is highly sensitive to the parameter values. Hence, this extreme result may be idiosyncratic, and alternative parameters or specifications may lead to interior transfer policies even under AA. Here, if one were to require that the distribution of social resources be no more skewed than the historical distribution, then rather than the envelope \( \Theta \) consisting of those combinations \((EV(\widehat{s},\delta ),Q( \widehat{s},\delta ))\) for \(0\le \delta \le D\), it would be comprised of \( (EV(\varepsilon ^{\circ },\delta ),Q(\varepsilon ^{\circ },\delta ))\). (This is depicted in Fig. 6 in “Appendix 1”.)

  34. Or the limit as \(\varepsilon \) approaches 0 from above.

  35. The policy would nonetheless have a marginal effect on aggregate expected wealth, but only by influencing who is assigned to which job and thus whose preferences prevail at each wage. If agents have different preferences, this effect is likely to be idiosyncratic. If they have the same preferences, then the job placement effect would have little significance.

  36. Having specified agents’ preferences, I would point out that the welfare function \(W=\Phi (EV,Q)\), which directly expresses the dual social objectives of increasing aggregate wealth and equal opportunity, is distinctly non-welfarist.

  37. In particular, an A agent for whom \(a_{A}^{\circ }+1+\sqrt{s_{A}}<\underline{a}^{\circ }+\sqrt{s_{B}}\) would necessarily be assigned to job \(\ell \), while a B agent for whom \( a_{B}^{\circ }+\sqrt{s_{B}}>\overline{a}^{\circ }+1+\sqrt{s_{A}}\) would be assured of getting h. However, it is not necessary that these occur with certainty, but rather with sufficiently high probability.

  38. Recall \(\Lambda ^{\prime }(s_{i})=[\underline{a}^{\circ }+\sqrt{s_{i}}, \overline{a}^{\circ }+\sqrt{s_{i}}]\).

  39. The assignment would be discriminatory even if \(\sigma (a_{A}^{\circ },0,s_{A})<\sigma (a_{B}^{\circ },0,s_{B})<\sigma (a_{A}^{\circ },1,s_{A})\).

  40. Note that unless \(\sigma \) is separable in e, the magnitude of the effect might differ if one were to compare \(\sigma (a_{A}^{\circ },1,s_{A})\) to \( \sigma (a_{B}^{\circ },1,s_{B})\) rather than \(\sigma (a_{A}^{\circ },0,s_{A}) \) to \(\sigma (a_{B}^{\circ },0,s_{B})\). Nevertheless, each provides a consistent measure. (This is analogous to Hicks’s equivalent and compensating variations in measuring welfare changes.)

  41. For example, some groups might put greater emphasis on educational achievement and offer more positive reinforcement. This effect has clearly been identified by some U.S. charter schools, such as the KIPP Academy, which begin with the premise that high expectations lead to high achievement (see http://www.kipp.org/FAB8BFD0-B631-11E1-A076005056883C4D).
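The closed form for \(\underline{\varepsilon }_{\delta }\) in footnote 31 can be recovered in a few lines. The following is a sketch, assuming the example technology \(\sigma (a^{\circ },s)=a^{\circ }+\sqrt{s}\) (the form implied by footnote 38), so that the gap in effective abilities does not depend on \(a^{\circ }\), and assuming \(\delta ^{2}\le 2\widehat{s}\) so that squaring is legitimate:

$$\begin{aligned} \sqrt{\widehat{s}+\varepsilon }-\sqrt{\widehat{s}-\varepsilon }&=\delta \\ 2\widehat{s}-2\sqrt{\widehat{s}^{2}-\varepsilon ^{2}}&=\delta ^{2} \\ \widehat{s}^{2}-\varepsilon ^{2}&=\left( \widehat{s}-\tfrac{1}{2}\delta ^{2}\right) ^{2} \\ \varepsilon ^{2}&=\widehat{s}\delta ^{2}-\tfrac{1}{4}\delta ^{4} \\ \underline{\varepsilon }_{\delta }&=\frac{\delta \sqrt{4\widehat{s}-\delta ^{2}}}{2}\text {.} \end{aligned}$$

With \(\widehat{s}=\frac{D^{2}}{2}\), as used in Appendix 1, setting \(\delta =D\) gives \(\underline{\varepsilon }_{D}=\widehat{s}\), which is consistent with \(\bar{\delta }=D\).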

References

  • Arrow KJ (1973) The theory of discrimination. In: Ashenfelter O, Rees A (eds) Discrimination in labor markets. Princeton University Press, Princeton


  • Atkinson AB (1970) On the measurement of inequality. J Econ Theory 2:244–263


  • Austen-Smith D, Wallerstein M (2006) Redistribution and affirmative action. J Public Econ 90:1789–1823


  • Becker GS (1957) The economics of discrimination. University of Chicago Press, Chicago


  • Calsamiglia C, Franke J, Rey-Biel P (2013) The incentive effects of affirmative action in a real-effort tournament. J Public Econ 96:15–31


  • Chung K-S (2000) Role models and arguments for affirmative action. Am Econ Rev 90:640–648


  • Coate S, Loury GC (1993) Will affirmative action policies eliminate negative stereotypes? Am Econ Rev 83:1220–1240


  • Dalton H (1920) The measurement of the inequality of incomes. Econ J 30:348–361


  • Fang H, Moro A (2011) Theories of statistical discrimination and affirmative action: a survey. In: Benhabib J, Bisin A, Jackson M (eds) Handbook of social economics, vol 1A. North-Holland, Amsterdam


  • Foster DP, Vohra RV (1992) An economic argument for affirmative action. Ration Soc 4(2):176–188

  • Franke J (2012) Affirmative action in contest games. Eur J Polit Econ 28(1):105–118


  • Fryer Jr. RG, Loury GC (2013) Valuing identity. J Polit Econ 121(4):747–774


  • Fu Q (2006) A theory of affirmative action in college admissions. Econ Inq 44(3):420–428


  • Gibbons S, McNally S (2013a) Does school spending matter? LSE, Centre for Economic Performance, CentrePiece Autumn

  • Gibbons S, McNally S (2013b) The effects of resources across school phases: a summary of recent evidence. In: LSE/CEP discussion paper CEPDP1226

  • Heckman JJ (1993) What has been learned about labor supply in the past twenty years? Am Econ Rev 83(2):116–121


  • Lundberg SJ (1991) The enforcement of equal opportunity laws under imperfect information: affirmative action and alternatives. Q J Econ 106:309–326


  • Moro A, Norman P (2003) Affirmative action in a competitive economy. J Public Econ 87:567–594


  • Phelps ES (1972) The statistical theory of racism and sexism. Am Econ Rev 62:659–661



Author information


Correspondence to Laurence Kranich.

Additional information

L. Kranich would like to thank Jerry Marschke and Mike Jerison for useful conversations. L. Kranich would also like to thank seminar participants at Fondation Maison des sciences de l’homme, Paris; the Institute for Economic Research, Hitotsubashi University; and the 12th Meeting of the Society for Social Choice and Welfare, Boston College, as well as two anonymous referees. Financial support from FMSH and IER is gratefully acknowledged. Finally, L. Kranich would like to thank William Thomson for his inspiration and guidance over many years.

Appendices

Appendix 1: Proofs

This appendix contains proofs of some of the lemmas in Sect. 4.

Lemma 5

When \(\alpha = 0\), \(\Gamma \) is convex and exhibits the asymptotic behavior \(\lim _{\varepsilon \rightarrow 0} \frac{dQ/d\varepsilon }{dEV/d\varepsilon }=-\infty \) and \(\lim _{ \varepsilon \rightarrow \widehat{s}}\frac{dQ/d\varepsilon }{ dEV/d\varepsilon }=-\frac{1}{D}\).

Proof

As in the proof of Proposition 4,

\(\frac{dQ}{dEV}=\frac{\left( D-\left( \sqrt{\widehat{s}+\varepsilon }- \sqrt{\widehat{s}-\varepsilon }\right) \right) \left( \sqrt{\widehat{s}+\varepsilon }+\sqrt{\widehat{s}-\varepsilon }\right) }{D^{2}\left( \alpha \sqrt{\widehat{ s}+\varepsilon }-\sqrt{\widehat{s}-\varepsilon }\right) +\frac{1}{2} (1-\alpha )\left( D-\left( \sqrt{\widehat{s}+\varepsilon }-\sqrt{\widehat{s} -\varepsilon }\right) \right) ^{2}\left( \sqrt{\widehat{s}+\varepsilon }+\sqrt{ \widehat{s}-\varepsilon }\right) }\).

Hence, at \(\alpha =0\), \(\frac{dQ}{dEV}=\frac{\left( D-(\sqrt{ \widehat{s}+\varepsilon }-\sqrt{\widehat{s}-\varepsilon })\right) \left( \sqrt{\widehat{s}+\varepsilon }+\sqrt{\widehat{s}-\varepsilon }\right) }{ -D^{2}\sqrt{\widehat{s}-\varepsilon }+\frac{1}{2}\left( D-(\sqrt{\widehat{s} +\varepsilon }-\sqrt{\widehat{s}-\varepsilon })\right) ^{2}\left( \sqrt{ \widehat{s}+\varepsilon }+\sqrt{\widehat{s}-\varepsilon }\right) }\). Upon further simplification, using \(\left( \sqrt{\widehat{s}+\varepsilon }+\sqrt{\widehat{s}-\varepsilon }\right) \left( \sqrt{\widehat{s}+\varepsilon }-\sqrt{\widehat{s}-\varepsilon }\right) =2\varepsilon \), this can be expressed as \(\frac{dQ}{dEV}=\frac{ D\left( \sqrt{\widehat{s}+\varepsilon }+\sqrt{\widehat{s}-\varepsilon } \right) -2\varepsilon }{(\frac{1}{2}D^{2}+\varepsilon )(\sqrt{\widehat{s} +\varepsilon }-\sqrt{\widehat{s}-\varepsilon })-2D\varepsilon }\). Using \( \widehat{s}=\frac{D^{2}}{2}\), \(\lim _{\varepsilon \rightarrow 0} \frac{dQ/d\varepsilon }{dEV/d\varepsilon }=-\infty \) can be verified directly, noting that the numerator is positive at zero while the denominator vanishes at zero and is decreasing in \(\varepsilon \) there. Similarly, \(\lim _{\varepsilon \rightarrow \widehat{s}}\frac{ dQ/d\varepsilon }{dEV/d\varepsilon }=-\frac{1}{D}\) can be shown directly using l’Hôpital’s rule.
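As a numerical sanity check on the algebra above, the following Python sketch evaluates both forms of the slope at \(\alpha =0\) and probes the two limits. It is illustrative only: the function names are hypothetical, and it assumes \(\widehat{s}=\frac{D^{2}}{2}\) as stated in the proof.

```python
import math


def slope_first_form(eps, D):
    """dQ/dEV at alpha = 0 before simplification (assumes s_hat = D**2 / 2)."""
    s_hat = D**2 / 2
    u = math.sqrt(s_hat + eps)
    v = math.sqrt(s_hat - eps)
    gap = D - (u - v)
    return gap * (u + v) / (-D**2 * v + 0.5 * gap**2 * (u + v))


def slope_simplified(eps, D):
    """dQ/dEV at alpha = 0 after simplification (assumes s_hat = D**2 / 2)."""
    s_hat = D**2 / 2
    u = math.sqrt(s_hat + eps)
    v = math.sqrt(s_hat - eps)
    numerator = D * (u + v) - 2 * eps
    denominator = (0.5 * D**2 + eps) * (u - v) - 2 * D * eps
    return numerator / denominator
```

Evaluating `slope_simplified` at a very small \(\varepsilon \) should give a large negative number, and evaluating it just below \(\widehat{s}\) should give a value close to \(-\frac{1}{D}\).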

Next, computing \(\frac{d^{2}Q}{dEV^{2}}\) from the above expression, it can be seen that \(\frac{d^{2}Q}{dEV^{2}}>0\) if and only if

$$\begin{aligned}&\begin{array}{c} 2\varepsilon ^{2}\sqrt{\widehat{s}+\varepsilon }-2\widehat{s} D^{3}+2\varepsilon ^{2}\sqrt{\widehat{s}-\varepsilon }-2D\sqrt{\widehat{s} -\varepsilon }\left( \widehat{s}+\varepsilon \right) ^{\frac{3}{2}}+2D\left( \widehat{s}-\varepsilon \right) ^{\frac{3}{2}}\sqrt{\widehat{s}+\varepsilon }\\ +\,6\widehat{s}D^{2}\sqrt{\widehat{s}+\varepsilon }-3D^{2}\varepsilon \sqrt{ \widehat{s}+\varepsilon }+2\widehat{s}D^{2}\sqrt{\widehat{s}-\varepsilon } +D^{2}\varepsilon \sqrt{\widehat{s}-\varepsilon }-4\widehat{s}D\varepsilon >0 \text {.} \end{array} \end{aligned}$$
(31)

Manipulating (31) and using \(\widehat{s}=\frac{D^{2}}{2}\), \( \frac{d^{2}Q}{dEV^{2}}>0\) if and only if

$$\begin{aligned}&(2\varepsilon ^{2}-3D^{2}\varepsilon +3D^{4})\sqrt{\widehat{s}+\varepsilon } +(2\varepsilon ^{2}+D^{2}\varepsilon +D^{4})\sqrt{\widehat{s}-\varepsilon } -4D\varepsilon \sqrt{\widehat{s}+\varepsilon }\sqrt{\widehat{s}-\varepsilon }\nonumber \\&\quad -D^{5}-2D^{3}\varepsilon >0\text {,} \end{aligned}$$
(32)

or

$$\begin{aligned}&(2\varepsilon ^{2}-3D^{2}\varepsilon +3D^{4})\sqrt{\widehat{s}+\varepsilon } +(2\varepsilon ^{2}+D^{2}\varepsilon +D^{4})\sqrt{\widehat{s}-\varepsilon } -4D\varepsilon \sqrt{\widehat{s}+\varepsilon }\sqrt{\widehat{s}-\varepsilon }\nonumber \\&\quad >D^{5}+2D^{3}\varepsilon \text {.} \end{aligned}$$
(33)

Notice the RHS of (33) is linear in \(\varepsilon \). Also, \(D^{5}+2D^{3}\varepsilon <2D^{5}\) on the interior of the interval \([0, \widehat{s}]\) and \(D^{5}+2D^{3}\varepsilon =2D^{5}\) at \(\varepsilon = \widehat{s}\). Therefore, it is sufficient to show the LHS \((2\varepsilon ^{2}-3D^{2}\varepsilon +3D^{4})\sqrt{\widehat{s}+\varepsilon }+(2\varepsilon ^{2}+D^{2}\varepsilon +D^{4})\sqrt{\widehat{s}-\varepsilon }-4D\varepsilon \sqrt{\widehat{s}+\varepsilon }\sqrt{\widehat{s}-\varepsilon }>2D^{5}\) on \( (0,\widehat{s})\).

For convenience, let \(T_{1}(\varepsilon )=(2\varepsilon ^{2}-3D^{2}\varepsilon +3D^{4})\sqrt{\widehat{s}+\varepsilon }\). It is easily verified that \(T_{1}(0)=\frac{3}{\sqrt{2}}D^{5}>2D^{5}\), \(T_{1}( \widehat{s})=2D^{5}\), and \(\frac{dT_{1}}{d\varepsilon }<0\) on \((0,\widehat{s} )\). Hence, \(T_{1}(\varepsilon )>2D^{5}\) uniformly on \((0,\widehat{s})\).

Finally, I establish that \((2\varepsilon ^{2}+D^{2}\varepsilon +D^{4})\sqrt{ \widehat{s}-\varepsilon }-4D\varepsilon \sqrt{\widehat{s}+\varepsilon }\sqrt{ \widehat{s}-\varepsilon }>0\), or \((2\varepsilon ^{2}+D^{2}\varepsilon +D^{4})>4D\varepsilon \sqrt{\widehat{s}+\varepsilon }\), on \((0,\widehat{s})\) . Again for convenience, denote \(T_{2}(\varepsilon )=(2\varepsilon ^{2}+D^{2}\varepsilon +D^{4})\) and \(T_{3}(\varepsilon )=4D\varepsilon \sqrt{ \widehat{s}+\varepsilon }\). It is also easily verified that \(T_{2}(0)=D^{4}\) , \(T_{3}(0)=0\), \(T_{2}(\widehat{s})=T_{3}(\widehat{s})=2D^{4}\), and \(\frac{ dT_{3}}{d\varepsilon }>\frac{dT_{2}}{d\varepsilon }>0\) on \((0,\widehat{s})\). The latter implies \(T_{2}(\varepsilon )\) and \(T_{3}(\varepsilon )\) are monotonic over the interval and uniformly \(T_{2}(\varepsilon )>T_{3}(\varepsilon )\). \(\square \)
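Inequality (33) can also be spot-checked on a grid. The sketch below is a numerical illustration rather than a proof; the function names and the grid size are hypothetical, and \(\widehat{s}=\frac{D^{2}}{2}\) is assumed as in the text.

```python
import math


def convexity_margin(eps, D):
    """LHS minus RHS of inequality (33), assuming s_hat = D**2 / 2."""
    s_hat = D**2 / 2
    u = math.sqrt(s_hat + eps)
    v = math.sqrt(s_hat - eps)
    t1 = (2 * eps**2 - 3 * D**2 * eps + 3 * D**4) * u  # T_1 in the proof
    t2 = 2 * eps**2 + D**2 * eps + D**4                # T_2 in the proof
    t3 = 4 * D * eps * u                               # T_3 in the proof
    lhs = t1 + (t2 - t3) * v
    rhs = D**5 + 2 * D**3 * eps
    return lhs - rhs


def min_margin(D, steps=200):
    """Smallest margin over an interior grid of (0, s_hat)."""
    s_hat = D**2 / 2
    return min(convexity_margin(s_hat * k / steps, D) for k in range(1, steps))
```

A strictly positive `min_margin(D)` is what convexity of \(\Gamma \) requires at the grid points; the margin shrinks toward zero as \(\varepsilon \) approaches \(\widehat{s}\), where (33) holds with equality.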

Lemma 6

For each \(\delta \in (0,D)\), the slope along \( \Gamma (\delta )\), \(\frac{\partial Q/\partial \varepsilon }{\partial EV/\partial \varepsilon }<0\).

Proof

In this case, from (30) and (28) one obtains

\(\frac{\partial Q/\partial \varepsilon }{\partial EV/\partial \varepsilon }=\frac{(D+\delta )( \sqrt{\widehat{s}+\varepsilon }+\sqrt{ \widehat{s}-\varepsilon }) -2\varepsilon }{(\widehat{s}+\varepsilon )( \sqrt{\widehat{s}+\varepsilon }-\sqrt{\widehat{s}-\varepsilon })-\frac{1}{2} \delta ^{2}( \sqrt{\widehat{s}+\varepsilon }+\sqrt{\widehat{s} -\varepsilon }) -2D\varepsilon }\), when \(\alpha =0\). Let \( T(\varepsilon )\) denote the numerator of this expression and \(B(\varepsilon ) \) denote the denominator. First, \(T(0)=(D+\delta )(2\frac{D}{\sqrt{2}})>0\) and \(T(\widehat{s})=\delta D>0\). Moreover, \(\frac{dT}{d\varepsilon }=\) \(-\frac{1}{2\sqrt{\widehat{s}+\varepsilon }\sqrt{\widehat{s} -\varepsilon }}( (D+\delta )(\sqrt{\widehat{s}+\varepsilon }-\sqrt{ \widehat{s}-\varepsilon })+4\sqrt{\widehat{s}+\varepsilon }\sqrt{\widehat{s} -\varepsilon }) <0\), for \(\varepsilon \in [0,\widehat{s}]\), or T is monotonic on the interval. Hence, the numerator is uniformly positive. It is sufficient, therefore, to show that \(B(\varepsilon )\le 0\). Indeed, since this must be true for all \(\delta \in [0,D]\), it is sufficient to show \((\widehat{s}+\varepsilon )(\sqrt{\widehat{s}+\varepsilon }-\sqrt{ \widehat{s}-\varepsilon })-2D\varepsilon \le 0\), or \((\widehat{s} +\varepsilon )(\sqrt{\widehat{s}+\varepsilon }-\sqrt{\widehat{s}-\varepsilon })\le 2D\varepsilon \). Note that \((\widehat{s}+\varepsilon )(\sqrt{\widehat{ s}+\varepsilon }-\sqrt{\widehat{s}-\varepsilon })=2D\varepsilon \) at \( \varepsilon =0\) and \(\varepsilon =\widehat{s}\). It remains to be shown that \( (\widehat{s}+\varepsilon )(\sqrt{\widehat{s}+\varepsilon }-\sqrt{\widehat{s} -\varepsilon })<2D\varepsilon \) for \(0<\varepsilon <\widehat{s}\).

First, \(\frac{\partial }{\partial \varepsilon }(\widehat{s}+\varepsilon )( \sqrt{\widehat{s}+\varepsilon }-\sqrt{\widehat{s}-\varepsilon })=\frac{1}{2 \sqrt{\widehat{s}-\varepsilon }}( 3\varepsilon -\widehat{s}+3\sqrt{ \widehat{s}+\varepsilon }\sqrt{\widehat{s}-\varepsilon }) \). At \( \varepsilon =0\) (see footnote 34), this equals \( \frac{D}{\sqrt{2}}<2D\). Hence, for small \(\varepsilon >0\), \((\widehat{s} +\varepsilon )(\sqrt{\widehat{s}+\varepsilon }-\sqrt{\widehat{s}-\varepsilon })<2D\varepsilon \). To show that \((\widehat{s}+\varepsilon )(\sqrt{\widehat{s }+\varepsilon }-\sqrt{\widehat{s}-\varepsilon })<2D\varepsilon \) uniformly for \(0<\varepsilon <\widehat{s}\), it is sufficient to show that \(\frac{1}{2 \sqrt{\widehat{s}-\varepsilon }}( 3\varepsilon -\widehat{s}+3\sqrt{ \widehat{s}+\varepsilon }\sqrt{\widehat{s}-\varepsilon }) \) increases monotonically on \((0,\widehat{s})\). (That is, since \((\widehat{s} +\varepsilon )(\sqrt{\widehat{s}+\varepsilon }-\sqrt{\widehat{s}-\varepsilon })=2D\varepsilon \) at \(\varepsilon =0\) and \(\varepsilon =\widehat{s}\), in order for there to be an \(\varepsilon \) at which \((\widehat{s}+\varepsilon )( \sqrt{\widehat{s}+\varepsilon }-\sqrt{\widehat{s}-\varepsilon } )>2D\varepsilon \), it must be the case that \(\frac{1}{2\sqrt{\widehat{s} -\varepsilon }}( 3\varepsilon -\widehat{s}+3\sqrt{\widehat{s} +\varepsilon }\sqrt{\widehat{s}-\varepsilon }) \) decreases for some \( \varepsilon \in (0,\widehat{s})\).) By direct computation, \(\frac{\partial }{ \partial \varepsilon }\frac{1}{2\sqrt{\widehat{s}-\varepsilon }}( 3\varepsilon -\widehat{s}+3\sqrt{\widehat{s}+\varepsilon }\sqrt{\widehat{s} -\varepsilon }) =\frac{1}{4\sqrt{\widehat{s}+\varepsilon }( \widehat{s}-\varepsilon ) ^{\frac{3}{2}}}\left( 3( \widehat{s} -\varepsilon ) ^{\frac{3}{2}}+5\widehat{s}\sqrt{\widehat{s} +\varepsilon }-3\varepsilon \sqrt{\widehat{s}+\varepsilon }\right) \). Since \(5 \widehat{s}>3\varepsilon \) for all \(\varepsilon \in (0,\widehat{s})\), this expression is positive, which establishes the result.

\(\square \)
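The sign pattern claimed in the proof of Lemma 6 (positive numerator, negative denominator) can be illustrated numerically. The Python sketch below is not part of the proof; the function names and the grid are hypothetical, and it assumes \(\alpha =0\) and \(\widehat{s}=\frac{D^{2}}{2}\) as above.

```python
import math


def slope_terms(eps, delta, D):
    """Numerator T and denominator B of the slope along Gamma(delta) at alpha = 0."""
    s_hat = D**2 / 2  # assumption taken from Appendix 1
    u = math.sqrt(s_hat + eps)
    v = math.sqrt(s_hat - eps)
    numerator = (D + delta) * (u + v) - 2 * eps
    denominator = (s_hat + eps) * (u - v) - 0.5 * delta**2 * (u + v) - 2 * D * eps
    return numerator, denominator


def slope_is_negative_on_grid(D, steps=50):
    """True if T > 0 and B < 0 at every interior grid point in (delta, eps)."""
    s_hat = D**2 / 2
    for i in range(1, steps):
        delta = D * i / steps
        for k in range(1, steps):
            numerator, denominator = slope_terms(s_hat * k / steps, delta, D)
            if not (numerator > 0 and denominator < 0):
                return False
    return True
```

For example, `slope_is_negative_on_grid(1.0)` checks 49 × 49 interior combinations of \(\delta \) and \(\varepsilon \).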

Lemma 7

\(\Theta \) is concave and has extreme values \((EV( \widehat{s},0),Q(\widehat{s},0))=(\widehat{a}^{\circ }+D,0)\) and \((EV( \widehat{s},D),Q(\widehat{s},D))=(\widehat{a}^{\circ }+\frac{2}{3}D,\frac{1}{ 2})\). Also, along \(\Theta \), \(\lim _{\delta \rightarrow 0}\frac{ \partial Q/\partial \delta }{\partial EV/\partial \delta }=-\infty \) and \(\lim _{ \delta \rightarrow D}\frac{\partial Q/\partial \delta }{\partial EV/\partial \delta }=-\frac{1}{D}\).

Proof

From (30) and (28), \(\frac{ \partial }{\partial \delta }Q(\widehat{s},\delta )=\frac{\delta }{D^{2}}\) while \(\frac{\partial }{\partial \delta }EV(\widehat{s},\delta )=-\frac{ \delta ^{2}}{D^{2}}\). Hence, \(\frac{\partial Q/\partial \delta }{\partial EV/\partial \delta }=-\frac{1}{\delta }\). The concavity of \(\Theta \) follows immediately from the monotonicity of \(\frac{\partial Q/\partial \delta }{ \partial EV/\partial \delta }\) in \(\delta \). \(\square \)
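The extreme values and the concavity claim can be made explicit by integrating the two partial derivatives. This is a sketch that takes the starting point \((EV(\widehat{s},0),Q(\widehat{s},0))=(\widehat{a}^{\circ }+D,0)\) from the statement of the lemma as given:

$$\begin{aligned} Q(\widehat{s},\delta )&=\int _{0}^{\delta }\frac{t}{D^{2}}\,dt=\frac{\delta ^{2}}{2D^{2}}, \\ EV(\widehat{s},\delta )&=\widehat{a}^{\circ }+D-\int _{0}^{\delta }\frac{t^{2}}{D^{2}}\,dt=\widehat{a}^{\circ }+D-\frac{\delta ^{3}}{3D^{2}}, \\ \frac{d^{2}Q}{dEV^{2}}&=\frac{\partial }{\partial \delta }\left( -\frac{1}{\delta }\right) \Big / \frac{\partial EV}{\partial \delta }=\frac{1}{\delta ^{2}}\cdot \left( -\frac{D^{2}}{\delta ^{2}}\right) =-\frac{D^{2}}{\delta ^{4}}<0\text {.} \end{aligned}$$

Setting \(\delta =D\) in the first two lines gives \(Q=\frac{1}{2}\) and \(EV=\widehat{a}^{\circ }+\frac{2}{3}D\), and the slope \(-\frac{1}{\delta }\) tends to \(-\infty \) as \(\delta \rightarrow 0\) and to \(-\frac{1}{D}\) as \(\delta \rightarrow D\), matching the statement of the lemma.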

Figure 6 depicts the effect of requiring that the distribution of social resources be no more skewed than the historical distribution, i.e., imposing \(\varepsilon ^{\circ }\) as an upper bound on the extent of discrimination, as described in footnote 33. In particular, the bold locus indicates the lower terminus of \(\Gamma (\varepsilon ,\delta )\) for each value of \(\delta \) in the event \(\varepsilon =\varepsilon ^{\circ }\) rather than \(\varepsilon =\widehat{s}\).

Fig. 6 Optimal remediation with \(\varepsilon ^{\circ }\) lower bound

Appendix 2: Investment incentives

This appendix sketches a modification of the model to include incentive effects. First, as explained in footnote 4, if labor supply were endogenous, then since effort is expended after agents have been assigned to jobs, the alternative remedial policies could not affect individual labor supply decisions (see footnote 35). However, suppose agents invest in human capital or, equivalently, exert effort in school, and this affects the outcome of the education process; that is, suppose the education technology is \(\sigma (a_{i}^{\circ },e_{i},s_{i})\), rather than \(\sigma (a_{i}^{\circ },s_{i})\), where \(e_{i}\) denotes i’s school effort.

Agent i’s preferences are of the form \(u(y_{i},e_{i})\), where \( y_{i}=w_{i}a_{i}^{\prime }\), and u is increasing in the first argument and decreasing in the second. It is natural to assume that i knows its own innate ability at the time of choosing \(e_{i}\) but does not know the ability of the agent with whom it will eventually compete for jobs. However, the distribution of \(a^{\circ }\), namely f, is assumed to be common knowledge. Hence, i chooses \(e_{i}\) to maximize its expected utility, which depends on anticipated wage earnings and effort.

Subsequent to the choice of \(e_{i}\) by each agent, the matching formula is the same. That is, in the absence of affirmative action, the agent with the higher effective ability is assigned to job h and the agent with the lower ability is assigned to \(\ell \). With affirmative action, agent A is assigned to job h if and only if \(a_{A}^{\prime }+\delta >a_{B}^{\prime }\). Since i’s assignment and, hence, payoff is affected by j’s effort choice, this generates a game with incomplete information, or a Bayesian game. Here, agent i’s type is given by \(a_{i}^{\circ }\), and a strategy is a mapping \(t_{i}\) which associates a level of effort with each \(a_{i}^{\circ }\in \Lambda ^{\circ }\). (I thus refer to “agent \( a_{i}^{\circ }\)”.) A Bayesian Nash (BN) equilibrium is a pair of strategies \((t_{A}^{*},t_{B}^{*})\) such that for each \( a_{i}^{\circ }\), \(t_{i}^{*}(a_{i}^{\circ })\) is a best response to \( t_{j}^{*}\) measured with respect to f, for \(i,j=A,B,i\ne j\). After associating equilibrium effort levels with each vector of types \( (a_{A}^{\circ },a_{B}^{\circ })\), one can then evaluate expected aggregate wealth and inequality as before (see footnote 36).
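The matching rule, together with the notions of inversion and mismatch defined in footnote 20, can be illustrated with a small Python sketch. It is a hedged illustration only: the function names are hypothetical, effort is omitted, the technology \(\sigma (a^{\circ },s)=a^{\circ }+\sqrt{s}\) is the one used in the paper's example, and ties (a zero-probability event under a continuous distribution) are resolved in favor of B purely for convenience.

```python
import math


def effective_ability(innate, s):
    """Example technology sigma(a, s) = a + sqrt(s), without effort."""
    return innate + math.sqrt(s)


def assign_h(eff_A, eff_B, delta=0.0):
    """Return which agent gets job h; A wins iff eff_A + delta > eff_B."""
    return "A" if eff_A + delta > eff_B else "B"


def is_inversion(innate_A, innate_B, s_A, s_B):
    """True if the agent with lower innate ability has higher effective ability."""
    eff_A = effective_ability(innate_A, s_A)
    eff_B = effective_ability(innate_B, s_B)
    return (innate_A - innate_B) * (eff_A - eff_B) < 0


def is_mismatch(eff_A, eff_B, delta=0.0):
    """True if job h goes to the agent with lower effective ability."""
    higher = "A" if eff_A > eff_B else "B"
    return assign_h(eff_A, eff_B, delta) != higher
```

With hypothetical values \(a_{A}^{\circ }=3.2\), \(a_{B}^{\circ }=3.0\), \(s_{A}=1\) and \(s_{B}=4\), the effective abilities are 4.2 and 5.0, so an inversion occurs and B obtains h when \(\delta =0\). A bias of \(\delta =1\) assigns h to A instead, which undoes the inversion at the cost of a mismatch in effective abilities.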

To demonstrate, consider an extension of the previous example where \( a_{i}^{\circ }\) is distributed uniformly on \(\Lambda ^{\circ }\), \(\sigma (a_{i}^{\circ },e_{i},s_{i})=a_{i}^{\circ }+e_{i}+\sqrt{s_{i}}\) and \( u(y_{i},e_{i})=y_{i}-ce_{i}\). Also, to simplify even further, assume \( e_{i}\in \{0,1\}\); that is, agents either exert effort or they do not. In this case, a BN equilibrium is described heuristically as follows. First, given agent j’s strategy, \(t_{j}\), let \(Ew(a_{i}^{\circ },e_{i},t_{j})\) denote the expected wage of agent \(a_{i}^{\circ }\) who chooses effort \(e_{i}\). Then the change in this agent’s expected benefit (utility net of costs) from choosing \(e_{i}=1\) versus \(e_{i}=0\) can be written as

$$\begin{aligned} \Delta EB(a_{i}^{\circ },t_{j})= & {} \sigma (a_{i}^{\circ },1,s_{i})Ew(a_{i}^{\circ },1,t_{j})- \sigma (a_{i}^{\circ },0,s_{i})Ew(a_{i}^{\circ },0,t_{j}) \end{aligned}$$
(34a)
$$\begin{aligned}= & {} [\sigma (a_{i}^{\circ },1,s_{i})-\sigma (a_{i}^{\circ },0,s_{i})]Ew(a_{i}^{\circ },1,t_{j}) \nonumber \\&\qquad +\,\sigma (a_{i}^{\circ },0,s_{i})[Ew(a_{i}^{\circ },1,t_{j})-Ew(a_{i}^{\circ },0,t_{j})] \end{aligned}$$
(34b)
$$\begin{aligned}= & {} Ew(a_{i}^{\circ },1,t_{j}) \nonumber \\&\qquad +\,(a_{i}^{\circ }+\sqrt{s_{i}})[Ew(a_{i}^{\circ },1,t_{j})-Ew(a_{i}^{\circ },0,t_{j})]\text {.} \end{aligned}$$
(34c)

One can distinguish two effects of effort on expected benefits corresponding to the two components in (34b). The first is the productivity effect: since effort increases productivity and, hence, earnings in either job, it may be worth choosing \(e=1\) on this basis alone, even if \( [Ew(a_{i}^{\circ },1,t_{j})-Ew(a_{i}^{\circ },0,t_{j})]=0\). The second is the competitive effect, where exerting greater effort increases the likelihood of obtaining job h.

The best response of agent \(a_{i}^{\circ }\) to \(t_{j}\) is determined by comparing \(\Delta EB(a_{i}^{\circ },t_{j})\) to c: if \(\Delta EB(a_{i}^{\circ },t_{j})>c\), the agent should choose \(e_{i}=1\); and if \(\Delta EB(a_{i}^{\circ },t_{j})<c\), then it should choose \(e_{i}=0\). In particular, if c is very high, namely, if \(c>\Delta EB(a_{i}^{\circ },t_{j})\) for all \(a_{i}^{\circ }\in \Lambda ^{\circ }\), then i will always choose \(e_{i}=0\); and if \(c<Ew(a_{i}^{\circ },1,t_{j})\) for all \(a_{i}^{\circ }\), then i will always choose \(e_{i}=1\) on the basis of the productivity effect alone. Therefore, the interesting case is that in which \(Ew(a_{i}^{\circ },1,t_{j})<c<\Delta EB(a_{i}^{\circ },t_{j})\) for some \(a_{i}^{\circ }\). For such values of c, one can see from (34c) that if the competitive effect is sufficiently small, then \(\Delta EB(a_{i}^{\circ },t_{j})<c\) and the optimal choice is \(e_{i}=0\). For A, this would occur when \(a_{A}^{\circ }\) is so low that it is unlikely to be assigned job h even if it were to choose \(e_{A}=1\), and for B this would occur when \(a_{B}^{\circ }\) is so high that it is likely to be assigned job h even if it were to choose \(e_{B}=0\) (footnote 37). These are most likely to occur when the overlap between \(\Lambda ^{\prime }(s_{A})\) and \(\Lambda ^{\prime }(s_{B})\) is small (footnote 38), that is, when \(\sqrt{s_{B}}-\sqrt{s_{A}}\) is large.

This suggests that a BN equilibrium would have the form

$$\begin{aligned} t_{A}^{*}(a_{A}^{\circ })= & {} \left\{ \begin{array}{cc} 0 &{}\quad \text {if }\,\,a_{A}^{\circ }<\widetilde{a}_{A}^{\circ } \\ 1 &{}\quad \text {if }\,\,a_{A}^{\circ }\ge \widetilde{a}_{A}^{\circ } \end{array} \right. , \end{aligned}$$
(35)
$$\begin{aligned} t_{B}^{*}(a_{B}^{\circ })= & {} \left\{ \begin{array}{cc} 1 &{} \quad \text {if }\,a_{B}^{\circ }<\widetilde{a}_{B}^{\circ } \\ 0 &{} \quad \text {if }\,a_{B}^{\circ }\ge \widetilde{a}_{B}^{\circ } \end{array} \right. , \end{aligned}$$
(36)

where \(\widetilde{a}_{i}^{\circ }\in \Lambda ^{\circ }\) is determined by the equation

$$\begin{aligned} c=\Delta EB(\widetilde{a}_{i}^{\circ },t_{j}^{*}) \end{aligned}$$
(37)

if such a solution exists. Otherwise, if \(c<\Delta EB(a_{i}^{\circ },t_{j}^{*})\) for all \(a_{i}^{\circ }\in \Lambda ^{\circ }\), then \( \widetilde{a}_{A}^{\circ }=\underline{a}^{\circ }\) and \(\widetilde{a} _{B}^{\circ }=\overline{a}^{\circ }\); and if \(c>\Delta EB(a_{i}^{\circ },t_{j}^{*})\) for all \(a_{i}^{\circ }\in \Lambda ^{\circ }\), then \( \widetilde{a}_{A}^{\circ }=\overline{a}^{\circ }\) and \(\widetilde{a} _{B}^{\circ }=\underline{a}^{\circ }\).
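To make condition (37) concrete, the following Python sketch evaluates \(\Delta EB\) for agent A against a candidate threshold strategy for B of the form (36) and returns A’s best response. It is an illustration under stated assumptions, not the paper’s procedure: types are uniform on an interval `[lo, hi]`, the technology is \(a_{i}^{\circ }+e_{i}+\sqrt{s_{i}}\), the expected wage is taken to be \(w^{\ell }\) plus the wage gap times the probability of obtaining job h, ties are resolved in favor of \(e=1\) as in (35), and all numerical values and function names are hypothetical. It does not solve for the equilibrium thresholds; it only evaluates the comparison with c for a given candidate \(\widetilde{a}_{B}^{\circ }\).

```python
import math

# Hypothetical parameter values, chosen only for illustration.
PARAMS = {
    "lo": 0.0, "hi": 1.0,     # bounds of the (uniform) type space
    "s_A": 0.25, "s_B": 1.0,  # resources of A and B
    "w_h": 2.0, "w_l": 1.0,   # wages in jobs h and l
    "delta": 0.0,             # affirmative action bonus (0 = no AA)
    "c": 1.0,                 # cost of effort
}


def length_below(lower, upper, y):
    """Length of the part of the interval [lower, upper] that lies below y."""
    return max(0.0, min(y, upper) - lower)


def prob_h_for_A(a_A, e_A, thr_B, p):
    """Probability that A obtains job h, i.e. a'_A + delta > a'_B,
    when B plays e_B = 1 below thr_B and e_B = 0 at or above it."""
    x = a_A + e_A + math.sqrt(p["s_A"]) + p["delta"]
    root_B = math.sqrt(p["s_B"])
    working = length_below(p["lo"], thr_B, x - 1.0 - root_B)  # B types with e_B = 1
    idle = length_below(thr_B, p["hi"], x - root_B)           # B types with e_B = 0
    return (working + idle) / (p["hi"] - p["lo"])


def expected_wage_A(a_A, e_A, thr_B, p):
    """Expected wage of agent A of type a_A choosing effort e_A."""
    q = prob_h_for_A(a_A, e_A, thr_B, p)
    return p["w_l"] + (p["w_h"] - p["w_l"]) * q


def delta_eb_A(a_A, thr_B, p):
    """Change in expected benefit from e=1 versus e=0, as in (34a)."""
    root_A = math.sqrt(p["s_A"])
    with_effort = (a_A + 1.0 + root_A) * expected_wage_A(a_A, 1, thr_B, p)
    without = (a_A + root_A) * expected_wage_A(a_A, 0, thr_B, p)
    return with_effort - without


def best_response_A(a_A, thr_B, p):
    """Effort choice of type a_A: 1 if the gain covers the cost c, else 0."""
    return 1 if delta_eb_A(a_A, thr_B, p) >= p["c"] else 0


print(delta_eb_A(0.2, 0.5, PARAMS), best_response_A(0.2, 0.5, PARAMS))
```

A candidate threshold for A would be a type at which `delta_eb_A` equals `c`; raising `delta` in `PARAMS` shows how the affirmative action bonus changes the probability of obtaining job h for a given type.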

Turning to the evaluation of welfare, a BN equilibrium of the above form (endogenously) partitions the type space \(\Lambda ^{\circ }\times \Lambda ^{\circ }\) into four quadrants: \([\underline{a}^{\circ },\widetilde{a}_{A}^{\circ })\times [\underline{a}^{\circ },\widetilde{a}_{B}^{\circ })\), \([\widetilde{a}_{A}^{\circ },\overline{a}^{\circ }]\times [\underline{a}^{\circ },\widetilde{a}_{B}^{\circ })\), \([\underline{a}^{\circ },\widetilde{a}_{A}^{\circ })\times [\widetilde{a}_{B}^{\circ },\overline{a}^{\circ }]\), and \([\widetilde{a}_{A}^{\circ },\overline{a}^{\circ }]\times [\widetilde{a}_{B}^{\circ },\overline{a}^{\circ }]\). EV can be determined by computing the expected value of \(V(\mathbf {a}^{\circ },\mathbf {s})=a_{H}^{\prime }w^{h}+a_{L}^{\prime }w^{\ell }\) over the partition, where \(a_{H}^{\prime }=\max \{a_{A}^{\prime },a_{B}^{\prime }\}\), \(a_{L}^{\prime }=\min \{a_{A}^{\prime },a_{B}^{\prime }\}\), and \(a_{i}^{\prime }=\sigma (a_{i}^{\circ },t_{i}^{*}(a_{i}^{\circ }),s_{i})\) for \(i=A,B\).
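The expectation over the four quadrants can be approximated numerically. The Python sketch below averages \(V=a_{H}^{\prime }w^{h}+a_{L}^{\prime }w^{\ell }\) over a midpoint grid of type pairs, with effort given by threshold strategies of the form (35) and (36). It assumes uniform types on `[lo, hi]` and the example technology \(a_{i}^{\circ }+e_{i}+\sqrt{s_{i}}\); the thresholds, wages, grid size and function names are hypothetical inputs rather than equilibrium values from the paper.

```python
import math


def sigma(a, e, s):
    """Education technology of the example: a + e + sqrt(s)."""
    return a + e + math.sqrt(s)


def t_A(a, thr_A):
    """Strategy of the form (35): A exerts effort at or above the threshold."""
    return 1 if a >= thr_A else 0


def t_B(a, thr_B):
    """Strategy of the form (36): B exerts effort below the threshold."""
    return 1 if a < thr_B else 0


def expected_wealth(thr_A, thr_B, s_A, s_B, w_h, w_l, lo=0.0, hi=1.0, n=200):
    """Midpoint-rule approximation of EV for uniform, independent types."""
    step = (hi - lo) / n
    grid = [lo + (k + 0.5) * step for k in range(n)]
    total = 0.0
    for a_A in grid:
        eff_A = sigma(a_A, t_A(a_A, thr_A), s_A)
        for a_B in grid:
            eff_B = sigma(a_B, t_B(a_B, thr_B), s_B)
            # The higher effective ability is matched with job h.
            total += max(eff_A, eff_B) * w_h + min(eff_A, eff_B) * w_l
    return total / (n * n)


# Hypothetical comparison: nobody exerts effort versus everybody does.
ev_none = expected_wealth(1.0, 0.0, 0.0, 0.0, w_h=2.0, w_l=1.0)
ev_all = expected_wealth(0.0, 1.0, 0.0, 0.0, w_h=2.0, w_l=1.0)
print(ev_none, ev_all)
```

With these inputs, moving from no effort to universal effort raises each effective ability by one unit, so the approximation increases by \(w^{h}+w^{\ell }\).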

Regarding the likelihood of inversions, if the agents choose the same level of e, then the evaluation of I is straightforward; that is, if they both choose \(e=0\) or both choose \(e=1\), then the analysis would be the same as in the main text, although the values of \(\sigma (a_{A}^{\circ },e,s_{A})\) and \(\sigma (a_{B}^{\circ },e,s_{B})\) would differ in the two cases. Also, if \(a_{A}^{\circ }>a_{B}^{\circ }\) and \(\sigma (a_{A}^{\circ },1,s_{A})<\sigma (a_{B}^{\circ },0,s_{B})\), then it is clear that the resulting job assignment is the result of discrimination, although the criterion for inversion is significantly weaker (footnote 39). The difficulty lies in the case in which \(a_{A}^{\circ }>a_{B}^{\circ }\) and \(\sigma (a_{A}^{\circ },0,s_{A})<\sigma (a_{B}^{\circ },1,s_{B})\). Then it is not clear whether B would obtain job h because of the asymmetry in s or because B exerted greater effort. However, it is easy to accommodate this as well as the previous case in which \(a_{A}^{\circ }>a_{B}^{\circ }\) and \(\sigma (a_{A}^{\circ },1,s_{A})<\sigma (a_{B}^{\circ },0,s_{B})\).

Since the education technology \(\sigma (a_{i}^{\circ },e_{i},s_{i})\) is assumed to be known, one can accurately isolate and measure the effect of s by evaluating the effective productivities if the agents had chosen the same effort level. That is, even if it were the case that \(e_{A}\ne e_{B}\), it is possible to consider what \(\sigma \) would have been had they chosen the same level, i.e., to compare \(\sigma (a_{A}^{\circ },0,s_{A})\) to \(\sigma (a_{B}^{\circ },0,s_{B})\) or \(\sigma (a_{A}^{\circ },1,s_{A})\) to \(\sigma (a_{B}^{\circ },1,s_{B})\) in order to render the comparison meaningful (footnote 40). In this case, the measure of inversions is the same as in the main section, namely,

$$\begin{aligned} I(\mathbf {s})=\int _{\underline{a}^{\circ }}^{\overline{a}^{\circ }}\int _{a_{A}^{\prime }}^{\sigma (a_{A}^{\circ },e,s_{B})}f_{B}(a_{B}^{\prime })f(a_{A}^{\circ })da_{B}^{\prime }da_{A}^{\circ }\text {,} \end{aligned}$$

where e is the common effort level.
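For the uniform example, the double integral can be evaluated directly. The Python sketch below computes \(I(\mathbf {s})\) by integrating the inner term analytically and the outer term with a midpoint rule, and compares the result with a closed form. Both rest on assumptions that are not taken from the paper: types are uniform on `[lo, hi]` with width W, the technology is \(a_{i}^{\circ }+e_{i}+\sqrt{s_{i}}\) (so the common effort e cancels), and \(d=\sqrt{s_{B}}-\sqrt{s_{A}}\) satisfies \(0\le d\le W\), in which case the integral reduces to \(d/W-d^{2}/(2W^{2})\). The function names are hypothetical.

```python
import math


def inversion_measure(s_A, s_B, lo=0.0, hi=1.0, n=1000):
    """Numerical value of I(s) for uniform types and a common effort level.

    An inversion occurs when a_B < a_A but B's effective ability is higher,
    i.e. when a_A - d < a_B < a_A with d = sqrt(s_B) - sqrt(s_A).
    """
    width = hi - lo
    d = math.sqrt(s_B) - math.sqrt(s_A)
    step = width / n
    total = 0.0
    for k in range(n):
        a_A = lo + (k + 0.5) * step
        # Inner integral: mass of B types in (a_A - d, a_A) within [lo, hi].
        inner = max(0.0, min(a_A, hi) - max(a_A - d, lo)) / width
        # Outer integral: uniform density 1/width for a_A.
        total += inner * step / width
    return total


def inversion_closed_form(s_A, s_B, lo=0.0, hi=1.0):
    """Closed form d/W - d^2/(2 W^2), valid only for 0 <= d <= W."""
    width = hi - lo
    d = math.sqrt(s_B) - math.sqrt(s_A)
    if not 0.0 <= d <= width:
        raise ValueError("closed form assumes 0 <= d <= width")
    return d / width - d * d / (2.0 * width * width)


# Hypothetical resource levels.
print(inversion_measure(0.25, 1.0), inversion_closed_form(0.25, 1.0))
```

Under these assumptions the measure is zero when \(s_{A}=s_{B}\) and rises with the resource gap, which is consistent with resource transfers reducing the likelihood of inversions.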

The next question is how including school effort is likely to affect the policy instruments and the determination of an optimal policy. First, as mentioned above, when the agents choose the same level of effort, either 0 or 1, the policy analysis will be the same as in the main text. However, aggregate wealth would be greater if the agents were to choose \(e=1\) than if they were to choose \(e=0\). Generally, as discussed above, the agents are more likely to choose \(e=0\) (in equilibrium) when the overlap \(\Lambda ^{\prime }(s_{A})\cap \Lambda ^{\prime }(s_{B})\) is small, that is, when \(\sqrt{s_{B}}-\sqrt{s_{A}}\) is large. Since resource transfers directly increase the overlap, they will increase the degree of competition between A and B and thus induce greater effort. Affirmative action will also induce greater effort, but via a different mechanism. Rather than changing the support of the distributions of ex post ability, the effect of AA would be to lower \(\widetilde{a}_{A}^{\circ }\) and to raise \(\widetilde{a}_{B}^{\circ }\), thus increasing the proportion of both types who choose \(e=1\). In effect, AA will render the outcome more uncertain for the marginal A and B agents. For the marginal A agent, AA increases the probability that it will be assigned job h, thereby placing h “within reach” and thus increasing the competitive effect in (34c). For the marginal B agent, AA decreases the probability of being assigned job h, thus putting the assignment “at risk” and requiring greater effort to maintain it.

In summary, both policies will lead to increased effort. When both agents do choose \(e=1\), the analysis of inversions will be similar to that in the main text, and expected aggregate wealth will be higher than in the case in which both choose \(e=0\). However, the overall effect on expected aggregate wealth is again likely to be highly dependent on the parameters of the problem and worthy of systematic study.

A full analysis of investment incentive effects and the determination of an optimal policy is beyond the scope of the present paper. Before undertaking such an investigation, however, there are several fundamental issues to consider. First, as mentioned earlier in footnote 36, the approach is non-welfarist in that the social goals are unrelated to the preferences of the agents. Next, when effort is endogenous, it is implicitly assumed that greater opportunities resulting from higher levels of effort are well-deserved and need not be subject to remediation. However, if one interprets the model as describing the effect of primary and secondary education on job opportunities, then one might ask how young is too young to be fully responsible for the consequences of one’s decisions? And finally, both here and in the general discussion of responsibility, there is the presumption that effort is independent of circumstances, but that may not be the case; it may be that effort is influenced by external resources and education culture (footnote 41). Exploring these issues as well as developing the model more thoroughly and verifying the previous conjectures is left for future research.

Kranich, L. Historical discrimination and optimal remediation. Soc Choice Welf 48, 239–265 (2017). https://doi.org/10.1007/s00355-016-0952-5
