DOI: 10.1145/3520304.3529028
poster

Optimizing sample diversity with fairness constraints on imbalanced, sparse, hiring data

Published: 19 July 2022

ABSTRACT

It is often useful to draw a non-random sample from an imbalanced dataset that is both diverse and fair. Working with more than 90K tech job descriptions and the resumes of applicants to those jobs, we describe an approach that uses evolutionary algorithms to derive a diverse, gender-fair subset for validating ML algorithms. The dataset was imbalanced: three quarters of the applicants were male. We describe how evolutionary algorithms helped us discover characteristics that differ between genders and recognize issues with sparse representations. To rectify these issues, we constructed additional optimization objectives, which ultimately yielded the desired sample.
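
The general idea of evolving a diverse, group-balanced subset can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the encoding, operators, or objectives, so everything below is an assumption made for illustration. The function names (`diversity`, `imbalance`, `fitness`, `mutate`, `evolve`), the toy 2-D feature points, the binary group labels, and the single penalized objective (mean pairwise distance minus a weighted deviation from parity) are all hypothetical, and a multi-objective method could be used instead of the penalty shown here.

```python
import random
from itertools import combinations


def diversity(subset, points):
    """Mean pairwise Euclidean distance between the selected items."""
    pairs = list(combinations(subset, 2))
    total = 0.0
    for i, j in pairs:
        total += sum((a - b) ** 2 for a, b in zip(points[i], points[j])) ** 0.5
    return total / len(pairs)


def imbalance(subset, groups):
    """Absolute deviation of the group-1 share from parity (0.5)."""
    share = sum(groups[i] for i in subset) / len(subset)
    return abs(share - 0.5)


def fitness(subset, points, groups, penalty=10.0):
    """Assumed objective: reward diversity, penalize group imbalance."""
    return diversity(subset, points) - penalty * imbalance(subset, groups)


def mutate(subset, n_items, rng):
    """Swap one selected index for one that is currently unselected."""
    child = list(subset)
    outside = [i for i in range(n_items) if i not in subset]
    child[rng.randrange(len(child))] = rng.choice(outside)
    return tuple(sorted(child))


def evolve(points, groups, k, generations=200, pop_size=30, seed=0):
    """Elitist (mu + lambda) loop over fixed-size subsets (requires 2 <= k < n)."""
    rng = random.Random(seed)
    n = len(points)
    pop = [tuple(sorted(rng.sample(range(n), k))) for _ in range(pop_size)]
    for _ in range(generations):
        children = [mutate(p, n, rng) for p in pop]
        # Parents and children compete; the fittest pop_size subsets survive.
        pop = sorted(
            set(pop + children),
            key=lambda s: fitness(s, points, groups),
            reverse=True,
        )[:pop_size]
    return pop[0]


# Toy data (made up): 40 items, three quarters of them in group 0.
data_rng = random.Random(1)
points = [(data_rng.random(), data_rng.random()) for _ in range(40)]
groups = [0] * 30 + [1] * 10
best = evolve(points, groups, k=8)
```

Because parents compete with their children, the best subset never gets worse from one generation to the next. The penalty weight controls how strongly balance is favored over spread; it is a tuning choice in this sketch, not a value taken from the paper.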

      • Published in

        GECCO '22: Proceedings of the Genetic and Evolutionary Computation Conference Companion
        July 2022
        2395 pages
        ISBN: 9781450392686
        DOI: 10.1145/3520304

        Copyright © 2022 Owner/Author

        Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Acceptance Rates

        Overall Acceptance Rate: 1,669 of 4,410 submissions, 38%
