
Grouped Learning: Group-By Model Selection Workloads

Published: 18 June 2021
DOI: 10.1145/3448016.3450576

ABSTRACT

Machine Learning (ML) is gaining popularity in many applications. Increasingly, companies prefer more targeted models for different subgroups of the population, such as locations, which helps improve accuracy. This practice is analogous to Group-By aggregation in SQL; we call it learning over groups. Because a group's data distribution is often simpler than that of the whole population, a group-level model can be more accurate in many cases. Non-technical business needs, such as privacy and regulatory compliance, may also necessitate group-level models. For instance, an online advertising platform may need to build disaggregated, partner-specific ML models even though all partner groups' training data flows through a single data pipeline.
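To make the workload concrete, below is a minimal sketch of learning over groups using pandas and scikit-learn. It is only an illustration of the per-group training pattern, not the paper's system: the grouping column ("region"), the feature names, and the synthetic data are all hypothetical.

```python
# Minimal sketch of "learning over groups": train one model per group,
# analogous to a SQL GROUP BY over the training data.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 1_000
df = pd.DataFrame({
    "region": rng.choice(["US", "EU", "APAC"], size=n),  # hypothetical grouping column
    "x1": rng.normal(size=n),
    "x2": rng.normal(size=n),
})
df["label"] = (df["x1"] + rng.normal(scale=0.5, size=n) > 0).astype(int)

# One global model trained over the whole population ...
global_model = LogisticRegression().fit(df[["x1", "x2"]], df["label"])

# ... versus one model per group, which can capture group-specific distributions
# and also keeps each group's model disaggregated from the others.
group_models = {}
for region, part in df.groupby("region"):
    group_models[region] = LogisticRegression().fit(part[["x1", "x2"]], part["label"])

# Evaluate each group's model on that group's own data.
for region, model in group_models.items():
    part = df[df["region"] == region]
    print(region, model.score(part[["x1", "x2"]], part["label"]))
```

In practice, each group may also need its own model selection (e.g., hyperparameter tuning), which multiplies the number of training runs; that combination is presumably the Group-By model selection workload named in the title.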


Published in

SIGMOD '21: Proceedings of the 2021 International Conference on Management of Data
June 2021, 2969 pages
ISBN: 9781450383431
DOI: 10.1145/3448016

          Copyright © 2021 Owner/Author

          Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery, New York, NY, United States



          Qualifiers

          • abstract

Acceptance Rates

Overall acceptance rate: 785 of 4,003 submissions, 20%