abstract

Grouped Learning: Group-By Model Selection Workloads

Author:
Side Li

University of California, San Diego, La Jolla, CA, USA

SIGMOD '21: Proceedings of the 2021 International Conference on Management of Data, June 2021, Pages 2899–2901
https://doi.org/10.1145/3448016.3450576

Published: 18 June 2021

ABSTRACT

Machine Learning (ML) is gaining popularity in many applications. Increasingly, companies prefer more targeted models for different subgroups of the population, such as locations, because such models can improve accuracy. This practice is comparable to Group-By aggregation in SQL; we call it learning over groups. A smaller group typically has a simpler data distribution than the whole population, so a group-level model may be more accurate in many cases. Non-technical business needs, such as privacy and regulatory compliance, may also necessitate group-level models. For instance, an online advertising platform may need to build disaggregated, partner-specific ML models while the training data of all partner groups are aggregated in one data pipeline.
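The Group-By analogy can be made concrete with a minimal sketch. It assumes a toy schema of (group key, feature, label) rows and uses one-feature ordinary least squares as a stand-in for whatever model a real workload would train; the group names, the data, and the helpers `fit_line` and `group_by_train` are hypothetical illustrations, not the paper's system or API. The data are contrived so that the two groups have opposite trends, which shows how a single global model can miss structure that per-group models capture.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical row schema: (group key, feature x, label y).
Row = Tuple[str, float, float]
Model = Tuple[float, float]  # (slope a, intercept b) for y = a*x + b


def fit_line(xs: List[float], ys: List[float]) -> Model:
    """Ordinary least squares for a single feature (stand-in for any learner)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        return 0.0, mean_y  # constant feature: fall back to predicting the mean
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = cov_xy / var_x
    return a, mean_y - a * mean_x


def group_by_train(rows: Iterable[Row]) -> Dict[str, Model]:
    """Rough analogue of a hypothetical `SELECT key, TRAIN(x, y) ... GROUP BY key`:
    partition rows by key, then fit one independent model per partition."""
    groups: Dict[str, Tuple[List[float], List[float]]] = defaultdict(lambda: ([], []))
    for key, x, y in rows:
        xs, ys = groups[key]
        xs.append(x)
        ys.append(y)
    return {key: fit_line(xs, ys) for key, (xs, ys) in groups.items()}


def mse(predict, rows: List[Row]) -> float:
    """Mean squared error of a prediction function over all rows."""
    return sum((predict(key, x) - y) ** 2 for key, x, y in rows) / len(rows)


# Contrived data: the two hypothetical locations have opposite trends.
rows: List[Row] = [
    ("east", 1.0, 2.0), ("east", 2.0, 4.0), ("east", 3.0, 6.0),
    ("west", 1.0, 6.0), ("west", 2.0, 4.0), ("west", 3.0, 2.0),
]

per_group = group_by_train(rows)
global_model = fit_line([x for _, x, _ in rows], [y for _, _, y in rows])

grouped_error = mse(lambda k, x: per_group[k][0] * x + per_group[k][1], rows)
global_error = mse(lambda k, x: global_model[0] * x + global_model[1], rows)

print(per_group)      # {'east': (2.0, 0.0), 'west': (-2.0, 8.0)}
print(global_model)   # (0.0, 4.0)
print(grouped_error, global_error)
```

On this toy data the per-group models fit exactly, while the global model averages the opposing trends into a flat line. Real data would rarely be this clean, and small groups can overfit, so this is only an illustration of the shape of the workload, not of its accuracy trade-offs.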

Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Distributed artificial intelligence
      1. Cooperation and coordination
  2. Parallel computing methodologies
    1. Parallel algorithms
      1. Massively parallel algorithms
2. Information systems
  1. Data management systems

Published in
SIGMOD '21: Proceedings of the 2021 International Conference on Management of Data, June 2021, 2969 pages
ISBN: 9781450383431
DOI: 10.1145/3448016
General Chairs: Guoliang Li (Tsinghua University, China), Zhanhuai Li (Northwestern Polytechnical University, China)
Program Chairs: Stratos Idreos (Harvard University, USA), Divesh Srivastava (AT&T, USA)
Copyright © 2021 Owner/Author
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.
Publisher
Association for Computing Machinery
New York, NY, United States
Qualifiers
- abstract
Acceptance Rates
Overall Acceptance Rate: 785 of 4,003 submissions, 20%