Proceedings of the 2020 ACM-IMS on Foundations of Data Science Conference

FODS '20: Proceedings of the 2020 ACM-IMS on Foundations of Data Science Conference

October 2020

2020 Proceeding

General Chairs:
Jeannette Wing (Columbia University, USA)
David Madigan (Northeastern University, USA)

Publisher:

Association for Computing Machinery, New York, NY, United States

Conference:

FODS '20: ACM-IMS Foundations of Data Science Conference, Virtual Event, USA, October 19-20, 2020

ISBN:

978-1-4503-8103-1

Published:

18 October 2020


Bibliometrics

Citation count: 104
Downloads (6 weeks): 139
Downloads (12 months): 945
Downloads (cumulative): 3,722


Abstract

Computing and statistics underpin the rapid emergence of data science as a pivotal academic discipline. The Association for Computing Machinery (ACM) and the Institute of Mathematical Statistics (IMS), the two key academic organizations in these areas, have come together to launch a conference series on the Foundations of Data Science. Our inaugural event, the ACM-IMS Interdisciplinary Summit on the Foundations of Data Science, took place in San Francisco in 2019. FODS-2020 represents the first of what will be an annual conference series with refereed conference proceedings. This interdisciplinary event brings together researchers and practitioners to address foundational data science challenges in prediction, inference, fairness, ethics and the future of data science.

We received 58 submissions and the program committee reviewed each paper thoroughly. We accepted 17 papers for plenary presentation and inclusion in the proceedings. The program also included keynote addresses by Professor Mihaela van der Schaar and Professor Oren Etzioni and half-day tutorials by Professor Michael Kearns and Professor David Blei.


SESSION: Keynote Talk I

section

Session details: Keynote Talk I

David Madigan

https://doi.org/10.1145/3429731

Metrics
Total Citations: 0

keynote

AutoML and Interpretability: Powering the Machine Learning Revolution in Healthcare

Mihaela van der Schaar

pp 1
https://doi.org/10.1145/3412815.3416879

AutoML and interpretability are both fundamental to the successful uptake of machine learning by non-expert end users. The former will lower barriers to entry and unlock potent new capabilities that are out of reach when working with ad-hoc models, ...

Metrics
Total Citations: 0
Total Downloads: 218
Last 12 Months: 29
Last 6 weeks: 6

SESSION: Session 1: Methodology

section

Session details: Session 1: Methodology

Julia Kempe

https://doi.org/10.1145/3429732

Metrics
Total Citations: 0

research-article

ADAGES: Adaptive Aggregation with Stability for Distributed Feature Selection

Yu Gui

pp 3–12
https://doi.org/10.1145/3412815.3416881

In this era of big data, not only the large amount of data keeps motivating distributed computing, but concerns on data privacy also put forward the emphasis on distributed learning. To conduct feature selection and to control the false discovery rate ...

Metrics
Total Citations: 1
Total Downloads: 110
Last 12 Months: 4
Last 6 weeks: 0

research-article

Classification Acceleration via Merging Decision Trees

Chenglin Fan,
Ping Li

pp 13–22
https://doi.org/10.1145/3412815.3416886

We study the problem of merging decision trees: Given k decision trees $T_1, T_2, T_3, \ldots, T_k$, we merge these trees into one super tree T with (often) much smaller size. The resultant super tree T, which is an integration of k decision trees with each ...

Metrics
Total Citations: 8
Total Downloads: 186
Last 12 Months: 36
Last 6 weeks: 6

research-article

Tree Space Prototypes: Another Look at Making Tree Ensembles Interpretable

Sarah Tan,
Matvey Soloviev,
Giles Hooker,
Martin T. Wells

pp 23–34
https://doi.org/10.1145/3412815.3416893

Ensembles of decision trees perform well on many problems, but are not interpretable. In contrast to existing approaches in interpretability that focus on explaining relationships between features and predictions, we propose an alternative approach to ...

Metrics
Total Citations: 24
Total Downloads: 344
Last 12 Months: 43
Last 6 weeks: 3

research-article

Open Access

Ensembles of Bagged TAO Trees Consistently Improve over Random Forests, AdaBoost and Gradient Boosting

Miguel Á Carreira-Perpiñán,
Arman Zharmagambetov

pp 35–46
https://doi.org/10.1145/3412815.3416882

Ensemble methods based on trees, such as Random Forests, AdaBoost and gradient boosting, are widely recognized as among the best off-the-shelf classifiers: they typically achieve state-of-the-art accuracy in many problems with little effort in tuning ...

Metrics
Total Citations: 20
Total Downloads: 507
Last 12 Months: 274
Last 6 weeks: 43

SESSION: Session 2: Fairness, Privacy, Interpretability

section

Session details: Session 2: Fairness, Privacy, Interpretability

Jeff Goldsmith

https://doi.org/10.1145/3429733

Metrics
Total Citations: 0

research-article

Interpreting Black Box Models via Hypothesis Testing

Collin Burns,
Jesse Thomason,
Wesley Tansey

pp 47–57
https://doi.org/10.1145/3412815.3416889

In science and medicine, model interpretations may be reported as discoveries of natural phenomena or used to guide patient treatments. In such high-stakes tasks, false discoveries may lead investigators astray. These applications would therefore ...

Metrics
Total Citations: 17
Total Downloads: 327
Last 12 Months: 64
Last 6 weeks: 8

research-article

Congenial Differential Privacy under Mandated Disclosure

Ruobin Gong,
Xiao-Li Meng

pp 59–70
https://doi.org/10.1145/3412815.3416892

Differentially private data releases are often required to satisfy a set of external constraints that reflect the legal, ethical, and logical mandates to which the data curator is obligated. The enforcement of constraints, when treated as post-...

Metrics
Total Citations: 3
Total Downloads: 123
Last 12 Months: 26
Last 6 weeks: 6

research-article

Incentives Needed for Low-Cost Fair Lateral Data Reuse

Roland Maio,
Augustin Chaintreau

pp 71–82
https://doi.org/10.1145/3412815.3416890

A central goal of algorithmic fairness is to build systems with fairness properties that compose gracefully. A major effort and step towards this goal in data science has been the development of fair representations which guarantee demographic parity ...

Metrics
Total Citations: 0
Total Downloads: 67
Last 12 Months: 9
Last 6 weeks: 0

research-article

Public Access

Applying Algorithmic Accountability Frameworks with Domain-specific Codes of Ethics: A Case Study in Ecosystem Forecasting for Shellfish Toxicity in the Gulf of Maine

Isabella Grasso,
David Russell,
Abigail Matthews,
Jeanna Matthews,
Nicholas R. Record

pp 83–91
https://doi.org/10.1145/3412815.3416897

Ecological forecasts are used to inform decisions that can have significant impacts on the lives of individuals and on the health of ecosystems. These forecasts, or models, embody the ethics of their creators as well as many seemingly arbitrary ...

Metrics
Total Citations: 2
Total Downloads: 302
Last 12 Months: 122
Last 6 weeks: 13

SESSION: Keynote Talk II

section

Session details: Keynote Talk II

Jeannette Wing

https://doi.org/10.1145/3429734

Metrics
Total Citations: 0

keynote

Semantic Scholar, NLP, and the Fight against COVID-19

Oren Etzioni

pp 93
https://doi.org/10.1145/3412815.3416880

This talk will describe the dramatic creation of the COVID-19 Open Research Dataset (CORD-19) at the Allen Institute for AI and the broad range of efforts, both inside and outside of the Semantic Scholar project, to garner insights into COVID-19 and its ...

Metrics
Total Citations: 0
Total Downloads: 57
Last 12 Months: 4
Last 6 weeks: 0

SESSION: Session 3: Data Science Theory

section

Session details: Session 3: Data Science Theory

Yannis Ioannidis

https://doi.org/10.1145/3429735

Metrics
Total Citations: 0

research-article

Non-Uniform Sampling of Fixed Margin Binary Matrices

Alex Fout,
Bailey K. Fosdick,
Matthew P. Hitt

pp 95–105
https://doi.org/10.1145/3412815.3416887

Data sets in the form of binary matrices are ubiquitous across scientific domains, and researchers are often interested in identifying and quantifying noteworthy structure. One approach is to compare the observed data to that which might be obtained ...

Metrics
Total Citations: 0
Total Downloads: 63
Last 12 Months: 4
Last 6 weeks: 0

research-article

Large Very Dense Subgraphs in a Stream of Edges

Claire Mathieu,
Michel de Rougemont

pp 107–117
https://doi.org/10.1145/3412815.3416884

We study the detection and the reconstruction of a large very dense subgraph in a social graph with n nodes and m edges given as a stream of edges, when the graph follows a power law degree distribution, in the regime when $m=O(n \log n)$. A subgraph is ...

Metrics
Total Citations: 1
Total Downloads: 63
Last 12 Months: 3
Last 6 weeks: 1

research-article

Toward Communication Efficient Adaptive Gradient Method

Xiangyi Chen,
Xiaoyun Li,
Ping Li

pp 119–128
https://doi.org/10.1145/3412815.3416891

In recent years, distributed optimization is proven to be an effective approach to accelerate training of large scale machine learning models such as deep neural networks. With the increasing computation power of GPUs, the bottleneck of training speed ...

Metrics
Total Citations: 10
Total Downloads: 257
Last 12 Months: 25
Last 6 weeks: 0

research-article

Towards Practical Lipschitz Bandits

Tianyu Wang,
Weicheng Ye,
Dawei Geng,
Cynthia Rudin

pp 129–138
https://doi.org/10.1145/3412815.3416885

Stochastic Lipschitz bandit algorithms balance exploration and exploitation, and have been used for a variety of important task domains. In this paper, we present a framework for Lipschitz bandit methods that adaptively learns partitions of context- and ...

Metrics
Total Citations: 7
Total Downloads: 166
Last 12 Months: 34
Last 6 weeks: 3

research-article

Open Access

On Reinforcement Learning for Turn-based Zero-sum Markov Games

Devavrat Shah,
Varun Somani,
Qiaomin Xie,
Zhi Xu

pp 139–148
https://doi.org/10.1145/3412815.3416888

We consider the problem of finding Nash equilibrium for two-player turn-based zero-sum games. Inspired by the AlphaGo Zero (AGZ) algorithm, we develop a Reinforcement Learning based approach. Specifically, we propose Explore-Improve-Supervise (EIS) ...

Metrics
Total Citations: 0
Total Downloads: 328
Last 12 Months: 135
Last 6 weeks: 22

SESSION: Session 4: Foundations in Practice

section

Session details: Session 4: Foundations in Practice

Stan Ahalt

https://doi.org/10.1145/3429736

Metrics
Total Citations: 0

research-article

Public Access

Transforming Probabilistic Programs for Model Checking

Ryan Bernstein,
Matthijs Vákár,
Jeannette Wing

pp 149–159
https://doi.org/10.1145/3412815.3416896

Probabilistic programming is perfectly suited to reliable and transparent data science, as it allows the user to specify their models in a high-level language without worrying about the complexities of how to fit the models. Static analysis of ...

Metrics
Total Citations: 0
Total Downloads: 148
Last 12 Months: 44
Last 6 weeks: 7

research-article

Open Access

StyleCAPTCHA: CAPTCHA Based on Stylized Images to Defend against Deep Networks

Haitian Chen,
Bai Jiang,
Hao Chen

pp 161–170
https://doi.org/10.1145/3412815.3416895

CAPTCHAs are widely deployed for bot detection. Many CAPTCHAs are based on visual perception tasks such as text and object classification. However, they are under serious threat from advanced visual perception technologies based on deep convolutional ...

Metrics
Total Citations: 1
Total Downloads: 215
Last 12 Months: 52
Last 6 weeks: 10

research-article

Statistical Significance in High-dimensional Linear Mixed Models

Lina Lin,
Mathias Drton,
Ali Shojaie

pp 171–181
https://doi.org/10.1145/3412815.3416883

This paper develops an inferential framework for high-dimensional linear mixed effect models. Such models are suitable, e.g., when collecting n repeated measurements for M subjects. We consider a scenario where the number of fixed effects p is large (...

Metrics
Total Citations: 2
Total Downloads: 91
Last 12 Months: 17
Last 6 weeks: 8

research-article

Dynamical Gaussian Process Latent Variable Model for Representation Learning from Longitudinal Data

Thanh Le,
Vasant Honavar

pp 183–188
https://doi.org/10.1145/3412815.3416894

Many real-world applications involve longitudinal data, consisting of observations of several variables, where different subsets of variables are sampled at irregularly spaced time points. We introduce the Longitudinal Gaussian Process Latent Variable ...

Metrics
Total Citations: 3
Total Downloads: 150
Last 12 Months: 20
Last 6 weeks: 3

Cited By

Hernández-López I, Prieto-Santiago V, Ortiz-Sòla J, Abadias M and Aguiló-Aguayo I (2024). Acceptance of microalgal processes and products, Sustainable Industrial Processes Based on Microalgae, 10.1016/B978-0-443-19213-5.00015-7, (335-359).
Yang B, Ji S, Zhao T, Wang Z, Zhang Y, Pan Q, Huang W and Lu B (2023). Phytosterols photooxidation in O/W emulsion: Influence of emulsifier composition and interfacial properties, Food Hydrocolloids, 10.1016/j.foodhyd.2023.108698, 142, (108698), Online publication date: 1-Sep-2023.
Cheng Z, Pan W, Xian W, Yu J, Weng X, Benjakul S, Guidi A, Ying X and Deng S (2022). Effects of various logistics packaging on the quality and microbial variation of bigeye tuna (Thunnus obesus), Frontiers in Nutrition, 10.3389/fnut.2022.998377, 9.
Liu Q, Chang X, Shan Y, Fu F and Ding S (2020). Fabrication and characterization of Pickering emulsion gels stabilized by zein/pullulan complex colloidal particles, Journal of the Science of Food and Agriculture, 10.1002/jsfa.10992, 101:9, (3630-3643), Online publication date: 1-Jul-2021.
Magri A, Petriccione M, Cerqueira M and Gutiérrez T (2020). Self-assembled lipids for food applications: A review, Advances in Colloid and Interface Science, 10.1016/j.cis.2020.102279, 285, (102279), Online publication date: 1-Nov-2020.
