DOI:
10.1145/3412815
FODS '20: Proceedings of the 2020 ACM-IMS on Foundations of Data Science Conference
ACM 2020 Proceeding
Publisher:
  • Association for Computing Machinery
  • New York
  • NY
  • United States
Conference:
FODS '20: ACM-IMS Foundations of Data Science Conference, Virtual Event, USA, October 19-20, 2020
ISBN:
978-1-4503-8103-1
Published:
18 October 2020

Abstract

Computing and statistics underpin the rapid emergence of data science as a pivotal academic discipline. The Association for Computing Machinery (ACM) and the Institute of Mathematical Statistics (IMS), the two key academic organizations in these areas, have come together to launch a conference series on the Foundations of Data Science. Our inaugural event, the ACM-IMS Interdisciplinary Summit on the Foundations of Data Science, took place in San Francisco in 2019. FODS-2020 represents the first of what will be an annual conference series with refereed conference proceedings. This interdisciplinary event brings together researchers and practitioners to address foundational data science challenges in prediction, inference, fairness, ethics and the future of data science.

We received 58 submissions and the program committee reviewed each paper thoroughly. We accepted 17 papers for plenary presentation and inclusion in the proceedings. The program also included keynote addresses by Professor Mihaela van der Schaar and Professor Oren Etzioni and half-day tutorials by Professor Michael Kearns and Professor David Blei.

SESSION: Keynote Talk I
keynote
AutoML and Interpretability: Powering the Machine Learning Revolution in Healthcare

AutoML and interpretability are both fundamental to the successful uptake of machine learning by non-expert end users. The former will lower barriers to entry and unlock potent new capabilities that are out of reach when working with ad-hoc models, ...

SESSION: Session 1: Methodology
research-article
ADAGES: Adaptive Aggregation with Stability for Distributed Feature Selection

In this era of big data, not only does the large amount of data keep motivating distributed computing, but concerns about data privacy also place emphasis on distributed learning. To conduct feature selection and to control the false discovery rate ...

research-article
Classification Acceleration via Merging Decision Trees

We study the problem of merging decision trees: Given k decision trees $T_1, T_2, T_3, \ldots, T_k$, we merge these trees into one super tree T with (often) much smaller size. The resultant super tree T, which is an integration of k decision trees with each ...

research-article
Tree Space Prototypes: Another Look at Making Tree Ensembles Interpretable

Ensembles of decision trees perform well on many problems, but are not interpretable. In contrast to existing approaches in interpretability that focus on explaining relationships between features and predictions, we propose an alternative approach to ...

research-article
Open Access
Ensembles of Bagged TAO Trees Consistently Improve over Random Forests, AdaBoost and Gradient Boosting

Ensemble methods based on trees, such as Random Forests, AdaBoost and gradient boosting, are widely recognized as among the best off-the-shelf classifiers: they typically achieve state-of-the-art accuracy in many problems with little effort in tuning ...

SESSION: Session 2: Fairness, Privacy, Interpretability
research-article
Interpreting Black Box Models via Hypothesis Testing

In science and medicine, model interpretations may be reported as discoveries of natural phenomena or used to guide patient treatments. In such high-stakes tasks, false discoveries may lead investigators astray. These applications would therefore ...

research-article
Congenial Differential Privacy under Mandated Disclosure

Differentially private data releases are often required to satisfy a set of external constraints that reflect the legal, ethical, and logical mandates to which the data curator is obligated. The enforcement of constraints, when treated as post-...

research-article
Incentives Needed for Low-Cost Fair Lateral Data Reuse

A central goal of algorithmic fairness is to build systems with fairness properties that compose gracefully. A major effort and step towards this goal in data science has been the development of fair representations which guarantee demographic parity ...

research-article
Public Access
Applying Algorithmic Accountability Frameworks with Domain-specific Codes of Ethics: A Case Study in Ecosystem Forecasting for Shellfish Toxicity in the Gulf of Maine

Ecological forecasts are used to inform decisions that can have significant impacts on the lives of individuals and on the health of ecosystems. These forecasts, or models, embody the ethics of their creators as well as many seemingly arbitrary ...

SESSION: Keynote Talk II
keynote
Semantic Scholar, NLP, and the Fight against COVID-19

This talk will describe the dramatic creation of the COVID-19 Open Research Dataset (CORD-19) at the Allen Institute for AI and the broad range of efforts, both inside and outside of the Semantic Scholar project, to garner insights into COVID-19 and its ...

SESSION: Session 3: Data Science Theory
research-article
Non-Uniform Sampling of Fixed Margin Binary Matrices

Data sets in the form of binary matrices are ubiquitous across scientific domains, and researchers are often interested in identifying and quantifying noteworthy structure. One approach is to compare the observed data to that which might be obtained ...

research-article
Large Very Dense Subgraphs in a Stream of Edges

We study the detection and the reconstruction of a large very dense subgraph in a social graph with n nodes and m edges given as a stream of edges, when the graph follows a power law degree distribution, in the regime when $m = O(n \log n)$. A subgraph is ...

research-article
Toward Communication Efficient Adaptive Gradient Method

In recent years, distributed optimization has proven to be an effective approach to accelerating the training of large-scale machine learning models such as deep neural networks. With the increasing computation power of GPUs, the bottleneck of training speed ...

research-article
Towards Practical Lipschitz Bandits

Stochastic Lipschitz bandit algorithms balance exploration and exploitation, and have been used for a variety of important task domains. In this paper, we present a framework for Lipschitz bandit methods that adaptively learns partitions of context- and ...

research-article
Open Access
On Reinforcement Learning for Turn-based Zero-sum Markov Games

We consider the problem of finding Nash equilibrium for two-player turn-based zero-sum games. Inspired by the AlphaGo Zero (AGZ) algorithm, we develop a Reinforcement Learning based approach. Specifically, we propose Explore-Improve-Supervise (EIS) ...

SESSION: Session 4: Foundations in Practice
research-article
Public Access
Transforming Probabilistic Programs for Model Checking

Probabilistic programming is perfectly suited to reliable and transparent data science, as it allows the user to specify their models in a high-level language without worrying about the complexities of how to fit the models. Static analysis of ...

research-article
Open Access
StyleCAPTCHA: CAPTCHA Based on Stylized Images to Defend against Deep Networks

CAPTCHAs are widely deployed for bot detection. Many CAPTCHAs are based on visual perception tasks such as text and object classification. However, they are under serious threat from advanced visual perception technologies based on deep convolutional ...

research-article
Statistical Significance in High-dimensional Linear Mixed Models

This paper develops an inferential framework for high-dimensional linear mixed effect models. Such models are suitable, e.g., when collecting n repeated measurements for M subjects. We consider a scenario where the number of fixed effects p is large (...

research-article
Dynamical Gaussian Process Latent Variable Model for Representation Learning from Longitudinal Data

Many real-world applications involve longitudinal data, consisting of observations of several variables, where different subsets of variables are sampled at irregularly spaced time points. We introduce the Longitudinal Gaussian Process Latent Variable ...
