Reliability at multiple stages in a data analysis pipeline

Published: 20 October 2022

Abstract

Data-centric methods designed to increase the end-to-end reliability of data-driven decision systems.

Cited By

  • (2024) Reliability evaluation of individual predictions: a data-centric approach. The VLDB Journal — The International Journal on Very Large Data Bases 33, 4, 1203--1230. DOI: 10.1007/s00778-024-00857-w. Online publication date: 30 May 2024.

Published In

Communications of the ACM, Volume 65, Issue 11
November 2022
130 pages
ISSN: 0001-0782
EISSN: 1557-7317
DOI: 10.1145/3569027
Editor: James Larus

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 20 October 2022
Published in CACM Volume 65, Issue 11

Qualifiers

  • Research-article
  • Popular
  • Refereed

Article Metrics

  • Downloads (last 12 months): 510
  • Downloads (last 6 weeks): 59
Reflects downloads up to 05 Mar 2025