Exploring quality dimensions in trustworthy Machine Learning in the context of official statistics: model explainability and uncertainty quantification

Original publication
Published in AStA Wirtschafts- und Sozialstatistisches Archiv

Abstract

Although National Statistical Offices (NSOs) continue to adopt Machine Learning (ML) methods and tools across their operations, including data collection, integration, and processing, it remains unclear how these complex, prediction-oriented approaches fit into the quality standards and frameworks used within NSOs, or whether the frameworks themselves need to be modified. This article focuses on, and builds upon, two of the quality dimensions proposed in the Quality Framework for Statistical Algorithms (QF4SA): model explainability and accuracy (including uncertainty). We examine the implications of current methods for explainable ML and uncertainty quantification, as well as their possible uses in statistical production, such as continuous model monitoring in intermediate ML classification and auto-coding phases. This strategy ensures that human subject-matter experts, an essential component of every statistical program, are effectively integrated into the life cycle of ML projects. It also helps maintain the quality of ML models in production, adhere to the existing quality frameworks within NSOs, and ultimately build confidence and trust in these emerging technologies.
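The accuracy dimension discussed above includes uncertainty, and one widely used, model-agnostic tool for quantifying it is split conformal prediction (see Angelopoulos and Bates 2021 in the reference list). The following is a minimal sketch, not the article's own method: the classifier, data, and the three-class "coding" task are synthetic placeholders standing in for, say, an auto-coding model, and the calibration set is assumed to be held out from training.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((2, 3))  # fixed weights for a toy 3-class model

def predict_proba(X):
    # Toy stand-in for a trained classifier's softmax scores over 3 classes.
    z = X @ W
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# Held-out calibration data with known labels (synthetic placeholder).
n_cal = 500
X_cal = rng.standard_normal((n_cal, 2))
y_cal = rng.integers(0, 3, size=n_cal)

alpha = 0.1  # target miscoverage: sets should contain the true label
             # with probability >= 1 - alpha (under exchangeability)

# Nonconformity score: one minus the probability assigned to the true class.
probs_cal = predict_proba(X_cal)
scores = 1.0 - probs_cal[np.arange(n_cal), y_cal]

# Finite-sample-corrected quantile of the calibration scores.
q = np.quantile(scores, np.ceil((n_cal + 1) * (1 - alpha)) / n_cal,
                method="higher")

def prediction_set(x):
    # All classes whose nonconformity score stays below the threshold.
    p = predict_proba(x.reshape(1, -1))[0]
    return np.where(1.0 - p <= q)[0]

x_new = rng.standard_normal(2)
print(prediction_set(x_new))
```

In a production pipeline of the kind the article envisions, records whose prediction set contains more than one candidate code could be routed to a subject-matter expert rather than auto-coded, which is one concrete way uncertainty quantification supports human-in-the-loop monitoring.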


References

  • Alvarez-Melis D, Jaakkola TS (2018) On the robustness of interpretability methods. Presented at the 2018 ICML Workshop on Human Interpretability in Machine Learning (WHI 2018), Stockholm, Sweden

  • Angelopoulos AN, Bates S (2021) A gentle introduction to conformal prediction and distribution-free uncertainty quantification (arXiv:2107.07511)

  • Angelopoulos AN, Bates S, Fisch A, Lei L, Schuster T (2022) Conformal risk control (arXiv:2208.02814)

  • Angelopoulos AN, Bates S, Fannjiang C, Jordan MI, Zrnic T (2023) Prediction-powered inference (arXiv:2301.09633)

  • Barber RF, Candès EJ, Ramdas A, Tibshirani RJ (2021) Predictive inference with the jackknife+. Ann Stat 49(1):486–507. https://doi.org/10.1214/20-AOS1965

  • Barber RF, Candès EJ, Ramdas A, Tibshirani RJ (2022) Conformal prediction beyond exchangeability (arXiv:2202.13415)

  • Bernasconi E, De Fausti F, Pugliese F, Scannapieco M, Zardetto D (2022) Automatic extraction of land cover statistics from satellite imagery by deep learning. Stat J IAOS 38:183–199

  • Bhatt U, Antorán J, Zhang Y, Liao QV, Sattigeri P, Fogliato R, Melançon G, Krishnan R, Stanley J, Tickoo O, Nachman L, Chunara R, Srikumar M, Weller A, Xiang A (2021) Uncertainty as a form of transparency: measuring, communicating, and using uncertainty. In: Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society. Association for Computing Machinery, New York, NY, USA, pp 401–413. https://doi.org/10.1145/3461702.3462571

  • Böhm V, Lanusse F, Seljak U (2019) Uncertainty quantification with generative models (arXiv:1910.10046)

  • Breidt FJ, Claeskens G, Opsomer JD (2005) Model-assisted estimation for complex surveys using penalised splines. Biometrika 92(4):831–846

  • Cassel CM, Särndal CE, Wretman JH (1976) Some results on generalized difference estimation and generalized regression estimation for finite populations. Biometrika 63:615–620

  • Chambers R, Clark R (2012) An introduction to model-based survey sampling with applications. Oxford University Press. https://doi.org/10.1093/acprof:oso/9780198566625.001.0001

  • Chen T, Fox E, Guestrin C (2014) Stochastic gradient Hamiltonian Monte Carlo. In: Xing EP, Jebara T (eds) Proceedings of the 31st International Conference on Machine Learning, PMLR, Beijing, China. Proceedings of Machine Learning Research, vol 32, pp 1683–1691

  • Daas P, Puts M, Buelens B, van den Hurk P (2015) Big data as a source for official statistics. J Off Stat 31(2):249–262. https://doi.org/10.1515/jos-2015-0016

  • Dagdoug M, Goga C, Haziza D (2021) Model-assisted estimation through random forests in finite population sampling. J Am Stat Assoc. https://doi.org/10.1080/01621459.2021.1987250

  • Earth observations for official statistics (2017) United Nations Satellite Imagery and Geospatial Data Task Team report. https://unstats.un.org/bigdata/task-teams/earth-observation/UNGWG_Satellite_Task_Team_Report_WhiteCover.pdf. Accessed August 16, 2023

  • Erman S, Rancourt E, Beaucage Y, Loranger A (2022) The use of data science in a national statistical office. https://hdsr.mitpress.mit.edu/pub/x0l4x099. Accessed August 16, 2023

  • European Commission (2022) EU AI Act. https://artificialintelligenceact.eu/the-act/. Accessed August 16, 2023

  • Fadel S, Trottier S (2023) A study on explainable active learning for text classification (Statistics Canada internal report)

  • Firth D, Bennett KE (1998) Robust models in probability sampling. J R Stat Soc Ser B 60(1):3–21. https://doi.org/10.1111/1467-9868.00105

  • Gal Y, Ghahramani Z (2015) Bayesian convolutional neural networks with Bernoulli approximate variational inference (arXiv:1506.02158)

  • Gal Y, Ghahramani Z (2016) Dropout as a Bayesian approximation: representing model uncertainty in deep learning. In: Balcan MF, Weinberger KQ (eds) Proceedings of the 33rd International Conference on Machine Learning, PMLR, New York, NY, USA. Proceedings of Machine Learning Research, vol 48, pp 1050–1059

  • Gal Y, Islam R, Ghahramani Z (2017) Deep Bayesian active learning with image data. In: Proceedings of the 34th International Conference on Machine Learning, vol 70, pp 1183–1192

  • Geifman Y, El-Yaniv R (2017) Selective classification for deep neural networks. In: Guyon I, von Luxburg U, Bengio S, Wallach H, Fergus R, Garnett R (eds) Advances in Neural Information Processing Systems, vol 30

  • Gelein B, Haziza D, Causeur D (2018) Propensity weighting for survey non-response through machine learning. Journées de Méthodologie Statistique. https://doi.org/10.1016/j.jmva.2014.06.020

  • Ghai B, Liao QV, Zhang Y, Bellamy R, Mueller K (2021) Explainable active learning (XAL): toward AI explanations as interfaces for machine teachers. Proc ACM Hum-Comput Interact. https://doi.org/10.1145/3432934

  • Government of Canada (2022a) National Occupational Classification (NOC) Canada 2021 version 1.0. https://www.statcan.gc.ca/en/subjects/standard/noc/2021/indexV1. Accessed August 16, 2023

  • Government of Canada (2022b) North American Industry Classification System (NAICS) Canada 2022 version 1.0. https://www.statcan.gc.ca/en/subjects/standard/naics/2022/v1/index. Accessed August 16, 2023

  • Government of Canada (2023) North American Product Classification System (NAPCS) Canada 2022 version 1.0. https://www.statcan.gc.ca/en/subjects/standard/napcs/2022/index. Accessed August 16, 2023

  • Government of Canada. Artificial Intelligence and Data Act. https://www.parl.ca/DocumentViewer/en/44-1/bill/C-27/first-reading. Accessed August 16, 2023

  • Guo C, Pleiss G, Sun Y, Weinberger KQ (2017) On calibration of modern neural networks. In: Precup D, Teh YW (eds) Proceedings of the 34th International Conference on Machine Learning, PMLR. Proceedings of Machine Learning Research, vol 70, pp 1321–1330

  • Haziza D, Beaumont JF (2017) Construction of weights in surveys: a review. Stat Sci 32:206–226

  • Hüllermeier E, Waegeman W (2021) Aleatoric and epistemic uncertainty in machine learning: an introduction to concepts and methods. Mach Learn 110(3):457–506. https://doi.org/10.1007/s10994-021-05946-3

  • Kaiser P, Kern C, Rügamer D (2022) Uncertainty-aware predictive modeling for fair data-driven decisions

  • Kull M, Silva Filho TM, Flach P (2017) Beyond sigmoids: how to obtain well-calibrated probabilities from binary classifiers with beta calibration. Electron J Stat 11(2):5052–5080. https://doi.org/10.1214/17-EJS1338SI

  • Lakshminarayanan B, Pritzel A, Blundell C (2017) Simple and scalable predictive uncertainty estimation using deep ensembles. In: Proceedings of the 31st International Conference on Neural Information Processing Systems, pp 6405–6416

  • Lele SR (2020) How should we quantify uncertainty in statistical inference? Front Ecol Evol. https://doi.org/10.3389/fevo.2020.00035

  • Lundberg SM, Lee SI (2017) A unified approach to interpreting model predictions. In: Proceedings of the 31st International Conference on Neural Information Processing Systems, pp 4768–4777

  • McConville KS, Moisen GG, Frescino TS (2020) A tutorial on model-assisted estimation with application to forest inventory. Forests. https://doi.org/10.3390/f11020244

  • Montanari G, Ranalli M (2005) Nonparametric model calibration estimation in survey sampling. J Am Stat Assoc 100(472):1429–1442

  • Mothilal RK, Sharma A, Tan C (2020) Explaining machine learning classifiers through diverse counterfactual explanations. In: Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency. Association for Computing Machinery, pp 607–617. https://doi.org/10.1145/3351095.3372850

  • Murdoch WJ, Singh C, Kumbier K, Abbasi-Asl R, Yu B (2019) Definitions, methods, and applications in interpretable machine learning. Proc Natl Acad Sci USA. https://doi.org/10.1073/pnas.1900654116

  • Platt JC (1999) Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods. In: Advances in Large Margin Classifiers. MIT Press, pp 61–74

  • Ribeiro MT, Singh S, Guestrin C (2018) Anchors: high-precision model-agnostic explanations. In: Proceedings of the AAAI Conference on Artificial Intelligence

  • Romano Y, Patterson E, Candès EJ (2019) Conformalized quantile regression. In: Wallach H, Larochelle H, Beygelzimer A, d'Alché-Buc F, Fox E, Garnett R (eds) Advances in Neural Information Processing Systems, vol 32

  • Roscher R, Bohn B, Duarte MF, Garcke J (2020) Explainable machine learning for scientific insights and discoveries. IEEE Access. https://doi.org/10.1109/ACCESS.2020.2976199

  • Rudin C (2019) Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nat Mach Intell 1(5):206–215. https://doi.org/10.1038/s42256-019-0048-x

  • Särndal CE, Swensson B, Wretman J (1992) Model assisted survey sampling. Springer

  • Srivastava N, Hinton G, Krizhevsky A, Sutskever I, Salakhutdinov R (2014) Dropout: a simple way to prevent neural networks from overfitting. J Mach Learn Res 15(1):1929–1958

  • Statistics Canada (2019) Statistics Canada's quality guidelines. https://www150.statcan.gc.ca/n1/pub/12-539-x/12-539-x2019001-eng.htm. Accessed August 16, 2023

  • Steinberger L, Leeb H (2018) Conditional predictive inference for stable algorithms (arXiv:1809.01412)

  • OECD (2019) The OECD Artificial Intelligence (AI) Principles. https://oecd.ai/en/ai-principles. Accessed August 16, 2023

  • Vaicenavicius J, Widmann D, Andersson C, Lindsten F, Roll J, Schön T (2019) Evaluating model calibration in classification. In: Chaudhuri K, Sugiyama M (eds) Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics, PMLR. Proceedings of Machine Learning Research, vol 89, pp 3459–3467

  • Vovk V, Gammerman A, Shafer G (2005) Algorithmic learning in a random world. Springer, Berlin, Heidelberg

  • Wachter S, Mittelstadt B, Russell C (2018) Counterfactual explanations without opening the black box: automated decisions and the GDPR. Harv J Law Technol 31(2):841–887

  • Yung W, Cook K, Thomas S (2004) Use of GST data by the Monthly Survey of Manufacturing. https://www.oecd.org/sdd/36232466.pdf. Accessed August 16, 2023

  • Yung W, Tam SM, Buelens B, Chipman H, Dumpert F, Ascari G, Rocci F, Burger J, Choi IK (2022) A quality framework for statistical algorithms. Stat J IAOS. https://doi.org/10.3233/SJI-210875

  • Zadrozny B, Elkan C (2001) Obtaining calibrated probability estimates from decision trees and naive Bayesian classifiers. In: Proceedings of the Eighteenth International Conference on Machine Learning. Morgan Kaufmann Publishers Inc, San Francisco, CA, USA, pp 609–616

  • Zhang J (2022) Machine learning techniques to handle survey non-response (Statistics Canada internal report)

Author information

Correspondence to Saeid Molladavoudi.

Ethics declarations

Conflict of interest

The content of this article represents the position of the authors and may not necessarily represent that of Statistics Canada. The authors declare no competing or conflicting interests that could be perceived as having influenced the work presented in this paper.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article

Cite this article

Molladavoudi, S., Yung, W. Exploring quality dimensions in trustworthy Machine Learning in the context of official statistics: model explainability and uncertainty quantification. AStA Wirtsch Sozialstat Arch 17, 223–252 (2023). https://doi.org/10.1007/s11943-023-00331-z
