Abstract
National Statistical Offices (NSOs) continue to adopt Machine Learning (ML) methods and tools across their operations, including data collection, integration, and processing. It remains unclear, however, how these complex, prediction-oriented approaches can be incorporated into the quality standards and frameworks of NSOs, or whether the frameworks themselves need to be modified. This article focuses on and builds upon two of the quality dimensions proposed in the Quality Framework for Statistical Algorithms (QF4SA): model explainability and accuracy (including uncertainty). We examine in detail the implications of current methods for explainable ML and uncertainty quantification, as well as their possible uses in statistical production, such as continuous model monitoring in intermediate ML classification and auto-coding phases. This strategy ensures that human subject-matter experts, an essential component of every statistical program, are effectively integrated into the life cycle of ML projects. It also helps maintain the quality of ML models in production, supports adherence to the current quality frameworks within NSOs, and ultimately builds confidence and trust in these emerging technologies.
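The uncertainty-quantification methods surveyed in the references below (notably conformal prediction, Vovk et al. 2005; Angelopoulos and Bates 2021) suggest one concrete way to combine auto-coding with human review: auto-code only records whose conformal prediction set contains a single candidate code, and route ambiguous records to subject-matter experts. The following is a minimal sketch under stated assumptions; the synthetic Dirichlet scores stand in for a real classifier's calibrated probabilities, and the coverage level `alpha` is an illustrative choice, not a value from the article.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a classifier's class-probability scores on a held-out
# calibration set and on new records; in production these would come from
# the NSO's trained auto-coding model.
n_cal, n_new, n_classes = 500, 5, 3
cal_scores = rng.dirichlet(np.ones(n_classes), size=n_cal)
cal_labels = rng.integers(0, n_classes, size=n_cal)
new_scores = rng.dirichlet(np.ones(n_classes), size=n_new)

alpha = 0.1  # target miscoverage: sets miss the true code at most ~10% of the time

# Split conformal prediction: nonconformity score = 1 - probability assigned
# to the true class on the calibration set.
nonconf = 1.0 - cal_scores[np.arange(n_cal), cal_labels]
q_level = np.ceil((n_cal + 1) * (1 - alpha)) / n_cal
qhat = np.quantile(nonconf, q_level, method="higher")

# Prediction set for each new record: all classes scoring at least 1 - qhat.
pred_sets = [np.where(s >= 1 - qhat)[0] for s in new_scores]

# Route records: auto-code singleton sets, send ambiguous ones to a human coder.
for i, ps in enumerate(pred_sets):
    action = "auto-code" if len(ps) == 1 else "manual review"
    print(f"record {i}: candidate codes {ps.tolist()} -> {action}")
```

The fraction of records routed to manual review can itself be monitored over time: a rising abstention rate is an early signal of model or data drift, which connects this routing rule back to continuous model monitoring.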
References
Alvarez-Melis D, Jaakkola TS (2018) On the robustness of interpretability methods (presented at 2018 ICML Workshop on Human Interpretability in Machine Learning (WHI 2018), Stockholm, Sweden)
Angelopoulos AN, Bates S (2021) A gentle introduction to conformal prediction and distribution-free uncertainty quantification (arXiv:2107.07511)
Angelopoulos AN, Bates S, Fisch A, Lei L, Schuster T (2022) Conformal risk control (arXiv:2208.02814)
Angelopoulos AN, Bates S, Fannjiang C, Jordan MI, Zrnic T (2023) Prediction-powered inference (arXiv:2301.09633)
Barber RF, Candès EJ, Ramdas A, Tibshirani RJ (2021) Predictive inference with the jackknife. Ann Stat 49(1):486–507. https://doi.org/10.1214/20-AOS1965
Barber RF, Candès EJ, Ramdas A, Tibshirani RJ (2022) Conformal prediction beyond exchangeability (arXiv:2202.13415)
Bernasconi E, De Fausti F, Pugliese F, Scannapieco M, Zardetto D (2022) Automatic extraction of land cover statistics from satellite imagery by deep learning. Stat J IAOS 38:183–199
Bhatt U, Antorán J, Zhang Y, Liao QV, Sattigeri P, Fogliato R, Melançon G, Krishnan R, Stanley J, Tickoo O, Nachman L, Chunara R, Srikumar M, Weller A, Xiang A (2021) Uncertainty as a form of transparency: measuring, communicating, and using uncertainty. Proceedings of the 2021 AAAI/ACM conference on AI, ethics, and society, association for computing machinery, New York, NY, USA, pp 401–413 https://doi.org/10.1145/3461702.3462571
Böhm V, Lanusse F, Seljak U (2019) Uncertainty quantification with generative models (arXiv:1910.10046)
Breidt FJ, Claeskens G, Opsomer JD (2005) Model-assisted estimation for complex surveys using penalised splines. Biometrika 92(4):831–846
Cassel CM, Särndal CE, Wretman JH (1976) Some results on generalized difference estimation and generalized regression estimation for finite populations. Biometrika 63:615–620
Chambers R, Clark R (2012) An introduction to model-based survey sampling with applications. Oxford University Press https://doi.org/10.1093/acprof:oso/9780198566625.001.0001
Chen T, Fox E, Guestrin C (2014) Stochastic gradient Hamiltonian Monte Carlo. In: Xing EP, Jebara T (eds) Proceedings of the 31st international conference on machine learning, PMLR, Beijing, China, proceedings of machine learning research, vol 32, pp 1683–1691
Daas P, Puts M, Buelens B, van den Hurk P (2015) Big data as a source for official statistics. J Off Stat 31(2):249–262. https://doi.org/10.1515/jos-2015-0016
Dagdoug M, Goga C, Haziza D (2021) Model-assisted estimation through random forests in finite population sampling. J Am Stat Assoc. https://doi.org/10.1080/01621459.2021.1987250
Earth observations for official statistics (2017) United Nations satellite imagery and geospatial data task team report. https://unstats.un.org/bigdata/task-teams/earth-observation/UNGWG_Satellite_Task_Team_Report_WhiteCover.pdf. Accessed August 16, 2023
Erman S, Rancourt E, Beaucage Y, Loranger A (2022) The use of data science in a national statistical office. https://hdsr.mitpress.mit.edu/pub/x0l4x099. Accessed August 16, 2023
European Commission (2022) EU AI act. https://artificialintelligenceact.eu/the-act/. Accessed August 16, 2023
Fadel S, Trottier S (2023) A study on explainable active learning for text classification (Statistics Canada’s internal report)
Firth D, Bennett KE (1998) Robust models in probability sampling. J Royal Stat Soc Ser B 60(1):3–21. https://doi.org/10.1111/1467-9868.00105
Gal Y, Ghahramani Z (2015) Bayesian convolutional neural networks with Bernoulli approximate variational inference (arXiv:1506.02158)
Gal Y, Ghahramani Z (2016) Dropout as a Bayesian approximation: representing model uncertainty in deep learning. In: Balcan MF, Weinberger KQ (eds) Proceedings of the 33rd international conference on machine learning, PMLR, New York, New York, USA, proceedings of machine learning research, vol 48, pp 1050–1059
Gal Y, Islam R, Ghahramani Z (2017) Deep Bayesian active learning with image data. Proceedings of the 34th international conference on machine learning, vol 70, pp 1183–1192
Geifman Y, El-Yaniv R (2017) Selective classification for deep neural networks. In: Guyon I, von Luxburg U, Bengio S, Wallach H, Fergus R, Garnett R (eds) Advances in neural information processing systems, vol 30
Gelein B, Haziza D, Causeur D (2018) Propensity weighting for survey non-response through machine learning. Journées De Méthodologie Stat. https://doi.org/10.1016/j.jmva.2014.06.020
Ghai B, Liao QV, Zhang Y, Bellamy R, Mueller K (2021) Explainable active learning (xal): toward ai explanations as interfaces for machine teachers. Proc ACM Hum-Comput Interact. https://doi.org/10.1145/3432934
Government of Canada (2022a) National occupational classification (NOC) Canada 2021 version 1.0. https://www.statcan.gc.ca/en/subjects/standard/noc/2021/indexV1. Accessed August 16, 2023
Government of Canada (2022b) North American industry classification system (NAICS) Canada 2022 version 1.0. https://www.statcan.gc.ca/en/subjects/standard/naics/2022/v1/index. Accessed August 16, 2023
Government of Canada (2023) North American product classification system (NAPCS) Canada 2022 version 1.0. https://www.statcan.gc.ca/en/subjects/standard/napcs/2022/index. Accessed August 16, 2023
Government of Canada. AI and Data Act (Bill C-27). https://www.parl.ca/DocumentViewer/en/44-1/bill/C-27/first-reading. Accessed August 16, 2023
Guo C, Pleiss G, Sun Y, Weinberger KQ (2017) On calibration of modern neural networks. In: Precup D, Teh YW (eds) Proceedings of the 34th international conference on machine learning, PMLR, proceedings of machine learning research, vol 70, pp 1321–1330
Haziza D, Beaumont JF (2017) Construction of weights in surveys: a review. Stat Sci 32:206–226
Hüllermeier E, Waegeman W (2021) Aleatoric and epistemic uncertainty in machine learning: an introduction to concepts and methods. Mach Learn 110(3):457–506. https://doi.org/10.1007/s10994-021-05946-3
Kaiser P, Kern C, Rügamer D (2022) Uncertainty-aware predictive modeling for fair data-driven decisions
Kull M, Silva Filho TM, Flach P (2017) Beyond sigmoids: how to obtain well-calibrated probabilities from binary classifiers with beta calibration. Electron J Statist 11(2):5052–5080. https://doi.org/10.1214/17-EJS1338SI
Lakshminarayanan B, Pritzel A, Blundell C (2017) Simple and scalable predictive uncertainty estimation using deep ensembles. Proceedings of the 31st international conference on neural information processing systems, pp 6405–6416
Lele SR (2020) How should we quantify uncertainty in statistical inference? Front Ecol Evol. https://doi.org/10.3389/fevo.2020.00035
Lundberg SM, Lee SI (2017) A unified approach to interpreting model predictions. Proceedings of the 31st international conference on neural information processing systems, pp 4768–4777
McConville KS, Moisen GG, Frescino TS (2020) A tutorial on model-assisted estimation with application to forest inventory. Forests. https://doi.org/10.3390/f11020244
Montanari G, Ranalli M (2005) Nonparametric model calibration estimation in survey sampling. J Am Stat Assoc 100(472):1429–1442
Mothilal RK, Sharma A, Tan C (2020) Explaining machine learning classifiers through diverse counterfactual explanations. Proceedings of the 2020 conference on fairness, accountability, and transparency, association for computing machinery, pp 607–617 https://doi.org/10.1145/3351095.3372850
Murdoch WJ, Singh C, Kumbier K, Abbasi-Asl R, Yu B (2019) Definitions, methods, and applications in interpretable machine learning. Proc Natl Acad Sci USA. https://doi.org/10.1073/pnas.1900654116
Platt JC (1999) Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods. Advances in large margin classifiers. MIT Press, pp 61–74
Ribeiro MT, Singh S, Guestrin C (2018) Anchors: high-precision model-agnostic explanations. Proceedings of the AAAI conference on artificial intelligence
Romano Y, Patterson E, Candès EJ (2019) Conformalized quantile regression. In: Wallach H, Larochelle H, Beygelzimer A, d'Alché-Buc F, Fox E, Garnett R (eds) Advances in neural information processing systems, vol 32
Roscher R, Bohn B, Duarte MF, Garcke J (2020) Explainable machine learning for scientific insights and discoveries. IEEE Access. https://doi.org/10.1109/ACCESS.2020.2976199
Rudin C (2019) Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nat Mach Intell 1(5):206–215. https://doi.org/10.1038/s42256-019-0048-x
Särndal CE, Swensson B, Wretman J (1992) Model assisted survey sampling. Springer
Srivastava N, Hinton G, Krizhevsky A, Sutskever I, Salakhutdinov R (2014) Dropout: a simple way to prevent neural networks from overfitting. J Mach Learn Res 15(1):1929–1958
Statistics Canada’s quality guidelines (2019) https://www150.statcan.gc.ca/n1/pub/12-539-x/12-539-x2019001-eng.htm. Accessed August 16, 2023
Steinberger L, Leeb H (2018) Conditional predictive inference for stable algorithms (arXiv:1809.01412)
OECD (2019) The OECD Artificial Intelligence (AI) Principles. https://oecd.ai/en/ai-principles. Accessed August 16, 2023
Vaicenavicius J, Widmann D, Andersson C, Lindsten F, Roll J, Schön T (2019) Evaluating model calibration in classification. In: Chaudhuri K, Sugiyama M (eds) Proceedings of the twenty-second international conference on artificial intelligence and statistics, PMLR, proceedings of machine learning research, vol 89, pp 3459–3467
Vovk V, Gammerman A, Shafer G (2005) Algorithmic learning in a random world. Springer, Berlin, Heidelberg
Wachter S, Mittelstadt B, Russell C (2018) Counterfactual explanations without opening the black box: automated decisions and the GDPR. Harv J Law Technol 31(2):841–887
Yung W, Cook K, Thomas S (2004) Use of GST data by the monthly survey of manufacturing. https://www.oecd.org/sdd/36232466.pdf. Accessed August 16, 2023
Yung W, Tam SM, Buelens B, Chipman H, Dumpert F, Ascari G, Rocci F, Burger J, Choi IK (2022) A quality framework for statistical algorithms. Stat J IAOS. https://doi.org/10.3233/SJI-210875
Zadrozny B, Elkan C (2001) Obtaining calibrated probability estimates from decision trees and naive Bayesian classifiers. Proceedings of the eighteenth international conference on machine learning. Morgan Kaufmann Publishers Inc, San Francisco, CA, USA, pp 609–616
Zhang J (2022) Machine learning techniques to handle survey non-response (Statistics Canada's internal report)
Ethics declarations
Conflict of interest
The content of this article represents the position of the authors and may not necessarily represent that of Statistics Canada. The authors declare no competing or conflicting interests that could be perceived as having influenced the work presented in this paper.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Molladavoudi, S., Yung, W. Exploring quality dimensions in trustworthy Machine Learning in the context of official statistics: model explainability and uncertainty quantification. AStA Wirtsch Sozialstat Arch 17, 223–252 (2023). https://doi.org/10.1007/s11943-023-00331-z