Abstract
Explainable artificial intelligence (XAI) provides explanations for machine learning (ML) models that are not interpretable. While many technical approaches exist, these techniques are rarely validated on real-world datasets. In this work, we present a use case of XAI: an ML model trained to estimate electrification rates from mobile phone data in Senegal. The data originate from the Data for Development challenge by Orange in 2014/15. We apply two model-agnostic, local explanation techniques and find that, while the model can be verified, it is biased with respect to population density. We conclude by pointing to the two main challenges we encountered during our work: data processing and model design, which might be restricted by currently available XAI methods, and the importance of domain knowledge for interpreting explanations.
Notes
- 2. The code of the project can be found here: https://github.com/lstate/explainability-in-practice.git.
- 6. The ratio between instances in class 9 and the full dataset before subsampling is \(imb = 33\%\).
- 7. 12.7M downloads of LIME python package, 63M downloads of SHAP python package, retrieved on 6th of April 2022 https://pepy.tech.
References
Abebe, R., et al.: Narratives and counternarratives on data sharing in Africa. In: FAccT, pp. 329–341. ACM (2021)
Adadi, A., Berrada, M.: Peeking inside the black-box: a survey on explainable artificial intelligence (XAI). IEEE Access 6, 52138–52160 (2018)
Agence Nationale de la Statistique et de la Démographie: Census and GIS data (2013). https://www.ansd.sn/index.php?option=com_content&view=article&id=134&Itemid=262. Accessed Apr 2022
Alvarez-Melis, D., Jaakkola, T.S.: On the robustness of interpretability methods. CoRR abs/1806.08049 (2018)
Beckh, K., et al.: Explainable machine learning with prior knowledge: an overview. CoRR abs/2105.10172 (2021)
Blumenstock, J.E.: Calling for better measurement: estimating an individual’s wealth and well-being from mobile phone transaction records. Center for Effective Global Action, UC Berkeley (2015). https://escholarship.org/uc/item/8zs63942
Calegari, R., Ciatto, G., Omicini, A.: On the integration of symbolic and sub-symbolic techniques for XAI: a survey. Intelligenza Artificiale 14(1), 7–32 (2020)
Craven, M.W., Shavlik, J.W.: Extracting tree-structured representations of trained networks. In: NIPS, pp. 24–30. MIT Press (1995)
Dempster, A., Petitjean, F., Webb, G.I.: ROCKET: exceptionally fast and accurate time series classification using random convolutional kernels. Data Min. Knowl. Discov. 34(5), 1454–1495 (2020)
Guidotti, R., Monreale, A., Ruggieri, S., Pedreschi, D., Turini, F., Giannotti, F.: Local rule-based explanations of black box decision systems. CoRR abs/1805.10820 (2018)
Guidotti, R., Monreale, A., Ruggieri, S., Turini, F., Giannotti, F., Pedreschi, D.: A survey of methods for explaining black box models. ACM Comput. Surv. 51(5), 93:1–93:42 (2019)
Guidotti, R., Monreale, A., Spinnato, F., Pedreschi, D., Giannotti, F.: Explaining any time series classifier. In: CogMI, pp. 167–176. IEEE (2020)
Houngbonon, G.V., Quentrec, E.L., Rubrichi, S.: Access to electricity and digital inclusion: evidence from mobile call detail records. Hum. Soc. Sci. Commun. 8(1) (2021). https://doi.org/10.1057/s41599-021-00848-0
Ledesma, C., Garonita, O.L., Flores, L.J., Tingzon, I., Dalisay, D.: Interpretable poverty mapping using social media data, satellite images, and geospatial information. CoRR abs/2011.13563 (2020)
Letouzé, E.: Applications and implications of big data for demo-economic analysis: the case of call-detail records. Ph.D. thesis, University of California, Berkeley, USA (2016)
Lundberg, S.M., Lee, S.: A unified approach to interpreting model predictions. In: NIPS 2017: Proceedings of the 31st International Conference on Neural Information Processing Systems, pp. 4765–4774 (2017)
Martinez-Cesena, E.A., Mancarella, P., Ndiaye, M., Schläpfer, M.: Using mobile phone data for electricity infrastructure planning. arXiv preprint (2015). https://arxiv.org/abs/1504.03899
Molnar, C.: Interpretable Machine Learning: A Guide for Making Black Box Models Explainable (2022). https://christophm.github.io/interpretable-ml-book/
de Montjoye, Y., Smoreda, Z., Trinquart, R., Ziemlicki, C., Blondel, V.D.: D4D-senegal: the second mobile phone data for development challenge. CoRR abs/1407.4885 (2014)
Murdoch, W.J., Singh, C., Kumbier, K., Abbasi-Asl, R., Yu, B.: Interpretable machine learning: definitions, methods, and applications. CoRR abs/1901.04592 (2019)
NOAA National Centers for Environmental Information (NCEI): Version 4 DMSP-OLS Nighttime Lights Time Series (2014). https://ngdc.noaa.gov/eog/dmsp/downloadV4composites.html. Accessed Apr 2022
Okolo, C.T., Dell, N., Vashistha, A.: Making AI explainable in the global south: a systematic review. In: COMPASS, pp. 439–452. ACM (2022)
Pestre, G., Letouzé, E., Zagheni, E.: The ABCDE of big data: assessing biases in call-detail records for development estimates. World Bank Econ. Rev. 34(Supplement_1), S89–S97 (2019). https://doi.org/10.1093/wber/lhz039
Pokhriyal, N., Jacques, D.C.: Combining disparate data sources for improved poverty prediction and mapping. Proc. Natl. Acad. Sci. 114(46), E9783–E9792 (2017). https://doi.org/10.1073/pnas.1700319114
Ribeiro, M.T., Singh, S., Guestrin, C.: “Why should I trust you?”: explaining the predictions of any classifier. In: KDD, pp. 1135–1144. ACM (2016)
Salat, H., Schläpfer, M., Smoreda, Z., Rubrichi, S.: Analysing the impact of electrification on rural attractiveness in Senegal with mobile phone data. R. Soc. Open Sci. 8(10) (2021). https://doi.org/10.1098/rsos.201898
Salat, H., Smoreda, Z., Schläpfer, M.: A method to estimate population densities and electricity consumption from mobile phone data in developing countries. PLOS ONE 15(6) (2020). https://doi.org/10.1371/journal.pone.0235224
Schmid, T., Bruckschen, F., Salvati, N., Zbiranski, T.: Constructing socio-demographic indicators for national statistical institutes using mobile phone data: estimating literacy rates in Senegal. J. R. Stat. Soc. Series A (Statistics in Society) 180(4), 1163–1190 (2017). https://www.jstor.org/stable/44682668
Steele, J.E., et al.: Mapping poverty using mobile phone and satellite data. J. R. Soc. Interface 14(127), 20160690 (2017). https://doi.org/10.1098/rsif.2016.0690
Acknowledgements
Thanks to Salvatore Ruggieri and Franco Turini. This work has received funding from the European Union’s Horizon 2020 research and innovation programme under Marie Skłodowska-Curie Actions (grant agreement number 860630) for the project “NoBIAS - Artificial Intelligence without Bias” (https://nobias-project.eu/). This work reflects only the authors’ views and the European Research Executive Agency (REA) is not responsible for any use that may be made of the information it contains.
Author information
Contributions
Laura State: conceptualization, data processing and experiments, paper draft, writing and editing
Hadrien Salat: data preparation, paper co-writing and editing
Stefania Rubrichi: data curation, paper reviewing, co-supervision
Zbigniew Smoreda: data curation, paper reviewing
A Appendix
A.1 Data Distribution
Figure 6 shows the distribution of available call data (left panel) and text message data (right panel). While both types of data are denser in the western part of Senegal, text message data are particularly sparse in the eastern part of the country.
A.2 Additional Results
Additional results for the prediction of the electrification rate are shown in Table 3. This second set of models is also trained with default parameters.
Further, we present the confusion matrix of the best-performing model (RF, Fig. 7).
A.3 Time Series Data
In this section, we briefly describe the work on time series data.
Data Processing. A time series \(S = s_1, \dots, s_T\) consists of \(T = 24 \times 12\) ordered data points, each being the monthly average of the aggregated number of events per hour. For example, the points \(s_t\) with \(t \in \{1, \dots, 24\}\) form the “daily activity curve” for January, the next 24 points form the curve for February, and so on. Events are separated by direction (incoming or outgoing). Thus, per cell tower, we create six time series. We refer to the TS dataset based on the number of calls as CN, on the length of calls as CL, and on the number of text messages as SN, and use out/in for outgoing/incoming activity, respectively. We rescale each time series separately by applying the min-max scaler provided by sklearn. Data labeling and subsampling are carried out as above. Data partitioning follows [12].
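The construction and scaling steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's actual pipeline: the function names `build_series` and `scale_each_series`, the (12, 24) input layout, and the random tower data are hypothetical. Only `MinMaxScaler` from sklearn is taken from the text.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler


def build_series(monthly_hourly_avg):
    """Flatten a (12, 24) array of monthly-average hourly counts into one series.

    Row m holds the daily activity curve of month m, so the first 24 points
    of the result are the January curve, the next 24 the February curve, etc.
    """
    arr = np.asarray(monthly_hourly_avg, dtype=float)
    if arr.shape != (12, 24):
        raise ValueError("expected an array of shape (12, 24)")
    return arr.reshape(-1)  # length T = 24 * 12 = 288


def scale_each_series(X):
    """Min-max scale every row (one time series) of X to [0, 1] separately.

    MinMaxScaler works column-wise, so the matrix is transposed before
    fitting and transposed back afterwards.
    """
    X = np.asarray(X, dtype=float)
    return MinMaxScaler().fit_transform(X.T).T


rng = np.random.default_rng(0)
# Hypothetical data: 5 cell towers, one activity type and one direction.
towers = rng.poisson(lam=20.0, size=(5, 12, 24))
X = np.stack([build_series(t) for t in towers])
X_scaled = scale_each_series(X)
```

Scaling each series on its own removes differences in overall activity volume between towers, so that only the shape of the activity curves remains as input to the classifier.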
We find that the text message data are heavily imbalanced and that this dataset is smaller than the other datasets. Thus, we exclude these data from the time series analysis.
The variational autoencoder used in the explanation displayed below is trained for \(k = 50\) dimensions and over \(e = 500\) epochs. We used the “out CL” data and model for explanations because it yields the smallest MAE.
Classification. To classify based on time series data, we use ROCKET (RandOm Convolutional KErnel Transform) [9], a method based on random convolutional kernels for feature extraction and linear classification. Results are displayed in Table 4.
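To make the idea behind ROCKET concrete, the following is a simplified sketch of its two stages: random convolutional kernels turn each series into features (the maximum and the proportion of positive values of each convolution output), and a linear ridge classifier is trained on those features. It is not the implementation used for Table 4. Padding is omitted, only 100 kernels are drawn for speed, and the two-class toy data and the function names `make_kernels` and `transform` are assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import RidgeClassifierCV


def make_kernels(n_kernels, series_length, rng):
    """Draw random kernels as (weights, bias, dilation) triples."""
    kernels = []
    for _ in range(n_kernels):
        length = int(rng.choice([7, 9, 11]))
        weights = rng.normal(size=length)
        weights -= weights.mean()  # mean-centre the weights
        bias = rng.uniform(-1.0, 1.0)
        # Largest exponent such that the dilated kernel still fits the series.
        max_exp = np.log2((series_length - 1) / (length - 1))
        dilation = int(2 ** rng.uniform(0.0, max_exp))
        kernels.append((weights, bias, dilation))
    return kernels


def transform(X, kernels):
    """Map each series to two features per kernel: max and PPV."""
    feats = np.empty((X.shape[0], 2 * len(kernels)))
    for j, (w, b, d) in enumerate(kernels):
        n_out = X.shape[1] - (len(w) - 1) * d
        # Dilated convolution without padding, as a sum of shifted slices.
        out = np.full((X.shape[0], n_out), b)
        for k, wk in enumerate(w):
            out += wk * X[:, k * d : k * d + n_out]
        feats[:, 2 * j] = out.max(axis=1)
        feats[:, 2 * j + 1] = (out > 0).mean(axis=1)  # proportion of positive values
    return feats


rng = np.random.default_rng(0)
T = 24 * 12
t = np.arange(T)
# Hypothetical toy data: two classes of noisy periodic curves.
y = np.repeat([0, 1], 30)
X = np.where(y[:, None] == 0, np.sin(2 * np.pi * t / 24), np.sin(4 * np.pi * t / 24))
X = X + 0.1 * rng.normal(size=X.shape)

kernels = make_kernels(100, T, rng)
features = transform(X, kernels)
clf = RidgeClassifierCV(alphas=np.logspace(-3, 3, 10)).fit(features, y)
pred = clf.predict(features)
```

Because the kernels are random and never trained, the only fitted component is the linear classifier, which is what makes the method fast.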
Explanations. Figure 8 shows a sample explanation by LASTS [12]. Explanations are provided in visual form and as a rule; the latter can be read off the plot. The time series belongs to class 4, i.e., it has an electrification rate between 0.4 and 0.5, and is correctly classified by the model. In the left panel, the factual rule is plotted against the original time series. The number above the shapelet indicates its index. The rule reads as follows: “If shapelet no. 12 is contained in the time series, then it is classified as class 4.” This is mirrored by the rule for instances belonging to the opposite classes (here: classes 0 to 3 and 5 to 9): “If shapelet no. 12 is not contained in the time series, then it is not classified as class 4.” This counterfactual rule is displayed in the right panel of Fig. 8, plotted against a synthetically generated time series from a class other than class 4.
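A shapelet rule of this kind can be evaluated mechanically. The sketch below assumes that “contained” means the smallest Euclidean distance between the shapelet and any window of the series falls below a threshold. This criterion, the threshold value, the shapelet values, and the function names are illustrative assumptions and are not taken from the LASTS implementation.

```python
import numpy as np


def min_distance(series, shapelet):
    """Smallest Euclidean distance between the shapelet and any window of the series."""
    series = np.asarray(series, dtype=float)
    shapelet = np.asarray(shapelet, dtype=float)
    windows = np.lib.stride_tricks.sliding_window_view(series, len(shapelet))
    return float(np.sqrt(((windows - shapelet) ** 2).sum(axis=1)).min())


def contains(series, shapelet, threshold):
    """Assumed containment test: best-matching window is within the threshold."""
    return min_distance(series, shapelet) <= threshold


def rule_fires(series, shapelet, threshold):
    """Factual rule: True means 'classified as the target class' (class 4 in the text).

    False corresponds to the counterfactual rule: shapelet absent, not the target class.
    """
    return contains(series, shapelet, threshold)


# Hypothetical shapelet and two series of length T = 24 * 12.
shapelet = np.array([0.0, 0.5, 1.0, 0.5, 0.0])
with_shapelet = np.zeros(288)
with_shapelet[100:105] = shapelet
without_shapelet = np.zeros(288)

factual = rule_fires(with_shapelet, shapelet, threshold=0.1)
counterfactual = rule_fires(without_shapelet, shapelet, threshold=0.1)
```

Here `factual` corresponds to the left panel of the explanation (shapelet present) and `counterfactual` to the right panel (shapelet absent).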
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
State, L., Salat, H., Rubrichi, S., Smoreda, Z. (2023). Explainability in Practice: Estimating Electrification Rates from Mobile Phone Data in Senegal. In: Longo, L. (eds) Explainable Artificial Intelligence. xAI 2023. Communications in Computer and Information Science, vol 1902. Springer, Cham. https://doi.org/10.1007/978-3-031-44067-0_6
DOI: https://doi.org/10.1007/978-3-031-44067-0_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-44066-3
Online ISBN: 978-3-031-44067-0
eBook Packages: Computer Science, Computer Science (R0)