Explainability in Practice: Estimating Electrification Rates from Mobile Phone Data in Senegal

State, Laura; Salat, Hadrien; Rubrichi, Stefania; Smoreda, Zbigniew

doi:10.1007/978-3-031-44067-0_6

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1902)

Included in the following conference series:

World Conference on Explainable Artificial Intelligence


Abstract

Explainable artificial intelligence (XAI) provides explanations for non-interpretable machine learning (ML) models. While many technical approaches exist, these techniques are rarely validated on real-world datasets. In this work, we present a use case of XAI: an ML model trained to estimate electrification rates from mobile phone data in Senegal. The data originate from the Data for Development (D4D) challenge organized by Orange in 2014/15. We apply two model-agnostic, local explanation techniques and find that, while the model can be verified, it is biased with respect to population density. We conclude by pointing to the two main challenges we encountered: data processing and model design, which may be restricted by currently available XAI methods, and the importance of domain knowledge for interpreting explanations.


Notes

1. https://algorithmwatch.org/en/automating-society-2020/.
2. The code of the project is available at https://github.com/lstate/explainability-in-practice.git.
3. https://ourworldindata.org/grapher/mobile-cellular-subscriptions-per-100-people.
4. https://sdgs.un.org/goals.
5. https://data.worldbank.org/country/senegal.
6. The ratio between the number of instances in class 9 and the size of the full dataset before subsampling is $imb = 33\%$.
7. 12.7M downloads of the LIME Python package and 63M downloads of the SHAP Python package, retrieved on 6 April 2022 from https://pepy.tech.

References

[1] Abebe, R., et al.: Narratives and counternarratives on data sharing in Africa. In: FAccT, pp. 329–341. ACM (2021)
[2] Adadi, A., Berrada, M.: Peeking inside the black-box: a survey on explainable artificial intelligence (XAI). IEEE Access 6, 52138–52160 (2018)
[3] Agence Nationale de la Statistique et de la Démographie: Census and GIS data (2013). https://www.ansd.sn/index.php?option=com_content&view=article&id=134&Itemid=262. Accessed Apr 2022
[4] Alvarez-Melis, D., Jaakkola, T.S.: On the robustness of interpretability methods. CoRR abs/1806.08049 (2018)
[5] Beckh, K., et al.: Explainable machine learning with prior knowledge: an overview. CoRR abs/2105.10172 (2021)
[6] Blumenstock, J.E.: Calling for better measurement: estimating an individual’s wealth and well-being from mobile phone transaction records. Center for Effective Global Action, UC Berkeley (2015). https://escholarship.org/uc/item/8zs63942
[7] Calegari, R., Ciatto, G., Omicini, A.: On the integration of symbolic and sub-symbolic techniques for XAI: a survey. Intelligenza Artificiale 14(1), 7–32 (2020)
[8] Craven, M.W., Shavlik, J.W.: Extracting tree-structured representations of trained networks. In: NIPS, pp. 24–30. MIT Press (1995)
[9] Dempster, A., Petitjean, F., Webb, G.I.: ROCKET: exceptionally fast and accurate time series classification using random convolutional kernels. Data Min. Knowl. Discov. 34(5), 1454–1495 (2020)
[10] Guidotti, R., Monreale, A., Ruggieri, S., Pedreschi, D., Turini, F., Giannotti, F.: Local rule-based explanations of black box decision systems. CoRR abs/1805.10820 (2018)
[11] Guidotti, R., Monreale, A., Ruggieri, S., Turini, F., Giannotti, F., Pedreschi, D.: A survey of methods for explaining black box models. ACM Comput. Surv. 51(5), 93:1–93:42 (2019)
[12] Guidotti, R., Monreale, A., Spinnato, F., Pedreschi, D., Giannotti, F.: Explaining any time series classifier. In: CogMI, pp. 167–176. IEEE (2020)
[13] Houngbonon, G.V., Quentrec, E.L., Rubrichi, S.: Access to electricity and digital inclusion: evidence from mobile call detail records. Hum. Soc. Sci. Commun. 8(1) (2021). https://doi.org/10.1057/s41599-021-00848-0
[14] Ledesma, C., Garonita, O.L., Flores, L.J., Tingzon, I., Dalisay, D.: Interpretable poverty mapping using social media data, satellite images, and geospatial information. CoRR abs/2011.13563 (2020)
[15] Letouzé, E.: Applications and implications of big data for demo-economic analysis: the case of call-detail records. Ph.D. thesis, University of California, Berkeley, USA (2016)
[16] Lundberg, S.M., Lee, S.: A unified approach to interpreting model predictions. In: NIPS 2017: Proceedings of the 31st International Conference on Neural Information Processing Systems, pp. 4765–4774 (2017)
[17] Martinez-Cesena, E.A., Mancarella, P., Ndiaye, M., Schläpfer, M.: Using mobile phone data for electricity infrastructure planning. arXiv preprint (2015). https://arxiv.org/abs/1504.03899
[18] Molnar, C.: Interpretable Machine Learning: A Guide for Making Black Box Models Explainable (2022). https://christophm.github.io/interpretable-ml-book/
[19] de Montjoye, Y., Smoreda, Z., Trinquart, R., Ziemlicki, C., Blondel, V.D.: D4D-senegal: the second mobile phone data for development challenge. CoRR abs/1407.4885 (2014)
[20] Murdoch, W.J., Singh, C., Kumbier, K., Abbasi-Asl, R., Yu, B.: Interpretable machine learning: definitions, methods, and applications. CoRR abs/1901.04592 (2019)
[21] NOAA National Centers for Environmental Information (NCEI): Version 4 DMSP-OLS Nighttime Lights Time Series (2014). https://ngdc.noaa.gov/eog/dmsp/downloadV4composites.html. Accessed Apr 2022
[22] Okolo, C.T., Dell, N., Vashistha, A.: Making AI explainable in the global south: a systematic review. In: COMPASS, pp. 439–452. ACM (2022)
[23] Pestre, G., Letouzé, E., Zagheni, E.: The ABCDE of big data: assessing biases in call-detail records for development estimates. World Bank Econ. Rev. 34(Supplement_1), S89–S97 (2019). https://doi.org/10.1093/wber/lhz039
[24] Pokhriyal, N., Jacques, D.C.: Combining disparate data sources for improved poverty prediction and mapping. Proc. Natl. Acad. Sci. 114(46), E9783–E9792 (2017). https://doi.org/10.1073/pnas.1700319114
[25] Ribeiro, M.T., Singh, S., Guestrin, C.: “Why should I trust you?”: explaining the predictions of any classifier. In: KDD, pp. 1135–1144. ACM (2016)
[26] Salat, H., Schläpfer, M., Smoreda, Z., Rubrichi, S.: Analysing the impact of electrification on rural attractiveness in Senegal with mobile phone data. R. Soc. Open Sci. 8(10) (2021). https://doi.org/10.1098/rsos.201898
[27] Salat, H., Smoreda, Z., Schläpfer, M.: A method to estimate population densities and electricity consumption from mobile phone data in developing countries. PLOS ONE 15(6) (2020). https://doi.org/10.1371/journal.pone.0235224
[28] Schmid, T., Bruckschen, F., Salvati, N., Zbiranski, T.: Constructing socio-demographic indicators for national statistical institutes using mobile phone data: estimating literacy rates in Senegal. J. R. Stat. Soc. Series A (Statistics in Society) 180(4), 1163–1190 (2017). https://www.jstor.org/stable/44682668
[29] Steele, J.E., et al.: Mapping poverty using mobile phone and satellite data. J. R. Soc. Interface 14(127), 20160690 (2017). https://doi.org/10.1098/rsif.2016.0690


Acknowledgements

Thanks to Salvatore Ruggieri and Franco Turini. This work has received funding from the European Union’s Horizon 2020 research and innovation programme under Marie Skłodowska-Curie Actions (grant agreement number 860630) for the project “NoBIAS - Artificial Intelligence without Bias” (https://nobias-project.eu/). This work reflects only the authors’ views and the European Research Executive Agency (REA) is not responsible for any use that may be made of the information it contains.

Author information

Authors and Affiliations

University of Pisa, Pisa, Italy
Laura State
Scuola Normale Superiore, Pisa, Italy
Laura State
Alan Turing Institute, London, UK
Hadrien Salat
Orange Innovation, Châtillon, France
Stefania Rubrichi & Zbigniew Smoreda


Contributions

Laura State: conceptualization, data processing and experiments, paper draft, writing and editing.
Hadrien Salat: data preparation, paper co-writing and editing.
Stefania Rubrichi: data curation, paper reviewing, co-supervision.
Zbigniew Smoreda: data curation, paper reviewing.

Corresponding author

Correspondence to Laura State.

Editor information

Editors and Affiliations

Technological University Dublin, Dublin, Ireland
Luca Longo

A Appendix

A.1 Data Distribution

Figure 6 shows the distribution of available call data (left panel) and text message data (right panel). While both datasets are denser in the western part of Senegal, text message data are particularly sparse in the eastern part of the country.

Table 3. Classification results. Higher accuracy (acc) is better; lower MAE is better.


A.2 Additional Results

Additional results for the prediction of the electrification rate are shown in Table 3. This second set of models is also trained with default parameters.

Further, we present the confusion matrix of the best-performing model, the random forest (RF) (Fig. 7).
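The evaluation quantities used in Tables 3 and 4 and Fig. 7 can be made concrete with a short NumPy sketch. It assumes, based on the class description in A.3 (class 4 covers electrification rates between 0.4 and 0.5) and note 6, that the rate is binned into ten classes of width 0.1; how the boundary value 1.0 is assigned and that the MAE is computed on class indices are assumptions. All helper names are hypothetical; scikit-learn's `accuracy_score`, `mean_absolute_error` and `confusion_matrix` compute the same quantities.

```python
import numpy as np

N_CLASSES = 10  # ten bins of width 0.1 (assumed from the class description)


def rate_to_class(rate):
    """Map electrification rates in [0, 1] to class indices 0..9.

    Assumption: class k covers [0.1k, 0.1(k+1)); a rate of exactly 1.0 goes to class 9.
    """
    classes = np.floor(np.asarray(rate, dtype=float) * N_CLASSES).astype(int)
    return np.minimum(classes, N_CLASSES - 1)


def accuracy(y_true, y_pred):
    """Share of exactly correct class predictions."""
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))


def mae(y_true, y_pred):
    """Mean absolute error on class indices: predicting a neighbouring bin costs 1."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))


def confusion_matrix(y_true, y_pred, n_classes=N_CLASSES):
    """Rows are true classes, columns are predicted classes (same layout as sklearn)."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (np.asarray(y_true), np.asarray(y_pred)), 1)
    return cm


def class_share(labels, cls=N_CLASSES - 1):
    """Fraction of instances in one class, e.g. the imbalance ratio for class 9."""
    return float(np.mean(np.asarray(labels) == cls))
```

For example, true classes [4, 4, 9, 0] and predictions [4, 5, 9, 2] give an accuracy of 0.5 but an MAE of 0.75, because one error is off by one bin and the other by two. This is why MAE complements accuracy for ordered classes.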

A.3 Time Series Data

In this section, we briefly describe our work on time series data.

Data Processing. A time series $S = (s_1, \dots, s_T)$ consists of $T = 24 \times 12 = 288$ ordered data points, each being the monthly average of the aggregated number of events in one hour of the day: $s_t$ for $t \in \{1, \dots, 24\}$ is the monthly average of the aggregated number of events per hour in January (the “daily activity curve” for January), $s_t$ for $t \in \{25, \dots, 48\}$ is the curve for February, and so on. Events are separated by direction (incoming or outgoing), so that for each cell tower we create six time series (three activity measures, each in two directions). We refer to the time series (TS) datasets based on the number of calls, the length of calls, and the number of text messages as CN, CL, and SN, respectively, and use out/in for outgoing/incoming activity. We normalize each time series separately with the min-max scaler provided by sklearn. Data labeling and subsampling apply as above. Data partitioning follows [12].
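The construction of one 288-point series and its per-series min-max scaling can be sketched with pandas and NumPy. The input format (a pandas Series of hourly event counts for one tower, one activity type and one direction, indexed by timestamps covering one calendar year) and the function names are assumptions for illustration, not the project's code (see note 2 for that).

```python
import numpy as np
import pandas as pd


def daily_activity_series(hourly_counts: pd.Series) -> np.ndarray:
    """Turn one year of hourly event counts into a 12 x 24 = 288-point series.

    hourly_counts: events per hour (e.g. outgoing calls at one tower), indexed by a
    DatetimeIndex. This input format is an assumption.
    """
    idx = hourly_counts.index
    # Monthly average of each hour of the day: rows = months 1..12, columns = hours 0..23.
    table = hourly_counts.groupby([idx.month, idx.hour]).mean().unstack()
    table = table.reindex(index=range(1, 13), columns=range(24))
    # Row-major flattening puts January's daily curve first (s_1..s_24), then February, ...
    return table.to_numpy(dtype=float).ravel()


def min_max(series: np.ndarray) -> np.ndarray:
    """Rescale one series to [0, 1]; same result as sklearn's MinMaxScaler on a single column."""
    lo, hi = series.min(), series.max()
    if hi == lo:
        return np.zeros_like(series)
    return (series - lo) / (hi - lo)


# Applying both functions to {CN, CL, SN} x {out, in} yields the six series per tower.
```

Scaling each series on its own, rather than all towers jointly, keeps the shape of the daily curves comparable between busy urban towers and quiet rural ones.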

We find that the text message data are heavily imbalanced and that this dataset is smaller than the others. We therefore exclude it from the time series analysis.

The variational autoencoder used for the explanations shown below has a latent space of $k = 50$ dimensions and is trained for $e = 500$ epochs. We use the “out CL” data and model for the explanations, as this combination yields the smallest MAE.

Classification. To classify the time series data, we use ROCKET (RandOm Convolutional KErnel Transform) [9], a method that extracts features with random convolutional kernels and feeds them to a linear classifier. Results are displayed in Table 4.
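The idea behind ROCKET can be illustrated with a heavily simplified NumPy sketch that follows the kernel sampling scheme described by Dempster et al. [9]: kernel lengths of 7, 9 or 11, mean-centred normal weights, a bias drawn from U(-1, 1), an exponentially sampled dilation, padding applied with probability 0.5, and two features per kernel (the proportion of positive values, PPV, and the maximum). Unlike the original, it uses 100 kernels instead of the recommended 10,000 and a plain one-vs-rest ridge classifier with a fixed regularization strength instead of cross-validated ridge regression. All function names are hypothetical, and this is not the implementation used in this work.

```python
import numpy as np


def generate_kernels(input_length, num_kernels, rng):
    """Sample random kernels following the ROCKET scheme (simplified sketch)."""
    assert input_length >= 11, "series must be at least as long as the longest kernel"
    kernels = []
    for _ in range(num_kernels):
        length = int(rng.choice([7, 9, 11]))
        weights = rng.normal(0.0, 1.0, length)
        weights -= weights.mean()  # mean-centred weights
        bias = rng.uniform(-1.0, 1.0)
        # Dilation floor(2^x), x ~ U(0, A), so that the dilated kernel still fits the series.
        a = np.log2((input_length - 1) / (length - 1))
        dilation = int(2 ** rng.uniform(0.0, a))
        # With probability 0.5, zero-pad so that the kernel centre visits every time step.
        padding = (length - 1) * dilation // 2 if rng.integers(2) == 1 else 0
        kernels.append((weights, bias, dilation, padding))
    return kernels


def apply_kernel(x, weights, bias, dilation, padding):
    """Dilated convolution of one series with one kernel; returns (PPV, max)."""
    x = np.pad(x, padding)
    offsets = dilation * np.arange(len(weights))
    n_out = len(x) - offsets[-1]
    windows = x[np.arange(n_out)[:, None] + offsets[None, :]]
    out = windows @ weights + bias
    return float(np.mean(out > 0)), float(out.max())


def rocket_features(X, kernels):
    """Two features (PPV, max) per kernel for every series in X (shape: n x T)."""
    F = np.empty((len(X), 2 * len(kernels)))
    for i, x in enumerate(X):
        for k, kernel in enumerate(kernels):
            F[i, 2 * k:2 * k + 2] = apply_kernel(x, *kernel)
    return F


def fit_ridge(F, y, n_classes, alpha=1.0):
    """One-vs-rest ridge regression on standardized features (fixed alpha, no CV)."""
    mu, sd = F.mean(axis=0), F.std(axis=0)
    sd = np.where(sd > 1e-12, sd, 1.0)  # leave constant features unscaled
    Z = np.hstack([(F - mu) / sd, np.ones((len(F), 1))])
    Y = -np.ones((len(y), n_classes))
    Y[np.arange(len(y)), y] = 1.0  # +1 for the true class, -1 otherwise
    penalty = alpha * np.eye(Z.shape[1])
    penalty[-1, -1] = 0.0  # do not penalize the intercept
    W = np.linalg.solve(Z.T @ Z + penalty, Z.T @ Y)
    return mu, sd, W


def predict_ridge(model, F):
    """Predict the class with the largest one-vs-rest score."""
    mu, sd, W = model
    Z = np.hstack([(F - mu) / sd, np.ones((len(F), 1))])
    return np.argmax(Z @ W, axis=1)
```

On the 288-point daily activity curves, 100 kernels yield 200 features per series. The kernels are random and never trained; only the linear classifier on top is fitted, which is what makes the approach fast.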

Table 4. Classification results. Higher accuracy is better; lower MAE is better.


Explanations. Figure 8 shows a sample explanation by LASTS [12]. Explanations are provided both visually and as a rule; the rule can be read off the plot. The time series belongs to class 4, i.e. it has an electrification rate between 0.4 and 0.5, and is correctly classified by the model. In the left panel, the factual rule is plotted against the original time series; the number above the shapelet indicates its index. The rule reads: “If shapelet no. 12 is contained in the time series, then it is classified as class 4.” It is mirrored by the rule for instances belonging to the opposite class (here: classes 0 to 3 and 5 to 9): “If shapelet no. 12 is not contained in the time series, then it is not classified as class 4.” This rule is displayed in the right panel of Fig. 8, plotted against a synthetically generated time series from a class other than class 4.
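What “contained” means in such a rule can be sketched as follows. One common convention in shapelet-based methods is to slide the shapelet along the series, take the smallest distance to any window, and call the shapelet contained if that distance falls below a threshold. The length-normalized Euclidean distance, the threshold value and the function names below are assumptions for illustration; LASTS [12] learns its shapelets and thresholds from the data, and this is not its implementation.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def shapelet_distance(series, shapelet):
    """Smallest length-normalized Euclidean distance between the shapelet and any window."""
    series = np.asarray(series, dtype=float)
    shapelet = np.asarray(shapelet, dtype=float)
    windows = sliding_window_view(series, len(shapelet))  # shape: (T - m + 1, m)
    return float(np.sqrt(((windows - shapelet) ** 2).mean(axis=1)).min())


def rule_prediction(series, shapelet, threshold, target_class=4):
    """Apply 'if the shapelet is contained, then target_class; otherwise not target_class'.

    The threshold is a hypothetical parameter, not a value from the paper.
    """
    if shapelet_distance(series, shapelet) <= threshold:
        return f"class {target_class}"
    return f"not class {target_class}"
```

Because the rule only checks the best-matching window, it does not matter where in the 288 points the shapelet occurs, only that some window resembles it closely enough.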


About this paper

Cite this paper

State, L., Salat, H., Rubrichi, S., Smoreda, Z. (2023). Explainability in Practice: Estimating Electrification Rates from Mobile Phone Data in Senegal. In: Longo, L. (eds) Explainable Artificial Intelligence. xAI 2023. Communications in Computer and Information Science, vol 1902. Springer, Cham. https://doi.org/10.1007/978-3-031-44067-0_6


DOI: https://doi.org/10.1007/978-3-031-44067-0_6
Published: 21 October 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-44066-3
Online ISBN: 978-3-031-44067-0
eBook Packages: Computer Science, Computer Science (R0)
