Explainability in Practice: Estimating Electrification Rates from Mobile Phone Data in Senegal

  • Conference paper
  • In: Explainable Artificial Intelligence (xAI 2023)

Abstract

Explainable artificial intelligence (XAI) provides explanations for machine learning (ML) models that are not interpretable. While many technical approaches exist, there is a lack of validation of these techniques on real-world datasets. In this work, we present a use case of XAI: an ML model trained to estimate electrification rates from mobile phone data in Senegal. The data originate from the Data for Development challenge by Orange in 2014/15. We apply two model-agnostic, local explanation techniques and find that, while the model can be verified, it is biased with respect to population density. We conclude our paper by pointing to the two main challenges we encountered during our work: data processing and model design, which might be restricted by currently available XAI methods, and the importance of domain knowledge for interpreting explanations.

Notes

  1. https://algorithmwatch.org/en/automating-society-2020/.

  2. The code of the project can be found here: https://github.com/lstate/explainability-in-practice.git.

  3. https://ourworldindata.org/grapher/mobile-cellular-subscriptions-per-100-people.

  4. https://sdgs.un.org/goals.

  5. https://data.worldbank.org/country/senegal.

  6. The ratio between instances in class 9 and the full dataset before subsampling is \(imb = 33\%\).

  7. 12.7M downloads of the LIME Python package and 63M downloads of the SHAP Python package, retrieved on 6 April 2022 from https://pepy.tech.

References

  1. Abebe, R., et al.: Narratives and counternarratives on data sharing in Africa. In: FAccT, pp. 329–341. ACM (2021)

  2. Adadi, A., Berrada, M.: Peeking inside the black-box: a survey on explainable artificial intelligence (XAI). IEEE Access 6, 52138–52160 (2018)

  3. Agence Nationale de la Statistique et de la Démographie: Census and GIS data (2013). https://www.ansd.sn/index.php?option=com_content&view=article&id=134&Itemid=262. Accessed Apr 2022

  4. Alvarez-Melis, D., Jaakkola, T.S.: On the robustness of interpretability methods. CoRR abs/1806.08049 (2018)

  5. Beckh, K., et al.: Explainable machine learning with prior knowledge: an overview. CoRR abs/2105.10172 (2021)

  6. Blumenstock, J.E.: Calling for better measurement: estimating an individual’s wealth and well-being from mobile phone transaction records. Center for Effective Global Action, UC Berkeley (2015). https://escholarship.org/uc/item/8zs63942

  7. Calegari, R., Ciatto, G., Omicini, A.: On the integration of symbolic and sub-symbolic techniques for XAI: a survey. Intelligenza Artificiale 14(1), 7–32 (2020)

  8. Craven, M.W., Shavlik, J.W.: Extracting tree-structured representations of trained networks. In: NIPS, pp. 24–30. MIT Press (1995)

  9. Dempster, A., Petitjean, F., Webb, G.I.: ROCKET: exceptionally fast and accurate time series classification using random convolutional kernels. Data Min. Knowl. Discov. 34(5), 1454–1495 (2020)

  10. Guidotti, R., Monreale, A., Ruggieri, S., Pedreschi, D., Turini, F., Giannotti, F.: Local rule-based explanations of black box decision systems. CoRR abs/1805.10820 (2018)

  11. Guidotti, R., Monreale, A., Ruggieri, S., Turini, F., Giannotti, F., Pedreschi, D.: A survey of methods for explaining black box models. ACM Comput. Surv. 51(5), 93:1–93:42 (2019)

  12. Guidotti, R., Monreale, A., Spinnato, F., Pedreschi, D., Giannotti, F.: Explaining any time series classifier. In: CogMI, pp. 167–176. IEEE (2020)

  13. Houngbonon, G.V., Quentrec, E.L., Rubrichi, S.: Access to electricity and digital inclusion: evidence from mobile call detail records. Hum. Soc. Sci. Commun. 8(1) (2021). https://doi.org/10.1057/s41599-021-00848-0

  14. Ledesma, C., Garonita, O.L., Flores, L.J., Tingzon, I., Dalisay, D.: Interpretable poverty mapping using social media data, satellite images, and geospatial information. CoRR abs/2011.13563 (2020)

  15. Letouzé, E.: Applications and implications of big data for demo-economic analysis: the case of call-detail records. Ph.D. thesis, University of California, Berkeley, USA (2016)

  16. Lundberg, S.M., Lee, S.: A unified approach to interpreting model predictions. In: NIPS, pp. 4765–4774 (2017)

  17. Martinez-Cesena, E.A., Mancarella, P., Ndiaye, M., Schläpfer, M.: Using mobile phone data for electricity infrastructure planning. arXiv preprint (2015). https://arxiv.org/abs/1504.03899

  18. Molnar, C.: Interpretable Machine Learning: A Guide for Making Black Box Models Explainable (2022). https://christophm.github.io/interpretable-ml-book/

  19. de Montjoye, Y., Smoreda, Z., Trinquart, R., Ziemlicki, C., Blondel, V.D.: D4D-Senegal: the second mobile phone data for development challenge. CoRR abs/1407.4885 (2014)

  20. Murdoch, W.J., Singh, C., Kumbier, K., Abbasi-Asl, R., Yu, B.: Interpretable machine learning: definitions, methods, and applications. CoRR abs/1901.04592 (2019)

  21. NOAA National Centers for Environmental Information (NCEI): Version 4 DMSP-OLS Nighttime Lights Time Series (2014). https://ngdc.noaa.gov/eog/dmsp/downloadV4composites.html. Accessed Apr 2022

  22. Okolo, C.T., Dell, N., Vashistha, A.: Making AI explainable in the global south: a systematic review. In: COMPASS, pp. 439–452. ACM (2022)

  23. Pestre, G., Letouzé, E., Zagheni, E.: The ABCDE of big data: assessing biases in call-detail records for development estimates. World Bank Econ. Rev. 34(Supplement_1), S89–S97 (2019). https://doi.org/10.1093/wber/lhz039

  24. Pokhriyal, N., Jacques, D.C.: Combining disparate data sources for improved poverty prediction and mapping. Proc. Natl. Acad. Sci. 114(46), E9783–E9792 (2017). https://doi.org/10.1073/pnas.1700319114

  25. Ribeiro, M.T., Singh, S., Guestrin, C.: “Why should I trust you?”: explaining the predictions of any classifier. In: KDD, pp. 1135–1144. ACM (2016)

  26. Salat, H., Schläpfer, M., Smoreda, Z., Rubrichi, S.: Analysing the impact of electrification on rural attractiveness in Senegal with mobile phone data. R. Soc. Open Sci. 8(10) (2021). https://doi.org/10.1098/rsos.201898

  27. Salat, H., Smoreda, Z., Schläpfer, M.: A method to estimate population densities and electricity consumption from mobile phone data in developing countries. PLOS ONE 15(6) (2020). https://doi.org/10.1371/journal.pone.0235224

  28. Schmid, T., Bruckschen, F., Salvati, N., Zbiranski, T.: Constructing socio-demographic indicators for national statistical institutes using mobile phone data: estimating literacy rates in Senegal. J. R. Stat. Soc. Series A (Statistics in Society) 180(4), 1163–1190 (2017). https://www.jstor.org/stable/44682668

  29. Steele, J.E., et al.: Mapping poverty using mobile phone and satellite data. J. R. Soc. Interface 14(127), 20160690 (2017). https://doi.org/10.1098/rsif.2016.0690

Acknowledgements

Thanks to Salvatore Ruggieri and Franco Turini. This work has received funding from the European Union’s Horizon 2020 research and innovation programme under Marie Skłodowska-Curie Actions (grant agreement number 860630) for the project “NoBIAS - Artificial Intelligence without Bias” (https://nobias-project.eu/). This work reflects only the authors’ views and the European Research Executive Agency (REA) is not responsible for any use that may be made of the information it contains.

Author information

Contributions

Laura State: conceptualization, data processing and experiments, paper draft, writing and editing. Hadrien Salat: data preparation, paper co-writing and editing. Stefania Rubrichi: data curation, paper reviewing, co-supervision. Zbigniew Smoreda: data curation, paper reviewing.

Corresponding author

Correspondence to Laura State.

A Appendix

1.1 A.1 Data Distribution

Figure 6 shows the spatial distribution of the available call data (left panel) and text message data (right panel). While both datasets are denser in the western part of Senegal, the text message data are particularly sparse in the eastern part of the country.

Fig. 6. Spatial distribution of available data. Left panel: call data; right panel: text message data. Colored points mark cell tower locations. Call data based on CN. Plots based on outgoing time series data.

Table 3. Classification results. The higher the accuracy (acc), the better; the lower the MAE, the better.

Fig. 7. Confusion matrix for the random forest model (accuracy of 0.516, MAE of 0.972), the best performing model.

1.2 A.2 Additional Results

Additional results for the prediction of the electrification rate are shown in Table 3. This second set of models is also trained with default parameters.

Further, we present a confusion matrix of the best performing model (RF, Fig. 7).

1.3 A.3 Time Series Data

In this section, we briefly describe the work on time series data.

Data Processing. A time series \(S = s_1 .. s_T\) consists of \(T = 24 \times 12\) ordered data points, each being the monthly average of the aggregated number of events per hour: \(s_t, t \in 1 .. 24\), form the “daily activity curve” for January, and so on for the remaining months. Events are separated by direction (incoming or outgoing), so that we create six time series per cell tower. We refer to the time series dataset based on the number of calls as CN, on the length of calls as CL, and on the number of text messages as SN, and use out/in for outgoing/incoming activity, respectively. We standardize each time series separately by applying the min-max scaler provided by sklearn (see the sketch below). Data labeling and subsampling apply as above; data partitioning follows [12].
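
As a concrete sketch of this scaling step, assuming sklearn's MinMaxScaler as named above; the array and function names are illustrative placeholders, not our processing code.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

T = 24 * 12  # ordered data points per series: 12 monthly "daily activity curves" of 24 hourly values


def scale_series(series: np.ndarray) -> np.ndarray:
    """Min-max scale one time series of length T to [0, 1], independently of all other series."""
    return MinMaxScaler().fit_transform(series.reshape(-1, 1)).ravel()


# One series per (cell tower, dataset in {CN, CL, SN}, direction in {out, in}), e.g.:
# scaled = np.stack([scale_series(s) for s in raw_series])
```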

We find that the text message data are heavily imbalanced and that this dataset is smaller than the others; thus, we exclude it from the time series analysis.

The variational autoencoder used in the explanations displayed below is trained with a latent dimension of \(k = 50\) over \(e = 500\) epochs; a schematic sketch of such an architecture is given after this paragraph. We use the “out CL” data and model for explanations, as it yields the smallest MAE.
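
For illustration only, a variational autoencoder with these settings could look as follows; this is a minimal PyTorch sketch under the stated dimensions, not the exact architecture used in our experiments.

```python
import torch
import torch.nn as nn

T = 24 * 12  # input length: 12 monthly daily-activity curves of 24 hourly values
K = 50       # latent dimension, as reported above


class TimeSeriesVAE(nn.Module):
    def __init__(self, t: int = T, k: int = K, hidden: int = 256):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(t, hidden), nn.ReLU())
        self.to_mu = nn.Linear(hidden, k)
        self.to_logvar = nn.Linear(hidden, k)
        # Sigmoid output matches the min-max-scaled series in [0, 1]
        self.decoder = nn.Sequential(
            nn.Linear(k, hidden), nn.ReLU(), nn.Linear(hidden, t), nn.Sigmoid()
        )

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)  # reparameterization trick
        return self.decoder(z), mu, logvar


def vae_loss(x, x_hat, mu, logvar):
    # reconstruction error plus KL divergence to the standard normal prior
    rec = nn.functional.mse_loss(x_hat, x, reduction="sum")
    kld = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return rec + kld

# Training loop (e = 500 epochs) and the generation of latent neighborhoods are omitted.
```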

Classification. To classify based on time series data, we use ROCKET (RandOm Convolutional KErnel Transform) [9], a method that extracts features with random convolutional kernels and feeds them to a linear classifier; a minimal usage sketch is shown below. Results are displayed in Table 4.
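
For illustration, a minimal sketch of such a pipeline, assuming the Rocket transform from the sktime library together with a ridge classifier; the data arrays below are random placeholders, not our prepared datasets.

```python
import numpy as np
from sklearn.linear_model import RidgeClassifierCV
from sktime.transformations.panel.rocket import Rocket

rng = np.random.default_rng(0)
T = 24 * 12                                     # series length, as above
X_train = rng.random((100, 1, T))               # placeholder: 100 univariate series
y_train = rng.integers(0, 10, 100)              # placeholder: class labels 0..9
X_test, y_test = rng.random((20, 1, T)), rng.integers(0, 10, 20)

rocket = Rocket(num_kernels=10_000, random_state=0)
X_train_feat = rocket.fit_transform(X_train)            # features from random convolutional kernels
clf = RidgeClassifierCV(alphas=np.logspace(-3, 3, 10))  # linear classifier on top
clf.fit(X_train_feat, y_train)
accuracy = clf.score(rocket.transform(X_test), y_test)
```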

Table 4. Classification results. The higher the accuracy, the better; the lower the MAE, the better.

Fig. 8. Local explanation by LASTS, shapelet-based, out CL data. The explained time series belongs to class 4 and is correctly classified by the ML model. Left: factual rule, plotted against the original time series; right: rule of the opposite class, plotted against a synthetically generated time series. Time steps in hours.

Explanations. Figure 8 shows a sample explanation by LASTS [12]. Explanations are provided in visual form and as a rule; the latter can be read off the plot. The time series belongs to class 4, i.e. it has an electrification rate between 0.4 and 0.5, and is correctly classified by the model. In the left panel, the factual rule is plotted against the original time series; the number above the shapelet indicates its index. The rule reads as follows: “If shapelet no. 12 is contained in the time series, then it is classified as class 4.” This is mirrored by the rule for instances belonging to the opposite class (here: classes 0..3, 5..9): “If shapelet no. 12 is not contained in the time series, then it is not classified as class 4.”, displayed in the right panel of Fig. 8 and plotted against a synthetically generated time series from a class different from class 4. A sketch of how such a containment condition can be checked is given below.
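
To make the reading of such a rule concrete, the sketch below shows one common way a shapelet-containment condition can be checked, namely by thresholding the shapelet's best-matching subsequence distance; the shapelet, threshold, and series here are random placeholders, not the values produced by LASTS.

```python
import numpy as np


def min_subsequence_distance(series: np.ndarray, shapelet: np.ndarray) -> float:
    """Smallest Euclidean distance between the shapelet and any equally long window of the series."""
    m = len(shapelet)
    return float(min(np.linalg.norm(series[i:i + m] - shapelet)
                     for i in range(len(series) - m + 1)))


def contains_shapelet(series: np.ndarray, shapelet: np.ndarray, threshold: float) -> bool:
    """Factual rule: 'if the shapelet is contained in the series, predict class 4'."""
    return min_subsequence_distance(series, shapelet) <= threshold


# Placeholder example (illustrative values only)
rng = np.random.default_rng(0)
series, shapelet = rng.random(24 * 12), rng.random(24)
print(contains_shapelet(series, shapelet, threshold=1.0))
```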

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

State, L., Salat, H., Rubrichi, S., Smoreda, Z. (2023). Explainability in Practice: Estimating Electrification Rates from Mobile Phone Data in Senegal. In: Longo, L. (eds) Explainable Artificial Intelligence. xAI 2023. Communications in Computer and Information Science, vol 1902. Springer, Cham. https://doi.org/10.1007/978-3-031-44067-0_6

  • DOI: https://doi.org/10.1007/978-3-031-44067-0_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-44066-3

  • Online ISBN: 978-3-031-44067-0

  • eBook Packages: Computer Science, Computer Science (R0)
