Abstract
Indonesia has many big data sources, including social media, financial transactions, transportation, call detail records, and e-commerce. Such data are considered potential resources for complementing periodic surveys and censuses in monitoring development indicators such as poverty levels. E-commerce data in particular could potentially represent the real expenditure of households, and so align more closely with the formal calculation of the poverty line than other datasets do. This research contributes a framework for estimating poverty rates from e-commerce data using machine learning algorithms. The influence of items and aspects in e-commerce data on poverty rate estimation was also investigated. The experimental results showed that e-commerce data could potentially be used as a proxy for calculating city-level poverty rates. Cars and motorbikes were found to be the two most significant items for poverty prediction in Indonesia.






Acknowledgements
This work was supported by Pulse Lab Jakarta (PLJ), which is a joint initiative of the United Nations and the Government of Indonesia.
Ethics declarations
Conflict of interest
On behalf of all authors, the corresponding author states that there is no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Wijaya, D.R., Paramita, N.L.P.S.P., Uluwiyah, A. et al. Estimating city-level poverty rate based on e-commerce data with machine learning. Electron Commer Res 22, 195–221 (2022). https://doi.org/10.1007/s10660-020-09424-1
DOI: https://doi.org/10.1007/s10660-020-09424-1