Abstract
Mobile Crowd Sensing (MCS) involves allocating sensing tasks associated with an area of interest to a crowd of participants over time. Consequently, the collective time and energy spent on sensing can be quite large. Sparse Mobile Crowd Sensing (Sparse-MCS) aims to reduce this overhead by reducing the number of sensing tasks, so that sensed values are obtained from only some portions of the area or time. The values for the portions not covered can be inferred from the collected sensed values. Hence, missing data inference is an integral part of Sparse-MCS. This study is divided into two phases. First, we explore the viability of using machine learning, viz., regression, for missing data inference in Sparse-MCS. To this end, we examine several representative regression algorithms: Linear Regression, LASSO, Elastic Net, Ridge, Decision Tree (DT), Random Forest (RF) and K-Nearest Neighbours (KNN). Using two real datasets, we conclude that some algorithms, such as DT and RF, exhibit good performance (giving a normalized mean absolute error much less than 0.1 most of the time) whereas the rest do not. Moreover, using simulation results, we compare these techniques with a state-of-the-art missing data inference method known as Compressive Sensing. Next, we propose a divide-and-conquer polynomial-time algorithm for task reduction, which is based on the proposed inference approach. We also present an analysis of the algorithm in terms of (i) its time complexity and (ii) lower and upper bounds on task reduction.
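To make the idea of regression-based missing data inference concrete, the following is a minimal sketch, not the authors' implementation. It assumes a hypothetical 10 x 10 grid of cells with a synthetic, smoothly varying field, senses only 40% of the cells, and infers the remaining ones with a simple KNN regressor. The helper names (`knn_impute`, `nmae`), the grid, and the choice to normalize the mean absolute error by the range of the true values are all assumptions made for illustration; the abstract does not specify how the error is normalized.

```python
import math
import random


def knn_impute(known, target, k=3):
    """Estimate the value at `target` as the mean of its k nearest sensed cells.

    `known` maps (row, col) cell coordinates to sensed values.
    """
    nearest = sorted(known, key=lambda cell: math.dist(cell, target))[:k]
    return sum(known[cell] for cell in nearest) / len(nearest)


def nmae(actual, predicted):
    """Mean absolute error divided by the range of the actual values.

    The normalization by range is an assumption of this sketch.
    """
    mae = sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)
    value_range = max(actual) - min(actual)
    return mae / value_range if value_range else 0.0


# Synthetic, smoothly varying field on a 10 x 10 grid (illustrative data only).
grid = {(r, c): 20.0 + 0.5 * r + 0.3 * c for r in range(10) for c in range(10)}

# Sense only 40 of the 100 cells; the rest must be inferred.
rng = random.Random(0)
cells = sorted(grid)
sensed_cells = set(rng.sample(cells, 40))
sensed = {cell: grid[cell] for cell in sensed_cells}
unsensed = [cell for cell in cells if cell not in sensed_cells]

# Infer every unsensed cell from the sensed ones and score the result.
truth = [grid[cell] for cell in unsensed]
inferred = [knn_impute(sensed, cell) for cell in unsensed]
error = nmae(truth, inferred)
print(f"NMAE over {len(unsensed)} unsensed cells: {error:.3f}")
```

Any of the other regressors named above could be swapped in for `knn_impute` under the same evaluation loop; KNN is used here only because it needs no external library.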
Acknowledgements
This work is partially supported by DST-SERB, Government of India under grant EEQ/2017/000083.
About this article
Cite this article
Marchang, N., Meitei, G.M. & Thakur, T. Task reduction using regression-based missing data imputation in sparse mobile crowdsensing. J Supercomput 78, 15995–16028 (2022). https://doi.org/10.1007/s11227-022-04518-z
DOI: https://doi.org/10.1007/s11227-022-04518-z