Abstract
With the rapid development of electronic technology, traditional paper-based medical systems are being converted into efficient electronic records that can be easily checked and transmitted. However, because of system updates and equipment failures, missing data are very common in the healthcare field. Since health data help people evaluate their health status and adjust their fitness routines, predicting missing health data is a pressing task. Two challenges arise when predicting missing data: (1) people’s health data are complex, containing multiple data types (such as continuous, discrete and Boolean data); and (2) privacy issues arise at the edge, because huge amounts of health data are published while edge devices can provide only limited computing and storage resources. Therefore, this paper proposes a novel multitype health data privacy-aware prediction approach based on locality-sensitive hashing (LSH). Through LSH, the proposed method achieves a good tradeoff between prediction accuracy and privacy preservation. Finally, a set of experiments on the WISDM dataset verifies the validity of our approach in handling multitype data and preserving user privacy.
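The abstract does not specify the hash family, how multitype records are encoded, or the prediction rule, so the following is only a minimal Python sketch of the general idea under stated assumptions, not the authors' method. It assumes random-hyperplane (SimHash-style) LSH with several hash tables, a hypothetical encoding of one record (heart rate as continuous, activity as discrete via one-hot, a smoker flag as Boolean), and a hypothetical rule that predicts a missing value as the mean over users who share a bucket with the target in at least one table. The names `encode`, `make_tables`, `lsh_signature` and `predict_missing` are illustrative. Raw vectors stay on the device; only signatures, plus each neighbour's value of the attribute being predicted, are published.

```python
import random
from statistics import mean
from typing import Dict, List, Optional, Sequence, Tuple

Vector = List[float]
Signature = Tuple[Tuple[int, ...], ...]  # one bit-tuple per hash table


def encode(heart_rate: float, activity: str, smoker: bool,
           activities: Sequence[str]) -> Vector:
    """Hypothetical encoding of one multitype record into a numeric vector."""
    continuous = [(heart_rate - 80.0) / 40.0]  # continuous: centred and scaled (assumed constants)
    discrete = [1.0 if activity == a else 0.0 for a in activities]  # discrete: one-hot
    boolean = [1.0 if smoker else 0.0]  # Boolean: 0/1
    return continuous + discrete + boolean


def make_tables(dim: int, num_tables: int, bits: int, seed: int = 0) -> List[List[Vector]]:
    """Draw `num_tables` groups of `bits` random Gaussian hyperplanes.
    All devices are assumed to share the seed, hence the same hyperplanes."""
    rng = random.Random(seed)
    return [[[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(bits)]
            for _ in range(num_tables)]


def lsh_signature(vec: Vector, tables: List[List[Vector]]) -> Signature:
    """Random-hyperplane LSH: each bit records which side of a hyperplane `vec` lies on.
    Computed on the device; only the signature is published, never `vec` itself."""
    return tuple(
        tuple(1 if sum(w * x for w, x in zip(plane, vec)) >= 0 else 0 for plane in planes)
        for planes in tables
    )


def predict_missing(target: Signature,
                    published: Dict[str, Tuple[Signature, Optional[float]]]) -> Optional[float]:
    """Estimate a missing attribute as the mean over users whose signature equals the
    target's in at least one hash table (same bucket); None if there is no such user."""
    values = [value for sig, value in published.values()
              if value is not None and any(a == b for a, b in zip(sig, target))]
    return mean(values) if values else None


if __name__ == "__main__":
    acts = ["walking", "jogging", "sitting"]
    tables = make_tables(dim=1 + len(acts) + 1, num_tables=4, bits=6, seed=42)
    # Each neighbour publishes its signature and its value of the attribute to
    # predict (here a hypothetical daily step count).
    published = {
        "bob": (lsh_signature(encode(74.0, "walking", False, acts), tables), 8200.0),
        "carol": (lsh_signature(encode(150.0, "jogging", True, acts), tables), 12000.0),
    }
    alice = lsh_signature(encode(72.0, "walking", False, acts), tables)
    print(predict_missing(alice, published))  # None if no bucket collision
```

The printed result depends on the randomly drawn hyperplanes. In this kind of scheme, `bits` and `num_tables` are the knobs behind the accuracy/privacy tradeoff the abstract mentions: longer signatures give fewer, closer collisions but carry more information about the underlying record, while shorter signatures give coarser buckets that reveal less.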
Acknowledgements
This work is supported by the National Natural Science Foundation of China (No. 61872219) and the Natural Science Foundation of Shandong Province (ZR2019MF001).
Additional information
This article belongs to the Topical Collection: Special Issue on Resource Management at the Edge for Future Web, Mobile and IoT Applications
Guest Editors: Qiang He, Fang Dong, Chenshu Wu, and Yun Yang
About this article
Cite this article
Kong, L., Wang, L., Gong, W. et al. LSH-aware multitype health data prediction with privacy preservation in edge environment. World Wide Web 25, 1793–1808 (2022). https://doi.org/10.1007/s11280-021-00941-z
DOI: https://doi.org/10.1007/s11280-021-00941-z