Abstract
During real-time data collection, missing data items are common in some data streams, which makes stream analysis and knowledge mining difficult. In this paper, we study a data imputation approach based on the gamma distribution that determines both the number of missing data items in each time interval and the values of those missing items in a stream. We also present metrics, such as fitting degree, credibility, and matching degree, to evaluate the effectiveness of data imputation. Experimental results show that our approach improves credibility by 15.0% to 20.0% compared with EMI, a widely adopted data imputation approach, and significantly outperforms traditional techniques.
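The abstract does not give the algorithm itself, so the following is only a minimal sketch of the general idea of gamma-based imputation, not the authors' method. It assumes that the expected number of items per time interval is known in advance (the hypothetical parameter `expected_count`), that item values are positive, and that the gamma parameters are estimated by the method of moments. All function names are illustrative.

```python
import random
import statistics


def fit_gamma_moments(values):
    """Method-of-moments estimate of the gamma shape and scale.

    For a gamma distribution, mean = shape * scale and
    variance = shape * scale**2, which gives the two formulas below.
    """
    mean = statistics.fmean(values)
    var = statistics.variance(values)  # sample variance, needs >= 2 values
    if mean <= 0 or var <= 0:
        raise ValueError("need positive mean and non-zero variance")
    shape = mean * mean / var
    scale = var / mean
    return shape, scale


def count_missing(observed, expected_count):
    """Assumption: an interval should hold `expected_count` items."""
    return max(0, expected_count - len(observed))


def impute_interval(observed, expected_count, rng=None):
    """Draw replacement values for the missing items of one interval."""
    rng = rng or random.Random()
    n_missing = count_missing(observed, expected_count)
    shape, scale = fit_gamma_moments(observed)
    # random.gammavariate takes (alpha=shape, beta=scale).
    return [rng.gammavariate(shape, scale) for _ in range(n_missing)]


if __name__ == "__main__":
    interval = [2.0, 4.0, 6.0]          # items that actually arrived
    filled = impute_interval(interval, expected_count=5,
                             rng=random.Random(0))
    print(len(filled), "values imputed")
```

Sampling from the fitted distribution preserves the spread of the observed values; replacing each missing item with the fitted mean (`shape * scale`) would be a simpler but less faithful alternative.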
Acknowledgments
This work was supported by the National Social Science Foundation of China under grant No. 17BTQ086; the National Key R&D Program of China under grant No. 2019YFB1704100; the National Natural Science Foundation of China under grant No. 62072337; and the Excellent Experimental Project of Tongji University under grant No. 1380104112.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Sun, Z., Zeng, G., Ding, C. (2021). Imputation for Missing Items in a Stream Data Based on Gamma Distribution. In: Qiu, M. (eds) Smart Computing and Communication. SmartCom 2020. Lecture Notes in Computer Science, vol 12608. Springer, Cham. https://doi.org/10.1007/978-3-030-74717-6_25
DOI: https://doi.org/10.1007/978-3-030-74717-6_25
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-74716-9
Online ISBN: 978-3-030-74717-6
eBook Packages: Computer Science, Computer Science (R0)