
Imputation for Missing Items in a Stream Data Based on Gamma Distribution

  • Conference paper
Smart Computing and Communication (SmartCom 2020)

Abstract

Missing data is common in some data streams collected in real time, and it makes stream analysis and knowledge mining difficult. In this paper, we study a data imputation approach based on the gamma distribution. It determines both the number of missing data items in each time interval and the values of the missing items in a stream. We also present metrics, such as fitting degree, credibility, and matching degree, to evaluate the effectiveness of data imputation. Experimental results show that our approach improves credibility by 15.0% to 20.0% compared with EMI, a widely adopted data imputation approach, and significantly outperforms traditional techniques.
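The abstract does not spell out the estimation procedure. The following is only a minimal sketch under stated assumptions and is not the authors' algorithm. It fits a two-parameter gamma distribution (shape k, scale θ) to the observed values using Thom's approximate maximum-likelihood estimator. It then assumes, hypothetically, that every time interval should contain the median observed number of items, and it draws each missing value from the fitted distribution. The function names `fit_gamma_thom` and `impute_stream` are illustrative and do not come from the paper.

```python
import math
import random
from statistics import median


def fit_gamma_thom(values):
    """Estimate gamma shape k and scale theta with Thom's approximate MLE.

    Requires at least two strictly positive values, since log(x) is used.
    """
    if len(values) < 2 or any(v <= 0 for v in values):
        raise ValueError("need at least two strictly positive observations")
    mean = sum(values) / len(values)
    # A = ln(mean) - mean(ln x); by Jensen's inequality A > 0 unless all values are equal
    a = math.log(mean) - sum(math.log(v) for v in values) / len(values)
    if a <= 0:
        raise ValueError("observations have no spread; gamma fit is undefined")
    k = (1 + math.sqrt(1 + 4 * a / 3)) / (4 * a)
    theta = mean / k  # keeps the fitted mean k * theta equal to the sample mean
    return k, theta


def impute_stream(intervals, seed=0):
    """Fill gaps in a stream split into time intervals (hypothetical scheme).

    intervals: list of lists of observed positive values, one list per interval.
    Assumption (not from the paper): each interval should hold about the median
    observed count, so an interval with fewer items is missing the difference.
    """
    observed = [v for interval in intervals for v in interval]
    k, theta = fit_gamma_thom(observed)
    expected = round(median([len(interval) for interval in intervals]))
    rng = random.Random(seed)  # seeded so the imputation is reproducible
    filled = []
    for interval in intervals:
        n_missing = max(0, expected - len(interval))
        # draw each missing value from the fitted gamma (shape k, scale theta)
        extra = [rng.gammavariate(k, theta) for _ in range(n_missing)]
        filled.append(list(interval) + extra)
    return filled, (k, theta)


if __name__ == "__main__":
    stream = [[2.1, 3.4, 1.8, 2.9], [2.5, 3.1], [1.9, 2.7, 3.3, 2.2],
              [2.8, 2.0, 3.6], [3.0, 2.4, 1.7, 2.6]]
    filled, (k, theta) = impute_stream(stream)
    print([len(iv) for iv in filled])  # [4, 4, 4, 4, 4]
    print(f"k = {k:.2f}, theta = {theta:.3f}")
```

Drawing values from the fitted distribution keeps the spread of the imputed stream close to that of the observed data. Filling every gap with the gamma mean would not. A deterministic variant would replace the `gammavariate` call with `k * theta`.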



Acknowledgments

This work was supported by the National Social Science Foundation of China under grant No. 17BTQ086; the National Key R&D Program of China under grant No. 2019YFB1704100; the National Natural Science Foundation of China under grant No. 62072337; and Excellent experimental project of Tongji University under grant No. 1380104112.

Author information

Correspondence to Guosun Zeng.


Copyright information

© 2021 Springer Nature Switzerland AG

About this paper


Cite this paper

Sun, Z., Zeng, G., Ding, C. (2021). Imputation for Missing Items in a Stream Data Based on Gamma Distribution. In: Qiu, M. (eds) Smart Computing and Communication. SmartCom 2020. Lecture Notes in Computer Science, vol 12608. Springer, Cham. https://doi.org/10.1007/978-3-030-74717-6_25


  • DOI: https://doi.org/10.1007/978-3-030-74717-6_25


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-74716-9

  • Online ISBN: 978-3-030-74717-6

  • eBook Packages: Computer Science, Computer Science (R0)
