An Efficient Approach for Query Processing of Incomplete High Dimensional Data Streams

  • Conference paper
Advanced Machine Learning Technologies and Applications (AMLTA 2021)

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 1339)

Abstract

Many recent applications, such as sensor networks, generate continuous data streams. Efficient query processing of these streams faces additional constraints: the data are uncertain in nature and must be processed quickly and in a timely manner. Traditional query processing techniques for static data process the whole dataset without partitioning it, which is not applicable to data streams. Data clustering is therefore needed as a preprocessing step for data streams. In this paper, we propose the Incomplete High dimensional Data streams Query processing (IHDQ) algorithm for efficiently answering queries over data streams. The obtained results show the efficiency of the proposed IHDQ in both clustering and query processing compared with alternative state-of-the-art approaches.

Corresponding author

Correspondence to Fatma M. Najib.

Copyright information

© 2021 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Najib, F.M., Ismail, R.M., Badr, N.L., Gharib, T.F. (2021). An Efficient Approach for Query Processing of Incomplete High Dimensional Data Streams. In: Hassanien, AE., Chang, KC., Mincong, T. (eds) Advanced Machine Learning Technologies and Applications. AMLTA 2021. Advances in Intelligent Systems and Computing, vol 1339. Springer, Cham. https://doi.org/10.1007/978-3-030-69717-4_57
