Top-k dominating queries on incomplete large dataset

Wu, Jimmy Ming-Tai; Wei, Min; Wu, Mu-En; Tayeb, Shahab

doi:10.1007/s11227-021-04005-x


Published: 17 August 2021

Volume 78, pages 3976–3997, (2022)
The Journal of Supercomputing

Jimmy Ming-Tai Wu¹, Min Wei¹, Mu-En Wu² (ORCID: orcid.org/0000-0002-4839-3849) & Shahab Tayeb³

1721 Accesses
5 Citations

Abstract

A top-k dominating (TKD) query finds interesting objects by returning the k objects that dominate the most other objects in a given dataset. Incomplete datasets have missing values in some dimensions, so traditional data mining methods designed for complete data struggle to extract useful information from them. The BitMap Index Guided (BIG) algorithm is a good choice for solving this problem. However, finding top-k dominating objects is even harder on incomplete big data: when the dataset is very large, the demands on the feasibility and performance of the algorithm become very high. In this paper, we propose an algorithm, called the Efficient Hadoop BitMap Index Guided Algorithm (EHBIG), that applies MapReduce to the whole process together with a pruning strategy. The algorithm answers TKD queries on incomplete datasets through a bitmap index and uses the MapReduce architecture to make TKD queries feasible on large datasets. The pruning strategy greatly reduces runtime and memory usage. We also propose an improved version of EHBIG (denoted IEHBIG) that optimizes the whole algorithm flow. Experimental results show that the proposed algorithms perform well for TKD queries on incomplete large datasets and achieve good performance on a Hadoop computing cluster.
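
To make the query concrete, the following is a minimal brute-force sketch of a TKD query on incomplete data. It is not BIG, EHBIG, or IEHBIG: it uses no bitmap index, no pruning, and no MapReduce. It assumes a commonly used dominance definition for incomplete data, which the abstract itself does not state: two objects are compared only on the dimensions both have observed, smaller values are better, and an object dominates another if it is no worse on every shared dimension and strictly better on at least one. The names `dominates` and `top_k_dominating` and the sample data are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

# An object is a sequence of values; None marks a missing value (assumption).
Obj = Sequence[Optional[float]]


def dominates(a: Obj, b: Obj) -> bool:
    """True if a dominates b on the dimensions both objects have observed."""
    strictly_better = False
    for x, y in zip(a, b):
        if x is None or y is None:
            continue  # skip dimensions missing in either object
        if x > y:
            return False  # a is worse on a shared dimension
        if x < y:
            strictly_better = True
    return strictly_better


def top_k_dominating(data: List[Obj], k: int) -> List[Tuple[int, int]]:
    """Return (index, score) for the k objects that dominate the most others."""
    scores = [
        (i, sum(dominates(a, b) for j, b in enumerate(data) if j != i))
        for i, a in enumerate(data)
    ]
    # Highest score first; ties broken by the smaller index.
    scores.sort(key=lambda t: (-t[1], t[0]))
    return scores[:k]


# Hypothetical sample: four objects in three dimensions with missing values.
data = [
    (1, None, 2),
    (2, 3, None),
    (None, 1, 3),
    (3, 4, 4),
]
result = top_k_dominating(data, 2)
print(result)  # [(0, 3), (2, 2)]
```

This sketch compares every pair of objects, so its cost grows quadratically with the dataset size, which is the scaling problem that motivates indexing, pruning, and distributed execution for large datasets.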



Author information

Authors and Affiliations

College of Computer Science and Engineering, Shandong University of Science and Technology, Qingdao, China
Jimmy Ming-Tai Wu & Min Wei
Department of Information and Finance Management, National Taipei University of Technology, Taipei, Taiwan
Mu-En Wu
Department of Electrical and Computer Engineering, California State University, Fresno, CA, USA
Shahab Tayeb


Corresponding author

Correspondence to Mu-En Wu.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article

Cite this article

Wu, J.MT., Wei, M., Wu, ME. et al. Top-k dominating queries on incomplete large dataset. J Supercomput 78, 3976–3997 (2022). https://doi.org/10.1007/s11227-021-04005-x


Accepted: 17 July 2021
Published: 17 August 2021
Issue Date: February 2022
DOI: https://doi.org/10.1007/s11227-021-04005-x
