A survey of estimating number of distinct values

Li, Jiajun; Lei, Runlin; Wei, Zhewei

doi:10.1007/s11704-024-40952-3

Review Article
Published: 21 March 2025

Frontiers of Computer Science, Volume 19, article number 199611 (2025)

Jiajun Li¹, Runlin Lei¹ & Zhewei Wei¹

Abstract

Estimating the Number of Distinct Values (NDVs) is a critical task in the fields of databases and data streams. Over time, various algorithms for estimating NDVs have been developed, each tailored to different requirements for time, I/O, and accuracy. These algorithms can be broadly categorized into two main types: sampling-based and sketch-based. Sampling-based NDV algorithms improve efficiency by sampling rather than accessing all items, often at the cost of reduced accuracy. In contrast, sketch-based NDV algorithms maintain a compact sketch using hashing to scan the entire dataset, typically offering higher accuracy but at the expense of increased I/O costs. When dealing with large-scale data, scanning the entire table may become infeasible. Thus, the challenge of efficiently and accurately estimating NDVs has persisted for decades. This paper provides a comprehensive review of the fundamental concepts, key techniques, and a comparative analysis of various NDV estimation algorithms. We first briefly examine traditional estimators in chronological order, followed by an in-depth discussion of the newer estimators developed over the past decade, highlighting the specific scenarios in which they are applicable. Furthermore, we illustrate how NDV estimation algorithms have been adapted to address the complexities of modern real-world data environments effectively. Despite significant progress in NDV estimation research, challenges remain in terms of theoretical scalability and practical application. This paper also explores potential future directions, including block sampling NDV estimation, learning-based NDV estimation, and their implications for database applications.
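
The contrast between the two families can be illustrated with a minimal Python sketch. It is not taken from the survey itself. It assumes one representative of each family: the Guaranteed-Error Estimator (GEE) of Charikar et al. for the sampling side, and a k-minimum-values (KMV) sketch for the hashing side. The function names, the choice of BLAKE2b as the hash function, and the default k = 256 are illustrative assumptions.

```python
import hashlib
import heapq
import math
import random
from collections import Counter


def gee_estimate(sample, table_size):
    """Sampling-based: GEE = sqrt(N / n) * f1 + sum_{j >= 2} f_j,
    where f_j is the number of values seen exactly j times in the sample."""
    n = len(sample)
    value_counts = Counter(sample)
    freq_of_freq = Counter(value_counts.values())
    f1 = freq_of_freq.get(1, 0)
    seen_more_than_once = sum(c for j, c in freq_of_freq.items() if j >= 2)
    return math.sqrt(table_size / n) * f1 + seen_more_than_once


def _hash01(item):
    """Map an item to a pseudo-uniform number in [0, 1) (illustrative choice)."""
    digest = hashlib.blake2b(repr(item).encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big") / 2**64


def kmv_estimate(items, k=256):
    """Sketch-based: scan every item, keep only the k smallest distinct
    hash values, and estimate NDV as (k - 1) / (k-th smallest hash)."""
    heap = []      # negated hashes, so heap[0] is the largest kept hash
    kept = set()   # hashes currently in the sketch, to skip duplicates
    for item in items:
        h = _hash01(item)
        if h in kept:
            continue
        if len(heap) < k:
            heapq.heappush(heap, -h)
            kept.add(h)
        elif h < -heap[0]:
            evicted = -heapq.heappushpop(heap, -h)
            kept.discard(evicted)
            kept.add(h)
    if len(heap) < k:
        return float(len(heap))  # fewer than k distinct values: count is exact
    return (k - 1) / (-heap[0])


if __name__ == "__main__":
    rng = random.Random(0)
    column = [rng.randrange(20_000) for _ in range(200_000)]
    sample = rng.sample(column, 2_000)  # reads 1% of the rows
    print("true NDV:", len(set(column)))
    print("GEE (1% sample):", round(gee_estimate(sample, len(column))))
    print("KMV (full scan):", round(kmv_estimate(column)))
```

The sampling-based function touches only the sampled rows, while the sketch-based function reads every row but stores at most k hash values, which mirrors the I/O versus accuracy trade-off described in the abstract.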

Acknowledgements

This research was supported in part by the National Science and Technology Major Project (2022ZD0114802), the National Natural Science Foundation of China (Grant Nos. U2241212, 61932001), the Beijing Natural Science Foundation (No. 4222028), the Beijing Outstanding Young Scientist Program (No. BJJWZYJH012019100020098), and the Huawei-Renmin University joint program on Information Retrieval. We also wish to acknowledge the support provided by the fund for building world-class universities (disciplines) of Renmin University of China, by the Engineering Research Center of Next-Generation Intelligent Search and Recommendation, Ministry of Education, by the Intelligent Social Governance Interdisciplinary Platform, the Major Innovation & Planning Interdisciplinary Platform for the “Double-First Class” Initiative, the Public Policy and Decision-making Research Lab, and the Public Computing Cloud, Renmin University of China.

Author information

Authors and Affiliations

Gaoling School of Artificial Intelligence, Renmin University of China, Beijing, 100872, China
Jiajun Li, Runlin Lei & Zhewei Wei

Corresponding author

Correspondence to Zhewei Wei.

Ethics declarations

Competing interests: The authors declare that they have no competing interests or financial conflicts to disclose.

Additional information

Jiajun LI is a PhD candidate at the Gaoling School of Artificial Intelligence, Renmin University of China, China, advised by Professor Zhewei Wei. He received his BE degree from the School of Statistics, Renmin University of China, China in 2019. His research focuses on approximate computing algorithms, AI for databases (AI4DB), and the design of data stream and sketch algorithms.

Runlin LEI is a PhD candidate at the Gaoling School of Artificial Intelligence, Renmin University of China, China, advised by Professor Zhewei Wei. He received his BE degree from the School of Information and Technology, Shanghai University of Finance and Economics, China in 2022. His research focuses on graph neural networks and AI for databases.

Zhewei WEI is currently a professor at the Gaoling School of Artificial Intelligence, Renmin University of China, China. He obtained his PhD degree from the Department of Computer Science and Engineering, The Hong Kong University of Science and Technology (HKUST), China in 2012. He received his BSc degree from the School of Mathematical Sciences at Peking University, China in 2008. His research interests include graph algorithms, massive data algorithms, and streaming algorithms. He was the Proceedings Chair of SIGMOD/PODS 2020 and ICDT 2021, and an Area Chair of ICML 2022/2023, NeurIPS 2022/2023, ICLR 2023, and WWW 2023. He is also a PC member of various top conferences, such as VLDB, KDD, ICDE, ICML, and NeurIPS.
