Enhancing Storage Efficiency and Performance: A Survey of Data Partitioning Techniques

Liu, Peng-Ju; Li, Cui-Ping; Chen, Hong

doi:10.1007/s11390-024-3538-1

Enhancing Storage Efficiency and Performance: A Survey of Data Partitioning Techniques

Survey
Regular Paper
Published: 06 June 2024

Volume 39, pages 346–368, (2024)
Cite this article

Journal of Computer Science and Technology Aims and scope Submit manuscript

Peng-Ju Liu (刘鹏举)^1,2,
Cui-Ping Li (李翠平)^1,2 &
Hong Chen (陈　红)^1,2

326 Accesses
1 Altmetric
Explore all metrics

Abstract

Data partitioning techniques are pivotal for optimal data placement across storage devices, thereby enhancing resource utilization and overall system throughput. However, the design of effective partition schemes faces multiple challenges, including considerations of the cluster environment, storage device characteristics, optimization objectives, and the balance between partition quality and computational efficiency. Furthermore, dynamic environments necessitate robust partition detection mechanisms. This paper presents a comprehensive survey structured around partition deployment environments, outlining the distinguishing features and applicability of various partitioning strategies while delving into how these challenges are addressed. We discuss partitioning features pertaining to database schema, table data, workload, and runtime metrics. We then delve into the partition generation process, segmenting it into initialization and optimization stages. A comparative analysis of partition generation and update algorithms is provided, emphasizing their suitability for different scenarios and optimization objectives. Additionally, we illustrate the applications of partitioning in prevalent database products and suggest potential future research directions and solutions. This survey aims to foster the implementation, deployment, and updating of high-quality partitions for specific system scenarios.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Subscribe and save

Springer+ Basic

$34.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Discover the latest articles, news and stories from top researchers in related subjects.

Artificial Intelligence

References

Melnik S, Gubarev A, Long J J et al. Dremel: A decade of interactive SQL analysis at web scale. Proceedings of the VLDB Endowment, 2020, 13(12): 3461–3472. DOI: https://doi.org/10.14778/3415478.3415568.
Article Google Scholar
Bayer R, McCreight E. Organization and maintenance of large ordered indices. In Proc. the 1970 ACM SIGFIDET (Now SIGMOD) Workshop on Data Description, Access and Control, Nov. 1970, pp.107–141. DOI: https://doi.org/10.1145/1734663.1734671.
Bentley J L. Multidimensional binary search trees used for associative searching. Communications of the ACM, 1975, 18(9): 509–517. DOI: https://doi.org/10.1145/361002.361007.
Article Google Scholar
Guttman A. R-trees: A dynamic index structure for spatial searching. In Proc. the 1984 ACM SIGMOD International Conference on Management of Data, Jun. 1984, pp.47–57. DOI: https://doi.org/10.1145/602259.602266.
Yuan H T, Li G L, Feng L, Sun J, Han Y. Automatic view generation with deep learning and reinforcement learning. In Proc. the 36th IEEE International Conference on Data Engineering, Apr. 2020, pp.1501–1512. DOI: https://doi.org/10.1109/ICDE48307.2020.00133.
Han Y, Li G L, Yuan H T, Sun J. An autonomous materialized view management system with deep reinforcement learning. In Proc. the 37th IEEE International Conference on Data Engineering, Apr. 2021, pp.2159–2164. DOI: https://doi.org/10.1109/ICDE51399.2021.00217.
Zhang H, Chen G, Ooi B C, Tan K L, Zhang M H. Inmemory big data management and processing: A survey. IEEE Trans. Knowledge and Data Engineering, 2015, 27(7): 1920–1948. DOI: https://doi.org/10.1109/TKDE.2015.2427795.
Article Google Scholar
Mahmud M S, Huang J Z, Salloum S et al. A survey of data partitioning and sampling methods to support big data analysis. Big Data Mining and Analytics, 2020, 3(2): 85–101. DOI: https://doi.org/10.26599/BDMA.2019.9020015.
Article Google Scholar
Sun L W, Franklin M J, Krishnan S, Xin R S. Finegrained partitioning for aggressive data skipping. In Proc. the 2014 ACM SIGMOD International Conference on Management of Data, Jun. 2014, pp.1115–1126. DOI: https://doi.org/10.1145/2588555.2610515.
Aly A M, Mahmood A R, Hassan M S, Aref W G, Ouzzani M, Elmeleegy H, Qadah T. AQWA: Adaptive query workload aware partitioning of big spatial data. Proceedings of the VLDB Endowment, 2015, 8(13): 2062–2073. DOI: https://doi.org/10.14778/2831360.2831361.
Article Google Scholar
Aly A M, Elmeleegy H, Qi Y, Aref W. Kangaroo: Workload-aware processing of range data and range queries in Hadoop. In Proc. the 9th ACM International Conference on Web Search and Data Mining, Feb. 2016, pp.397–406. DOI: https://doi.org/10.1145/2835776.2835841.
Shanbhag A, Jindal A, Madden S, Quiane J, Elmore A J. A robust partitioning scheme for ad-hoc query workloads. In Proc. the 2017 Symposium on Cloud Computing, Sept. 2017, pp.229–241. DOI: https://doi.org/10.1145/3127479.3131613.
Lu Y, Shanbhag A, Jindal A, Madden S. AdaptDB: Adaptive partitioning for distributed joins. Proceedings of the VLDB Endowment, 2017, 10(5): 589–600. DOI: https://doi.org/10.14778/3055540.3055551.
Article Google Scholar
Yang Z H, Chandramouli B, Wang C et al. Qd-tree: Learning data layouts for big data analytics. In Proc. the 2020 ACM SIGMOD International Conference on Management of Data, Jun. 2020, pp.193–208. DOI: https://doi.org/10.1145/3318464.3389770.
Ding J L, Minhas U F, Chandramouli B et al. Instance-optimized data layouts for cloud analytics workloads. In Proc. the 2021 International Conference on Management of Data, Jun. 2021, pp.418–431. DOI: https://doi.org/10.1145/3448016.3457270.
Li Z, Yiu M L, Chan T N. PAW: Data partitioning meets workload variance. In Proc. the 38th IEEE International Conference on Data Engineering, May 2022, pp.123–135. DOI: https://doi.org/10.1109/icde53745.2022.00014.
Rao J, Zhang C, Megiddo N, Lohman G. Automating physical database design in a parallel database. In Proc. the 2002 ACM SIGMOD International Conference on Management of Data, Jun. 2002, pp.558–569. DOI: https://doi.org/10.1145/564691.564757.
Agrawal S, Chu E, Narasayya V. Automatic physical design tuning: Workload as a sequence. In Proc. the 2006 ACM SIGMOD International Conference on Management of Data, Jun. 2006, pp.683–694. DOI: https://doi.org/10.1145/1142473.1142549.
Eadon G, Chong E I, Shankar S, Raghavan A, Srinivasan J, Das S. Supporting table partitioning by reference in oracle. In Proc. the 2008 ACM SIGMOD International Conference on Management of Data, Jun. 2008, pp.1111–1122. DOI: https://doi.org/10.1145/1376616.1376727.
Hauglid J O, Ryeng N H, Nørvåg K. DYFRAM: Dynamic fragmentation and replica management in distributed database systems. Distributed and Parallel Databases, 2010, 28(2): 157–185. DOI: https://doi.org/10.1007/s10619-010-7068-1.
Article Google Scholar
Curino C, Jones E, Zhang Y, Madden S. Schism: A workload-driven approach to database replication and partitioning. Proceedings of the VLDB Endowment, 2010, 3(1/2): 48–57. DOI: https://doi.org/10.14778/1920841.1920853.
Article Google Scholar
Nehme R, Bruno N. Automated partitioning design in parallel database systems. In Proc. the 2011 ACM SIGMOD International Conference on Management of Data, Jun. 2011, pp.1137–1148. DOI: https://doi.org/10.1145/1989323.1989444.
Pavlo A, Curino C, Zdonik S. Skew-aware automatic database partitioning in shared-nothing, parallel OLTP systems. In Proc. the 2012 ACM SIGMOD International Conference on Management of Data, May 2012, pp.61–72. DOI: https://doi.org/10.1145/2213836.2213844.
Liroz-Gistau M, Akbarinia R, Pacitti E et al. Dynamic workload-based partitioning algorithms for continuously growing databases. In Transactions on Large-Scale Data-and Knowledge-Centered Systems XII, Hameurlain A, Küng J, Wagner R (eds.), Springer, 2013, pp.105–128. DOI: https://doi.org/10.1007/978-3-642-45315-15.
Quamar A, Kumar K A, Deshpande A. 2013. SWORD: Scalable workload-aware data placement for transactional workloads. In Proc. the 16th International Conference on Extending Database Technology, Mar. 2013, pp.430–441. DOI: https://doi.org/10.1145/2452376.2452427.
Taft R, Mansour E, Serafini M, Duggan J, Elmore A J, Aboulnaga A, Pavlo A, Stonebraker M. E- store: Finegrained elastic partitioning for distributed transaction processing systems. Proceedings of the VLDB Endowment, 2014, 8(3): 245–256. DOI: https://doi.org/10.14778/2735508.2735514.
Article Google Scholar
Chen K J, Zhou Y L, Cao Y. Online data partitioning in distributed database systems. In Proc. the 18th International Conference on Extending Database Technology, Mar. 2015, pp.1–12. DOI: https://doi.org/10.5441/002/edbt.2015.02.
Zamanian E, Binnig C, Salama A. Locality-aware partitioning in parallel database systems. In Proc. the 2015 ACM SIGMOD International Conference on Management of Data, May 2015, pp.17–30. DOI: https://doi.org/10.1145/2723372.2723718.
Fetai I, Murezzan D, Schuldt H. Workload-driven adaptive data partitioning and distribution—The Cumulus approach. In Proc. the 2015 IEEE International Conference on Big Data, Oct. 29–Nov. 1, 2015, pp.1688–1697. DOI: https://doi.org/10.1109/BigData.2015.7363940.
Serafini M, Taft R, Elmore A J et al. Clay: Fine-grained adaptive partitioning for general database schemas. Proceedings of the VLDB Endowment, 2016, 10(4): 445–456. DOI: https://doi.org/10.14778/3025111.3025125.
Article Google Scholar
Marcus R, Papaemmanouil O, Semenova S, Garber S. NashDB: An end-to-end economic method for elastic database fragmentation, replication, and provisioning. In Proc. the 2018 International Conference on Management of Data, May 2018, pp.1253–1267. DOI: https://doi.org/10.1145/3183713.3196935.
Nam Y M, Kim M S, Han D. A graph-based database partitioning method for parallel OLAP query processing. In Proc. the 34th IEEE International Conference on Data Engineering, Apr. 2018, pp.1025–1036. DOI: https://doi.org/10.1109/ICDE.2018.00096.
Parchas P, Naamad Y, Van Bouwel P, Faloutsos C, Petropoulos M. Fast and effective distribution-key recommendation for amazon redshift. Proceedings of the VLDB Endowment, 2020, 13(12): 2411–2423. DOI: https://doi.org/10.14778/3407790.3407834.
Article Google Scholar
Hilprecht B, Binnig C, Röhm U. Learning a partitioning advisor for cloud databases. In Proc. the 2020 ACM SIGMOD International Conference on Management of Data, Jun. 2020, pp.143–157. DOI: https://doi.org/10.1145/3318464.3389704.
Brendle M, Weber N, Valiyev M, May N, Schulze R, Böhm A, Moerkotte G, Grossniklaus M. SAHARA: Memory footprint reduction of cloud databases with automated table partitioning. In Proc. the 25th International Conference on Extending Database Technology, Mar. 29-Apr. 1, 2022. DOI: https://doi.org/10.5441/002/edbt.2022.02.
Agrawal R, Srikant R. Fast algorithms for mining association rules in large databases. In Proc. the 20th International Conference on Very Large Data Bases, Sept. 1994, pp.487–499.
Ward J H Jr. Hierarchical grouping to optimize an objective function. Journal of the American Statistical Association, 1963, 58(301): 236–244. DOI: https://doi.org/10.1080/01621459.1963.10500845.
Article MathSciNet Google Scholar
Roussopoulos N, Kelley S, Vincent F. Nearest neighbor queries. In Proc. the 1995 ACM SIGMOD International Conference on Management of Data, May 1995, pp.71–79. DOI: https://doi.org/10.1145/223784.223794.
Sacca D, Wiederhold G. Database partitioning in a cluster of processors. ACM Trans. Database Systems, 1985, 10(1): 29–56. DOI: https://doi.org/10.1145/3148.3161.
Article Google Scholar
Copeland G, Alexander W, Boughter E, Keller T. Data placement in Bubba. In Proc. the 1988 ACM SIGMOD International Conference on Management of Data, Jun. 1988, pp.99–108. DOI: https://doi.org/10.1145/50202.50213.
Stöhr T, Märtens H, Rahm E. Multi-dimensional database allocation for parallel data warehouses. In Proc. the 26th International Conference on Very Large Data Bases, Sept. 2000, pp.273–284.
Bruno N, Chaudhuri S. An online approach to physical design tuning. In Proc. the 23rd IEEE International Conference on Data Engineering, Apr. 2007, pp.826–835. DOI: https://doi.org/10.1109/ICDE.2007.367928.
Garcia-Alvarado C, Raghavan V, Narayanan S, Waas F M. Automatic data placement in MPP databases. In Proc. the IEEE 28th International Conference on Data Engineering Workshops, Apr. 2012, pp.322–327. DOI: https://doi.org/10.1109/ICDEW.2012.45.
Karypis G, Kumar V. METIS: A software package for partitioning unstructured graphs, partitioning meshes, and computing fill-reducing orderings of sparse matrices. Technical Report, TR 97-061, Univeristy of Minnesota, 1997. https://hdl.handle.net/11299/215346, Mar. 2024.
Kuhn H W. The Hungarian method for the assignment problem. In 50 Years of Integer Programming 1958–2008, Jünger M, Liebling T M, Naddef D, Nemhauser G L, Pulleyblank W R, Reinelt G, Rinaldi G, Wolsey L A (eds.), Springer, 2010, pp.29–47. DOI: https://doi.org/10.1007/978-3-540-68279-0_2.
Costa E, Costa C, Santos M Y. Evaluating partitioning and bucketing strategies for hive-based big data warehousing systems. Journal of Big Data, 2019, 6(1): 34. DOI: https://doi.org/10.1186/s40537-019-0196-1.
Article Google Scholar
Mnih V, Kavukcuoglu K, Silver D, Graves A, Antonoglou I, Wierstra D, Riedmiller M. Playing Atari with deep reinforcement learning. arXiv: 1312.5602, 2013. https://arxiv.org/abs/1312.5602, Mar. 2024.
Kallman R, Kimura H, Natkins J, Pavlo A, Rasin A, Zdonik S, Jones E P C, Madden S, Stonebraker M, Zhang Y, Hugg J, Abadi D J. H-store: A high-performance, distributed main memory transaction processing system. Proceedings of the VLDB Endowment, 2008, 1(2): 1496–1499. DOI: https://doi.org/10.14778/1454159.1454211.
Article Google Scholar
Hoffer J A, Severance D G. The use of cluster analysis in physical data base design. In Proc. the 1st International Conference on Very Large Data Bases, Sept. 1975, pp.69–86. DOI: https://doi.org/10.1145/1282480.1282486.
Navathe S, Ceri S, Wiederhold G, Dou J L. Vertical partitioning algorithms for database design. ACM Trans. Database Systems, 1984, 9(4): 680–710. DOI: https://doi.org/10.1145/1994.2209.
Article Google Scholar
Navathe S B, Ra M. Vertical partitioning for database design: A graphical algorithm. ACM SIGMOD Record, 1989, 18(2): 440–450. DOI: https://doi.org/10.1145/66926.66966.
Article Google Scholar
Chu W W, Ieong I T. A transaction-based approach to vertical partitioning for relational database systems. IEEE Trans. Software Engineering, 1993, 19(8): 804–812. DOI: https://doi.org/10.1109/32.238583.
Article Google Scholar
Ng V, Law D M, Gorla N, Chan C K. Applying genetic algorithms in database partitioning. In Proc. the 2003 ACM Symposium on Applied Computing, Mar. 2003, pp.544–549. DOI: https://doi.org/10.1145/952532.952639.
Hankins R A, Patel J M. Data morphing: An adaptive, cache-conscious storage technique. In Proceedings 2003 VLDB Conference, Freytag J C, Lockemann P, Abiteboul S, Carey M, Selinger P, Heuer A (eds.), Elsevier, 2003, pp.417–428. DOI: https://doi.org/10.1016/B978-012722442-8/50044-6.
Papadomanolakis S, Ailamaki A. AutoPart: Automating schema design for large scientific databases using data partitioning. In Proc. the 16th International Conference on Scientific and Statistical Database Management, Jun. 2004, pp.383–392. DOI: https://doi.org/10.1109/SSDM.2004.1311234.
Agrawal S, Narasayya V, Yang B. Integrating vertical and horizontal partitioning into automated physical database design. In Proc. the 2004 ACM SIGMOD International Conference on Management of Data, Jun. 2004, pp.359–370. DOI: https://doi.org/10.1145/1007568.1007609.
Gorla N, Betty P W Y. Vertical fragmentation in databases using data-mining technique. International Journal of Data Warehousing and Mining (IJDWM), 2008, 4(3): 35–53. DOI: https://doi.org/10.4018/jdwm.2008070103.
Article Google Scholar
Rodríguez L, Li X O. A support-based vertical partitioning method for database design. In Proc. the 8th International Conference on Electrical Engineering, Computing Science and Automatic Control, Oct. 2011. DOI: https://doi.org/10.1109/ICEEE.2011.6106682.
Jindal A, Dittrich J. Relax and let the database do the partitioning online. In Proc. the 5th International Workshop on Enabling Real-Time Business Intelligence, Sept. 2011, pp.65–80. DOI: https://doi.org/10.1007/978-3-642-33500-6_5.
Li L Z, Gruenwald L. Self-managing online partitioner for databases (SMOPD): A vertical database partitioning system with a fully automatic online approach. In Proc. the 17th International Database Engineering & Applications Symposium, Oct. 2013, pp.168–173. DOI: https://doi.org/10.1145/2513591.2513649.
Rodríguez L, Li X O, Cuevas-Rasgado A D, García-Lamont F. DYVEP: An active database system with vertical partitioning functionality. In Proc. the 10th IEEE International Conference on Networking, Sensing and Control, Apr. 2013, pp.457–462. DOI: https://doi.org/10.1109/ICNSC.2013.6548782.
Li L Z, Gruenwald L. SMOPD-C: An autonomous vertical partitioning technique for distributed databases on cluster computers. In Proc. the 15th IEEE International Conference on Information Reuse and Integration, Aug. 2014, pp.171–178. DOI: https://doi.org/10.1109/IRI.2014.7051887.
Sun L W, Franklin M J, Wang J N, Wu E. Skipping-oriented partitioning for columnar layouts. Proceedings of the VLDB Endowment, 2016, 10(4): 421–432. DOI: https://doi.org/10.14778/3025111.3025123.
Article Google Scholar
Huang Y F, Lai C J. Integrating frequent pattern clustering and branch-and-bound approaches for data partitioning. Information Sciences, 2016, 328: 288–301. DOI: https://doi.org/10.1016/j.ins.2015.08.047.
Article Google Scholar
Rodríguez-Mazahua L, Alor-Hernández G, Li X O, Cervantes J, López-Chau A. Active rule base development for dynamic vertical partitioning of multimedia databases. Journal of Intelligent Information Systems, 2017, 48(2): 421–451. DOI: https://doi.org/10.1007/s10844-016-0420-9.
Article Google Scholar
Durand G C, Pinnecke M, Piriyev R et al. GridFormation: Towards self-driven online data partitioning using reinforcement learning. In Proc. the 1st International Workshop on Exploiting Artificial Intelligence Techniques for Data Management, Jun. 2018, Article No. 1. DOI: https://doi.org/10.1145/3211954.3211956.
Durand G C, Piriyev R, Pinnecke M et al. Automated vertical partitioning with deep reinforcement learning. In New Trends in Databases and Information Systems, Welzer T et al. (eds.), Springer, 2019, pp.126–134. DOI: https://doi.org/10.1007/978-3-030-30278-8_16.
Liu P J, Li H Y, Wang T Y et al. Multi-stage method for online vertical data partitioning based on spectral clustering. Journal of Software, 2023, 34(6): 2804–2832. DOI: https://doi.org/10.13328/j.cnki.jos.006496.
Google Scholar
Grund M, Krüger J, Plattner H, Zeier A, Cudre-Mauroux P, Madden S. HYRISE: A main memory hybrid storage engine. Proceedings of the VLDB Endowment, 2010, 4(2): 105–116. DOI: https://doi.org/10.14778/1921071.1921077.
Article Google Scholar
Jindal A, Quiané-Ruiz J A, Dittrich J. Trojan data layouts: Right shoes for a running elephant. In Proc. the 2nd ACM Symposium on Cloud Computing, Oct. 2011, Article No. 21. DOI: https://doi.org/10.1145/2038916.2038937.
Gu X Y, Yang X F, Wang W P, Jin Y, Meng D. CHAC: An effective attribute clustering algorithm for large-scale data processing. In Proc. the 7th International Conference on Networking, Architecture, and Storage, Jun. 2012, pp.94–98. DOI: https://doi.org/10.1109/NAS.2012.16.
Arulraj J, Pavlo A, Menon P. Bridging the archipelago between row-stores and column-stores for hybrid workloads. In Proc. the 2016 International Conference on Management of Data, Jun. 2016, pp.583–598. DOI: https://doi.org/10.1145/2882903.2915231.
Athanassoulis M, Bøgh K S, Idreos S. Optimal column layout for hybrid workloads. Proceedings of the VLDB Endowment, 2019, 12(33): 2393–2407. https://doi.org/10.14778/3358701.3358707.
Article Google Scholar
McCormick W T, Schweitzer P J, White T W. Problem decomposition and data reorganization by a clustering technique. Operations Research, 1972, 20(5): 993–1009. DOI: https://doi.org/10.1287/opre.20.5.993.
Article Google Scholar
Ailamaki A, DeWitt D J, Hill M D, Skounakis M. Weaving relations for cache performance. In Proc. the 27th International Conference on Very Large Data Bases, Sept. 2001, pp.169–180.
Li L Z, Gruenwald L. Autonomous database partitioning using data mining on single computers and cluster computers. In Proc. the 16th International Database Engineering & Applications Sysmposium, Aug. 2012, pp.32–41. DOI: https://doi.org/10.1145/2351476.2351481.
van Hasselt H, Guez A, Silver D. Deep reinforcement learning with double Q-learning. In Proc. the 30th AAAI Conference on Artificial Intelligence, Feb. 2016, pp.2094–2100. DOI: https://doi.org/10.1609/aaai.v30i1.10295.
Jindal A, Palatinus E, Pavlov V, Dittrich J. A comparison of knives for bread slicing. Proceedings of the VLDB Endowment, 2013, 6(6): 361–372. DOI: https://doi.org/10.14778/2536336.2536338.
Article Google Scholar
Al-Kateb M, Sinclair P, Au G, Ballinger C. Hybrid row-column partitioning in Teradata®. Proceedings of the VLDB Endowment, 2016, 9(13): 1353–1364. DOI: https://doi.org/10.14778/3007263.3007273.
Article Google Scholar
Pinnecke M, Durand G C, Broneske D, Zoun R, Saake G. GridTables: A One-Size-Fits-Most H²TAP data store. Datenbank-Spektrum, 2020, 20(1): 43–56. DOI: https://doi.org/10.1007/s13222-019-00330-x.
Article Google Scholar
Kang D H, Jiang R C, Blanas S. Jigsaw: A data storage and query processing engine for irregular table partitioning. In Proc. the 2021 International Conference on Management of Data, Jun. 2021, pp.898–911. DOI: https://doi.org/10.1145/3448016.3457547.
Abebe M, Lazu H, Daudjee K. Proteus: Autonomous adaptive storage for mixed workloads. In Proc. the 2022 International Conference on Management of Data, Jun. 2022, pp.700–714. DOI: https://doi.org/10.1145/3514221.3517834.
Wang J Y, Chai C L, Liu J B, Li G L. FACE: A normalizing flow based cardinality estimator. Proceedings of the VLDB Endowment, 2021, 15(1): 72–84. DOI: https://doi.org/10.14778/3485450.3485458.
Article Google Scholar

Download references

Author information

Authors and Affiliations

School of Information, Renmin University of China, Beijing, 100872, China
Peng-Ju Liu (刘鹏举), Cui-Ping Li (李翠平) & Hong Chen (陈　红)
Key Laboratory of Data Engineering and Knowledge Engineering of the Ministry of Education, Beijing, 100872, China
Peng-Ju Liu (刘鹏举), Cui-Ping Li (李翠平) & Hong Chen (陈　红)

Authors

Peng-Ju Liu (刘鹏举)
View author publications
You can also search for this author in PubMed Google Scholar
Cui-Ping Li (李翠平)
View author publications
You can also search for this author in PubMed Google Scholar
Hong Chen (陈　红)
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Cui-Ping Li (李翠平).

Ethics declarations

Conflict of Interest The authors declare that they have no conflict of interest.

Additional information

The work was supported by the National Key Research and Development Program of China under Grant No. 2023YFB4503603, the National Natural Science Foundation of China under Grant Nos. 62072460, 62076245, and 62172424, and the Beijing Natural Science Foundation under Grant No. 4212022.

Peng-Ju Liu received his B.S. degree in information management and information system from Dalian Maritime University, Dalian, in 2020. He is currently pursuing his Ph.D. degree at the School of Information, Renmin University of China, Beijing. His research interests include adaptable data partitioning, load forecasting, and learning-based query optimization.

Cui-Ping Li is currently a professor at Renmin University of China, Beijing. She received her Ph.D. degree from Chinese Academy of Sciences, Beijing, in 2003. Before that, she received her B.S. and M.S. degrees from Xi’an Jiaotong University, Xi’an, in 1994 and 1997, respectively. She received the Second Prize of the National Award for Science and Technology Progress in 2018. Her main research interests include social network analysis, social recommendation, and big data analysis.

Hong Chen is currently a professor at Renmin University of China, Beijing. She received her Ph.D. degree from Chinese Academy of Sciences, Beijing, in 2000. Before that, she received her B.S. and M.S. degrees from Renmin University of China, Beijing, in 1986 and 1989, respectively. She received the Second Prize of the National Award for Science and Technology Progress in 2018. Her research interests include database technology and high-performance computing.

Electronic supplementary material