Skip to main content

Advertisement

Log in

Enhancing Storage Efficiency and Performance: A Survey of Data Partitioning Techniques

  • Survey
  • Regular Paper
  • Published:
Journal of Computer Science and Technology Aims and scope Submit manuscript

Abstract

Data partitioning techniques are pivotal for optimal data placement across storage devices, thereby enhancing resource utilization and overall system throughput. However, the design of effective partition schemes faces multiple challenges, including considerations of the cluster environment, storage device characteristics, optimization objectives, and the balance between partition quality and computational efficiency. Furthermore, dynamic environments necessitate robust partition detection mechanisms. This paper presents a comprehensive survey structured around partition deployment environments, outlining the distinguishing features and applicability of various partitioning strategies while delving into how these challenges are addressed. We discuss partitioning features pertaining to database schema, table data, workload, and runtime metrics. We then delve into the partition generation process, segmenting it into initialization and optimization stages. A comparative analysis of partition generation and update algorithms is provided, emphasizing their suitability for different scenarios and optimization objectives. Additionally, we illustrate the applications of partitioning in prevalent database products and suggest potential future research directions and solutions. This survey aims to foster the implementation, deployment, and updating of high-quality partitions for specific system scenarios.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Subscribe and save

Springer+ Basic
$34.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Explore related subjects

Discover the latest articles, news and stories from top researchers in related subjects.

References

  1. Melnik S, Gubarev A, Long J J et al. Dremel: A decade of interactive SQL analysis at web scale. Proceedings of the VLDB Endowment, 2020, 13(12): 3461–3472. DOI: https://doi.org/10.14778/3415478.3415568.

    Article  Google Scholar 

  2. Bayer R, McCreight E. Organization and maintenance of large ordered indices. In Proc. the 1970 ACM SIGFIDET (Now SIGMOD) Workshop on Data Description, Access and Control, Nov. 1970, pp.107–141. DOI: https://doi.org/10.1145/1734663.1734671.

  3. Bentley J L. Multidimensional binary search trees used for associative searching. Communications of the ACM, 1975, 18(9): 509–517. DOI: https://doi.org/10.1145/361002.361007.

    Article  Google Scholar 

  4. Guttman A. R-trees: A dynamic index structure for spatial searching. In Proc. the 1984 ACM SIGMOD International Conference on Management of Data, Jun. 1984, pp.47–57. DOI: https://doi.org/10.1145/602259.602266.

  5. Yuan H T, Li G L, Feng L, Sun J, Han Y. Automatic view generation with deep learning and reinforcement learning. In Proc. the 36th IEEE International Conference on Data Engineering, Apr. 2020, pp.1501–1512. DOI: https://doi.org/10.1109/ICDE48307.2020.00133.

  6. Han Y, Li G L, Yuan H T, Sun J. An autonomous materialized view management system with deep reinforcement learning. In Proc. the 37th IEEE International Conference on Data Engineering, Apr. 2021, pp.2159–2164. DOI: https://doi.org/10.1109/ICDE51399.2021.00217.

  7. Zhang H, Chen G, Ooi B C, Tan K L, Zhang M H. Inmemory big data management and processing: A survey. IEEE Trans. Knowledge and Data Engineering, 2015, 27(7): 1920–1948. DOI: https://doi.org/10.1109/TKDE.2015.2427795.

    Article  Google Scholar 

  8. Mahmud M S, Huang J Z, Salloum S et al. A survey of data partitioning and sampling methods to support big data analysis. Big Data Mining and Analytics, 2020, 3(2): 85–101. DOI: https://doi.org/10.26599/BDMA.2019.9020015.

    Article  Google Scholar 

  9. Sun L W, Franklin M J, Krishnan S, Xin R S. Finegrained partitioning for aggressive data skipping. In Proc. the 2014 ACM SIGMOD International Conference on Management of Data, Jun. 2014, pp.1115–1126. DOI: https://doi.org/10.1145/2588555.2610515.

  10. Aly A M, Mahmood A R, Hassan M S, Aref W G, Ouzzani M, Elmeleegy H, Qadah T. AQWA: Adaptive query workload aware partitioning of big spatial data. Proceedings of the VLDB Endowment, 2015, 8(13): 2062–2073. DOI: https://doi.org/10.14778/2831360.2831361.

    Article  Google Scholar 

  11. Aly A M, Elmeleegy H, Qi Y, Aref W. Kangaroo: Workload-aware processing of range data and range queries in Hadoop. In Proc. the 9th ACM International Conference on Web Search and Data Mining, Feb. 2016, pp.397–406. DOI: https://doi.org/10.1145/2835776.2835841.

  12. Shanbhag A, Jindal A, Madden S, Quiane J, Elmore A J. A robust partitioning scheme for ad-hoc query workloads. In Proc. the 2017 Symposium on Cloud Computing, Sept. 2017, pp.229–241. DOI: https://doi.org/10.1145/3127479.3131613.

  13. Lu Y, Shanbhag A, Jindal A, Madden S. AdaptDB: Adaptive partitioning for distributed joins. Proceedings of the VLDB Endowment, 2017, 10(5): 589–600. DOI: https://doi.org/10.14778/3055540.3055551.

    Article  Google Scholar 

  14. Yang Z H, Chandramouli B, Wang C et al. Qd-tree: Learning data layouts for big data analytics. In Proc. the 2020 ACM SIGMOD International Conference on Management of Data, Jun. 2020, pp.193–208. DOI: https://doi.org/10.1145/3318464.3389770.

  15. Ding J L, Minhas U F, Chandramouli B et al. Instance-optimized data layouts for cloud analytics workloads. In Proc. the 2021 International Conference on Management of Data, Jun. 2021, pp.418–431. DOI: https://doi.org/10.1145/3448016.3457270.

  16. Li Z, Yiu M L, Chan T N. PAW: Data partitioning meets workload variance. In Proc. the 38th IEEE International Conference on Data Engineering, May 2022, pp.123–135. DOI: https://doi.org/10.1109/icde53745.2022.00014.

  17. Rao J, Zhang C, Megiddo N, Lohman G. Automating physical database design in a parallel database. In Proc. the 2002 ACM SIGMOD International Conference on Management of Data, Jun. 2002, pp.558–569. DOI: https://doi.org/10.1145/564691.564757.

  18. Agrawal S, Chu E, Narasayya V. Automatic physical design tuning: Workload as a sequence. In Proc. the 2006 ACM SIGMOD International Conference on Management of Data, Jun. 2006, pp.683–694. DOI: https://doi.org/10.1145/1142473.1142549.

  19. Eadon G, Chong E I, Shankar S, Raghavan A, Srinivasan J, Das S. Supporting table partitioning by reference in oracle. In Proc. the 2008 ACM SIGMOD International Conference on Management of Data, Jun. 2008, pp.1111–1122. DOI: https://doi.org/10.1145/1376616.1376727.

  20. Hauglid J O, Ryeng N H, Nørvåg K. DYFRAM: Dynamic fragmentation and replica management in distributed database systems. Distributed and Parallel Databases, 2010, 28(2): 157–185. DOI: https://doi.org/10.1007/s10619-010-7068-1.

    Article  Google Scholar 

  21. Curino C, Jones E, Zhang Y, Madden S. Schism: A workload-driven approach to database replication and partitioning. Proceedings of the VLDB Endowment, 2010, 3(1/2): 48–57. DOI: https://doi.org/10.14778/1920841.1920853.

    Article  Google Scholar 

  22. Nehme R, Bruno N. Automated partitioning design in parallel database systems. In Proc. the 2011 ACM SIGMOD International Conference on Management of Data, Jun. 2011, pp.1137–1148. DOI: https://doi.org/10.1145/1989323.1989444.

  23. Pavlo A, Curino C, Zdonik S. Skew-aware automatic database partitioning in shared-nothing, parallel OLTP systems. In Proc. the 2012 ACM SIGMOD International Conference on Management of Data, May 2012, pp.61–72. DOI: https://doi.org/10.1145/2213836.2213844.

  24. Liroz-Gistau M, Akbarinia R, Pacitti E et al. Dynamic workload-based partitioning algorithms for continuously growing databases. In Transactions on Large-Scale Data-and Knowledge-Centered Systems XII, Hameurlain A, Küng J, Wagner R (eds.), Springer, 2013, pp.105–128. DOI: https://doi.org/10.1007/978-3-642-45315-15.

  25. Quamar A, Kumar K A, Deshpande A. 2013. SWORD: Scalable workload-aware data placement for transactional workloads. In Proc. the 16th International Conference on Extending Database Technology, Mar. 2013, pp.430–441. DOI: https://doi.org/10.1145/2452376.2452427.

  26. Taft R, Mansour E, Serafini M, Duggan J, Elmore A J, Aboulnaga A, Pavlo A, Stonebraker M. E- store: Finegrained elastic partitioning for distributed transaction processing systems. Proceedings of the VLDB Endowment, 2014, 8(3): 245–256. DOI: https://doi.org/10.14778/2735508.2735514.

    Article  Google Scholar 

  27. Chen K J, Zhou Y L, Cao Y. Online data partitioning in distributed database systems. In Proc. the 18th International Conference on Extending Database Technology, Mar. 2015, pp.1–12. DOI: https://doi.org/10.5441/002/edbt.2015.02.

  28. Zamanian E, Binnig C, Salama A. Locality-aware partitioning in parallel database systems. In Proc. the 2015 ACM SIGMOD International Conference on Management of Data, May 2015, pp.17–30. DOI: https://doi.org/10.1145/2723372.2723718.

  29. Fetai I, Murezzan D, Schuldt H. Workload-driven adaptive data partitioning and distribution—The Cumulus approach. In Proc. the 2015 IEEE International Conference on Big Data, Oct. 29–Nov. 1, 2015, pp.1688–1697. DOI: https://doi.org/10.1109/BigData.2015.7363940.

  30. Serafini M, Taft R, Elmore A J et al. Clay: Fine-grained adaptive partitioning for general database schemas. Proceedings of the VLDB Endowment, 2016, 10(4): 445–456. DOI: https://doi.org/10.14778/3025111.3025125.

    Article  Google Scholar 

  31. Marcus R, Papaemmanouil O, Semenova S, Garber S. NashDB: An end-to-end economic method for elastic database fragmentation, replication, and provisioning. In Proc. the 2018 International Conference on Management of Data, May 2018, pp.1253–1267. DOI: https://doi.org/10.1145/3183713.3196935.

  32. Nam Y M, Kim M S, Han D. A graph-based database partitioning method for parallel OLAP query processing. In Proc. the 34th IEEE International Conference on Data Engineering, Apr. 2018, pp.1025–1036. DOI: https://doi.org/10.1109/ICDE.2018.00096.

  33. Parchas P, Naamad Y, Van Bouwel P, Faloutsos C, Petropoulos M. Fast and effective distribution-key recommendation for amazon redshift. Proceedings of the VLDB Endowment, 2020, 13(12): 2411–2423. DOI: https://doi.org/10.14778/3407790.3407834.

    Article  Google Scholar 

  34. Hilprecht B, Binnig C, Röhm U. Learning a partitioning advisor for cloud databases. In Proc. the 2020 ACM SIGMOD International Conference on Management of Data, Jun. 2020, pp.143–157. DOI: https://doi.org/10.1145/3318464.3389704.

  35. Brendle M, Weber N, Valiyev M, May N, Schulze R, Böhm A, Moerkotte G, Grossniklaus M. SAHARA: Memory footprint reduction of cloud databases with automated table partitioning. In Proc. the 25th International Conference on Extending Database Technology, Mar. 29-Apr. 1, 2022. DOI: https://doi.org/10.5441/002/edbt.2022.02.

  36. Agrawal R, Srikant R. Fast algorithms for mining association rules in large databases. In Proc. the 20th International Conference on Very Large Data Bases, Sept. 1994, pp.487–499.

  37. Ward J H Jr. Hierarchical grouping to optimize an objective function. Journal of the American Statistical Association, 1963, 58(301): 236–244. DOI: https://doi.org/10.1080/01621459.1963.10500845.

    Article  MathSciNet  Google Scholar 

  38. Roussopoulos N, Kelley S, Vincent F. Nearest neighbor queries. In Proc. the 1995 ACM SIGMOD International Conference on Management of Data, May 1995, pp.71–79. DOI: https://doi.org/10.1145/223784.223794.

  39. Sacca D, Wiederhold G. Database partitioning in a cluster of processors. ACM Trans. Database Systems, 1985, 10(1): 29–56. DOI: https://doi.org/10.1145/3148.3161.

    Article  Google Scholar 

  40. Copeland G, Alexander W, Boughter E, Keller T. Data placement in Bubba. In Proc. the 1988 ACM SIGMOD International Conference on Management of Data, Jun. 1988, pp.99–108. DOI: https://doi.org/10.1145/50202.50213.

  41. Stöhr T, Märtens H, Rahm E. Multi-dimensional database allocation for parallel data warehouses. In Proc. the 26th International Conference on Very Large Data Bases, Sept. 2000, pp.273–284.

  42. Bruno N, Chaudhuri S. An online approach to physical design tuning. In Proc. the 23rd IEEE International Conference on Data Engineering, Apr. 2007, pp.826–835. DOI: https://doi.org/10.1109/ICDE.2007.367928.

  43. Garcia-Alvarado C, Raghavan V, Narayanan S, Waas F M. Automatic data placement in MPP databases. In Proc. the IEEE 28th International Conference on Data Engineering Workshops, Apr. 2012, pp.322–327. DOI: https://doi.org/10.1109/ICDEW.2012.45.

  44. Karypis G, Kumar V. METIS: A software package for partitioning unstructured graphs, partitioning meshes, and computing fill-reducing orderings of sparse matrices. Technical Report, TR 97-061, Univeristy of Minnesota, 1997. https://hdl.handle.net/11299/215346, Mar. 2024.

  45. Kuhn H W. The Hungarian method for the assignment problem. In 50 Years of Integer Programming 1958–2008, Jünger M, Liebling T M, Naddef D, Nemhauser G L, Pulleyblank W R, Reinelt G, Rinaldi G, Wolsey L A (eds.), Springer, 2010, pp.29–47. DOI: https://doi.org/10.1007/978-3-540-68279-0_2.

  46. Costa E, Costa C, Santos M Y. Evaluating partitioning and bucketing strategies for hive-based big data warehousing systems. Journal of Big Data, 2019, 6(1): 34. DOI: https://doi.org/10.1186/s40537-019-0196-1.

    Article  Google Scholar 

  47. Mnih V, Kavukcuoglu K, Silver D, Graves A, Antonoglou I, Wierstra D, Riedmiller M. Playing Atari with deep reinforcement learning. arXiv: 1312.5602, 2013. https://arxiv.org/abs/1312.5602, Mar. 2024.

  48. Kallman R, Kimura H, Natkins J, Pavlo A, Rasin A, Zdonik S, Jones E P C, Madden S, Stonebraker M, Zhang Y, Hugg J, Abadi D J. H-store: A high-performance, distributed main memory transaction processing system. Proceedings of the VLDB Endowment, 2008, 1(2): 1496–1499. DOI: https://doi.org/10.14778/1454159.1454211.

    Article  Google Scholar 

  49. Hoffer J A, Severance D G. The use of cluster analysis in physical data base design. In Proc. the 1st International Conference on Very Large Data Bases, Sept. 1975, pp.69–86. DOI: https://doi.org/10.1145/1282480.1282486.

  50. Navathe S, Ceri S, Wiederhold G, Dou J L. Vertical partitioning algorithms for database design. ACM Trans. Database Systems, 1984, 9(4): 680–710. DOI: https://doi.org/10.1145/1994.2209.

    Article  Google Scholar 

  51. Navathe S B, Ra M. Vertical partitioning for database design: A graphical algorithm. ACM SIGMOD Record, 1989, 18(2): 440–450. DOI: https://doi.org/10.1145/66926.66966.

    Article  Google Scholar 

  52. Chu W W, Ieong I T. A transaction-based approach to vertical partitioning for relational database systems. IEEE Trans. Software Engineering, 1993, 19(8): 804–812. DOI: https://doi.org/10.1109/32.238583.

    Article  Google Scholar 

  53. Ng V, Law D M, Gorla N, Chan C K. Applying genetic algorithms in database partitioning. In Proc. the 2003 ACM Symposium on Applied Computing, Mar. 2003, pp.544–549. DOI: https://doi.org/10.1145/952532.952639.

  54. Hankins R A, Patel J M. Data morphing: An adaptive, cache-conscious storage technique. In Proceedings 2003 VLDB Conference, Freytag J C, Lockemann P, Abiteboul S, Carey M, Selinger P, Heuer A (eds.), Elsevier, 2003, pp.417–428. DOI: https://doi.org/10.1016/B978-012722442-8/50044-6.

  55. Papadomanolakis S, Ailamaki A. AutoPart: Automating schema design for large scientific databases using data partitioning. In Proc. the 16th International Conference on Scientific and Statistical Database Management, Jun. 2004, pp.383–392. DOI: https://doi.org/10.1109/SSDM.2004.1311234.

  56. Agrawal S, Narasayya V, Yang B. Integrating vertical and horizontal partitioning into automated physical database design. In Proc. the 2004 ACM SIGMOD International Conference on Management of Data, Jun. 2004, pp.359–370. DOI: https://doi.org/10.1145/1007568.1007609.

  57. Gorla N, Betty P W Y. Vertical fragmentation in databases using data-mining technique. International Journal of Data Warehousing and Mining (IJDWM), 2008, 4(3): 35–53. DOI: https://doi.org/10.4018/jdwm.2008070103.

    Article  Google Scholar 

  58. Rodríguez L, Li X O. A support-based vertical partitioning method for database design. In Proc. the 8th International Conference on Electrical Engineering, Computing Science and Automatic Control, Oct. 2011. DOI: https://doi.org/10.1109/ICEEE.2011.6106682.

  59. Jindal A, Dittrich J. Relax and let the database do the partitioning online. In Proc. the 5th International Workshop on Enabling Real-Time Business Intelligence, Sept. 2011, pp.65–80. DOI: https://doi.org/10.1007/978-3-642-33500-6_5.

  60. Li L Z, Gruenwald L. Self-managing online partitioner for databases (SMOPD): A vertical database partitioning system with a fully automatic online approach. In Proc. the 17th International Database Engineering & Applications Symposium, Oct. 2013, pp.168–173. DOI: https://doi.org/10.1145/2513591.2513649.

  61. Rodríguez L, Li X O, Cuevas-Rasgado A D, García-Lamont F. DYVEP: An active database system with vertical partitioning functionality. In Proc. the 10th IEEE International Conference on Networking, Sensing and Control, Apr. 2013, pp.457–462. DOI: https://doi.org/10.1109/ICNSC.2013.6548782.

  62. Li L Z, Gruenwald L. SMOPD-C: An autonomous vertical partitioning technique for distributed databases on cluster computers. In Proc. the 15th IEEE International Conference on Information Reuse and Integration, Aug. 2014, pp.171–178. DOI: https://doi.org/10.1109/IRI.2014.7051887.

  63. Sun L W, Franklin M J, Wang J N, Wu E. Skipping-oriented partitioning for columnar layouts. Proceedings of the VLDB Endowment, 2016, 10(4): 421–432. DOI: https://doi.org/10.14778/3025111.3025123.

    Article  Google Scholar 

  64. Huang Y F, Lai C J. Integrating frequent pattern clustering and branch-and-bound approaches for data partitioning. Information Sciences, 2016, 328: 288–301. DOI: https://doi.org/10.1016/j.ins.2015.08.047.

    Article  Google Scholar 

  65. Rodríguez-Mazahua L, Alor-Hernández G, Li X O, Cervantes J, López-Chau A. Active rule base development for dynamic vertical partitioning of multimedia databases. Journal of Intelligent Information Systems, 2017, 48(2): 421–451. DOI: https://doi.org/10.1007/s10844-016-0420-9.

    Article  Google Scholar 

  66. Durand G C, Pinnecke M, Piriyev R et al. GridFormation: Towards self-driven online data partitioning using reinforcement learning. In Proc. the 1st International Workshop on Exploiting Artificial Intelligence Techniques for Data Management, Jun. 2018, Article No. 1. DOI: https://doi.org/10.1145/3211954.3211956.

  67. Durand G C, Piriyev R, Pinnecke M et al. Automated vertical partitioning with deep reinforcement learning. In New Trends in Databases and Information Systems, Welzer T et al. (eds.), Springer, 2019, pp.126–134. DOI: https://doi.org/10.1007/978-3-030-30278-8_16.

  68. Liu P J, Li H Y, Wang T Y et al. Multi-stage method for online vertical data partitioning based on spectral clustering. Journal of Software, 2023, 34(6): 2804–2832. DOI: https://doi.org/10.13328/j.cnki.jos.006496.

    Google Scholar 

  69. Grund M, Krüger J, Plattner H, Zeier A, Cudre-Mauroux P, Madden S. HYRISE: A main memory hybrid storage engine. Proceedings of the VLDB Endowment, 2010, 4(2): 105–116. DOI: https://doi.org/10.14778/1921071.1921077.

    Article  Google Scholar 

  70. Jindal A, Quiané-Ruiz J A, Dittrich J. Trojan data layouts: Right shoes for a running elephant. In Proc. the 2nd ACM Symposium on Cloud Computing, Oct. 2011, Article No. 21. DOI: https://doi.org/10.1145/2038916.2038937.

  71. Gu X Y, Yang X F, Wang W P, Jin Y, Meng D. CHAC: An effective attribute clustering algorithm for large-scale data processing. In Proc. the 7th International Conference on Networking, Architecture, and Storage, Jun. 2012, pp.94–98. DOI: https://doi.org/10.1109/NAS.2012.16.

  72. Arulraj J, Pavlo A, Menon P. Bridging the archipelago between row-stores and column-stores for hybrid workloads. In Proc. the 2016 International Conference on Management of Data, Jun. 2016, pp.583–598. DOI: https://doi.org/10.1145/2882903.2915231.

  73. Athanassoulis M, Bøgh K S, Idreos S. Optimal column layout for hybrid workloads. Proceedings of the VLDB Endowment, 2019, 12(33): 2393–2407. https://doi.org/10.14778/3358701.3358707.

    Article  Google Scholar 

  74. McCormick W T, Schweitzer P J, White T W. Problem decomposition and data reorganization by a clustering technique. Operations Research, 1972, 20(5): 993–1009. DOI: https://doi.org/10.1287/opre.20.5.993.

    Article  Google Scholar 

  75. Ailamaki A, DeWitt D J, Hill M D, Skounakis M. Weaving relations for cache performance. In Proc. the 27th International Conference on Very Large Data Bases, Sept. 2001, pp.169–180.

  76. Li L Z, Gruenwald L. Autonomous database partitioning using data mining on single computers and cluster computers. In Proc. the 16th International Database Engineering & Applications Sysmposium, Aug. 2012, pp.32–41. DOI: https://doi.org/10.1145/2351476.2351481.

  77. van Hasselt H, Guez A, Silver D. Deep reinforcement learning with double Q-learning. In Proc. the 30th AAAI Conference on Artificial Intelligence, Feb. 2016, pp.2094–2100. DOI: https://doi.org/10.1609/aaai.v30i1.10295.

  78. Jindal A, Palatinus E, Pavlov V, Dittrich J. A comparison of knives for bread slicing. Proceedings of the VLDB Endowment, 2013, 6(6): 361–372. DOI: https://doi.org/10.14778/2536336.2536338.

    Article  Google Scholar 

  79. Al-Kateb M, Sinclair P, Au G, Ballinger C. Hybrid row-column partitioning in Teradata®. Proceedings of the VLDB Endowment, 2016, 9(13): 1353–1364. DOI: https://doi.org/10.14778/3007263.3007273.

    Article  Google Scholar 

  80. Pinnecke M, Durand G C, Broneske D, Zoun R, Saake G. GridTables: A One-Size-Fits-Most H2TAP data store. Datenbank-Spektrum, 2020, 20(1): 43–56. DOI: https://doi.org/10.1007/s13222-019-00330-x.

    Article  Google Scholar 

  81. Kang D H, Jiang R C, Blanas S. Jigsaw: A data storage and query processing engine for irregular table partitioning. In Proc. the 2021 International Conference on Management of Data, Jun. 2021, pp.898–911. DOI: https://doi.org/10.1145/3448016.3457547.

  82. Abebe M, Lazu H, Daudjee K. Proteus: Autonomous adaptive storage for mixed workloads. In Proc. the 2022 International Conference on Management of Data, Jun. 2022, pp.700–714. DOI: https://doi.org/10.1145/3514221.3517834.

  83. Wang J Y, Chai C L, Liu J B, Li G L. FACE: A normalizing flow based cardinality estimator. Proceedings of the VLDB Endowment, 2021, 15(1): 72–84. DOI: https://doi.org/10.14778/3485450.3485458.

    Article  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Cui-Ping Li  (李翠平).

Ethics declarations

Conflict of Interest The authors declare that they have no conflict of interest.

Additional information

The work was supported by the National Key Research and Development Program of China under Grant No. 2023YFB4503603, the National Natural Science Foundation of China under Grant Nos. 62072460, 62076245, and 62172424, and the Beijing Natural Science Foundation under Grant No. 4212022.

Peng-Ju Liu received his B.S. degree in information management and information system from Dalian Maritime University, Dalian, in 2020. He is currently pursuing his Ph.D. degree at the School of Information, Renmin University of China, Beijing. His research interests include adaptable data partitioning, load forecasting, and learning-based query optimization.

Cui-Ping Li is currently a professor at Renmin University of China, Beijing. She received her Ph.D. degree from Chinese Academy of Sciences, Beijing, in 2003. Before that, she received her B.S. and M.S. degrees from Xi’an Jiaotong University, Xi’an, in 1994 and 1997, respectively. She received the Second Prize of the National Award for Science and Technology Progress in 2018. Her main research interests include social network analysis, social recommendation, and big data analysis.

Hong Chen is currently a professor at Renmin University of China, Beijing. She received her Ph.D. degree from Chinese Academy of Sciences, Beijing, in 2000. Before that, she received her B.S. and M.S. degrees from Renmin University of China, Beijing, in 1986 and 1989, respectively. She received the Second Prize of the National Award for Science and Technology Progress in 2018. Her research interests include database technology and high-performance computing.

Electronic supplementary material

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Liu, PJ., Li, CP. & Chen, H. Enhancing Storage Efficiency and Performance: A Survey of Data Partitioning Techniques. J. Comput. Sci. Technol. 39, 346–368 (2024). https://doi.org/10.1007/s11390-024-3538-1

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s11390-024-3538-1

Keywords