DOI: 10.1145/3035918.3035934
research-article

QUILTS: Multidimensional Data Partitioning Framework Based on Query-Aware and Skew-Tolerant Space-Filling Curves

Published: 09 May 2017

ABSTRACT

Massive data management plays an increasingly important role in data analytics because data access is a major bottleneck. Data skipping is a promising technique for reducing the number of data accesses: it partitions data into pages and accesses only the pages that contain data to be retrieved by a query. Effective data partitioning is therefore required to minimize the number of page accesses. However, obtaining an optimal data partitioning for a given query pattern and data distribution is an NP-hard problem.
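As a rough illustration of data skipping, the sketch below partitions two-dimensional rows into fixed-size pages and keeps a per-page bounding box, so that a range query reads only pages whose box overlaps the query. The min/max metadata, the `Page` class, and the function names are assumptions made for this example and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[int, int]


@dataclass
class Page:
    rows: List[Point]
    lo: Point  # per-dimension minimum over the rows in this page
    hi: Point  # per-dimension maximum over the rows in this page


def build_pages(rows: List[Point], page_size: int) -> List[Page]:
    """Cut rows, in their given order, into pages of at most page_size rows."""
    pages = []
    for i in range(0, len(rows), page_size):
        chunk = rows[i:i + page_size]
        lo = (min(x for x, _ in chunk), min(y for _, y in chunk))
        hi = (max(x for x, _ in chunk), max(y for _, y in chunk))
        pages.append(Page(chunk, lo, hi))
    return pages


def pages_to_read(pages: List[Page], qlo: Point, qhi: Point) -> List[Page]:
    """Return the pages whose bounding box overlaps the query box [qlo, qhi]."""
    return [
        p for p in pages
        if all(p.lo[d] <= qhi[d] and qlo[d] <= p.hi[d] for d in range(2))
    ]


# A 4x4 grid stored in x-major order, four rows per page.
grid = [(x, y) for x in range(4) for y in range(4)]
pages = build_pages(grid, page_size=4)

# A query on a single x value reads one page; a query on a single y value
# cuts across every page, so nothing can be skipped.
print(len(pages_to_read(pages, (0, 0), (0, 3))))
print(len(pages_to_read(pages, (0, 0), (3, 0))))
```

The same data and the same page size give very different page counts depending on how the rows were ordered before paging, which is why the partitioning has to take the query pattern into account.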

We propose a framework that involves a multidimensional indexing technique based on a space-filling curve. A space-filling curve defines which portions of the data are stored in the same page. The problem can therefore be interpreted as selecting a curve that arranges the data accessed by a query so as to minimize the number of page accesses. To solve this problem, we analyzed how different space-filling curves affect the number of page accesses. We found that it is critical for a curve to fit the query pattern and to be robust against any data distribution. We propose a cost model for measuring how well a space-filling curve fits a given query pattern and tolerates data skew. We also propose a method for designing a query-aware and skew-tolerant curve for a given query pattern.
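To make the link between curve choice and page accesses concrete, the following sketch orders a small grid along three curves and counts the pages a range query touches. Each curve is described by a bit-interleaving pattern, where "xyxy" gives the familiar Z-order (Morton) curve for 2-bit coordinates. The pattern notation and the helpers `curve_key` and `pages_touched` are assumptions made for illustration; this is not the cost model or the curve-design method proposed in the paper.

```python
from typing import List, Tuple

Point = Tuple[int, int]


def curve_key(x: int, y: int, pattern: str) -> int:
    """Build a one-dimensional key by consuming the bits of x and y, most
    significant first, in the order given by pattern ('x' or 'y' per bit)."""
    nx = pattern.count("x")  # number of x bits still to consume
    ny = pattern.count("y")  # number of y bits still to consume
    key = 0
    for dim in pattern:
        if dim == "x":
            nx -= 1
            bit = (x >> nx) & 1
        else:
            ny -= 1
            bit = (y >> ny) & 1
        key = (key << 1) | bit
    return key


def pages_touched(points: List[Point], pattern: str, page_size: int,
                  qlo: Point, qhi: Point) -> int:
    """Sort points along the curve, cut them into pages, and count the pages
    holding at least one point inside the query box [qlo, qhi]."""
    ordered = sorted(points, key=lambda p: curve_key(p[0], p[1], pattern))
    touched = set()
    for i, (x, y) in enumerate(ordered):
        if qlo[0] <= x <= qhi[0] and qlo[1] <= y <= qhi[1]:
            touched.add(i // page_size)
    return len(touched)


grid = [(x, y) for x in range(4) for y in range(4)]
row_query = ((0, 0), (3, 0))  # all x values, a single y value

for pattern in ("xxyy", "xyxy", "yyxx"):
    print(pattern, pages_touched(grid, pattern, 4, *row_query))
```

For this query, the pattern that places the y bits first keeps the matching points in a single page, the x-first pattern spreads them over every page, and Z-order falls in between. A query on a single x value reverses the ranking of the two extreme patterns, which illustrates why a curve has to fit the query pattern.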

We prototyped our framework using the defined query-aware and skew-tolerant curve. We conducted experiments using a skewed data set and confirmed that our framework can reduce the number of page accesses by an order of magnitude for data warehousing (DWH) and geographic information systems (GIS) applications with real-world data.

          • Published in

            SIGMOD '17: Proceedings of the 2017 ACM International Conference on Management of Data
            May 2017
            1810 pages
            ISBN:9781450341974
            DOI:10.1145/3035918

            Copyright © 2017 ACM

            Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Acceptance Rates

            Overall Acceptance Rate: 785 of 4,003 submissions, 20%
