research-article

A Scalable Execution Engine for Package Queries

Published: 12 May 2017

Abstract

Many modern applications and real-world problems involve the design of item collections, or packages: from planning your daily meals all the way to mapping the universe. Despite the pervasive need for packages, traditional data management does not support their definition and computation. This is because traditional database queries follow a powerful but very simple model: a query defines constraints that each tuple in the result must satisfy. However, a system tasked with the design of packages cannot consider items independently; rather, the system needs to determine whether a set of items collectively satisfies given criteria.
In this paper, we present package queries, a new query model that extends traditional database queries to handle complex constraints and preferences over answer sets. We develop a full-fledged package query system, implemented on top of a traditional database engine. Our work makes several contributions. First, we design PaQL, a SQL-based query language that supports the declarative specification of package queries. Second, we present a fundamental strategy for evaluating package queries that combines the capabilities of databases and constraint optimization solvers. The core of our approach is a set of translation rules that transform a package query into an integer linear program (ILP). Third, we introduce an offline data partitioning strategy that allows query evaluation to scale to large data sizes. Fourth, we introduce SKETCHREFINE, an efficient and scalable algorithm for package evaluation, which offers strong approximation guarantees. Finally, we present extensive experiments over real-world data. Our results demonstrate that SKETCHREFINE is effective at deriving high-quality package results, and runs an order of magnitude faster than applying ILP solvers directly to large datasets.
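
To make the ILP view of a package query concrete, the following is a minimal sketch under stated assumptions, not the paper's translation rules or its system. It models a meal-planning package as a 0/1 decision vector x, with one variable per item, a count constraint, a bound on total calories, and an objective that minimizes total fat. The table `MEALS`, its column names, the bounds, and the function `best_package` are all hypothetical, and exhaustive enumeration stands in for a real ILP solver, so it is only practical for tiny inputs.

```python
from itertools import product

# Hypothetical item table; names and numbers are invented for illustration.
MEALS = [
    {"name": "oatmeal", "kcal": 300, "fat": 5},
    {"name": "salad", "kcal": 250, "fat": 9},
    {"name": "pasta", "kcal": 700, "fat": 12},
    {"name": "burger", "kcal": 900, "fat": 40},
    {"name": "soup", "kcal": 400, "fat": 7},
    {"name": "steak", "kcal": 800, "fat": 30},
]


def best_package(items, count, kcal_lo, kcal_hi):
    """Find x in {0,1}^n minimizing sum(fat_i * x_i) subject to
    sum(x_i) == count and kcal_lo <= sum(kcal_i * x_i) <= kcal_hi.

    Exhaustive enumeration stands in for an ILP solver (an assumption
    made to keep the sketch dependency-free).
    """
    best_x, best_cost = None, None
    for x in product((0, 1), repeat=len(items)):
        # Cardinality constraint on the package.
        if sum(x) != count:
            continue
        # Aggregate constraint evaluated over the whole package.
        kcal = sum(xi * it["kcal"] for xi, it in zip(x, items))
        if not kcal_lo <= kcal <= kcal_hi:
            continue
        # Linear objective over the same decision variables.
        cost = sum(xi * it["fat"] for xi, it in zip(x, items))
        if best_cost is None or cost < best_cost:
            best_x, best_cost = x, cost
    if best_x is None:
        return None  # no feasible package
    return [it["name"] for xi, it in zip(best_x, items) if xi]


if __name__ == "__main__":
    print(best_package(MEALS, count=3, kcal_lo=2000, kcal_hi=2500))
```

The point of the sketch is that no constraint here can be checked one tuple at a time: feasibility depends on the sum over the chosen set, which is why a solver-style formulation is needed. Enumeration grows as 2^n, which hints at why direct evaluation does not scale to large datasets.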

Published In

ACM SIGMOD Record  Volume 46, Issue 1
March 2017
46 pages
ISSN:0163-5808
DOI:10.1145/3093754
Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Research-article

Article Metrics

  • Downloads (last 12 months): 8
  • Downloads (last 6 weeks): 0

Reflects downloads up to 07 Jan 2025.

Cited By

  • (2021) Data-induced predicates for sideways information passing in query optimizers. The VLDB Journal 31:6 (1263-1290). DOI: 10.1007/s00778-021-00693-2. Online publication date: 29-Aug-2021.
  • (2019) Pushing data-induced predicates through joins in big-data clusters. Proceedings of the VLDB Endowment 13:3 (252-265). DOI: 10.14778/3368289.3368292. Online publication date: 1-Nov-2019.
  • (2019) Evaluation of an Implementation of Cross-Row Constraints Using Materialized Views. ACM SIGMOD Record 48:3 (23-28). DOI: 10.1145/3377391.3377396. Online publication date: 20-Dec-2019.
  • (2019) Interactive Data Exploration of Distributed Raw Files: A Systematic Mapping Study. IEEE Access 7 (10691-10717). DOI: 10.1109/ACCESS.2018.2882244. Online publication date: 2019.
