ABSTRACT
We consider a fundamental flow maximization problem that arises during the evaluation of multiple overlapping queries defined on a data stream, in a heterogeneous parallel environment. Each query is a conjunction of boolean filters, and each filter may be shared across multiple queries. We are required to design an evaluation plan that evaluates filters against stream items in order to determine the set of queries satisfied by each item. The evaluation plan specifies for each item: (i) the subset of filters evaluated for this item and the order of their evaluation, and (ii) the processor on which each filter evaluation occurs. Our goal is to design an evaluation plan that maximizes the total throughput (flow) of the stream handled by the plan, without violating the processor capacities.
Filter ordering has received extensive attention in single-processor settings, where the objective is to minimize the total cost of filter evaluations; in particular, efficient (approximation) algorithms are known for various important versions of min-cost filter ordering. The min-cost filter ordering problem for a single processor is a special case of our flow-maximization problem for parallel processors. Our main contribution in this work is a generic flow-maximization algorithm, which assumes the availability of a min-cost filter ordering algorithm for a single processor and uses it to iteratively construct a solution to the flow-maximization problem for heterogeneous parallel processors. We show that the approximation ratio of our flow-maximization strategy is essentially the same as that of the underlying min-cost filter ordering algorithm. Our result, along with existing results on min-cost filter ordering, enables the optimization of several important versions of filter ordering in parallel environments.
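To make the idea of driving flow maximization with a min-cost ordering routine concrete, the following is a minimal sketch, not the algorithm of this paper. It assumes a much simpler setting than the abstract describes: a single conjunctive query with independent filters, for which ordering filters by cost/(1 - pass probability) is the classical min-cost rule. The outer loop is a multiplicative-weights scheme in the style of fractional packing algorithms, chosen here purely for illustration. All names (`min_cost_plan`, `plan_loads`, `max_flow`) and the input layout are hypothetical.

```python
import math

def min_cost_plan(filters, prices):
    """Assumed oracle: min-cost plan for ONE conjunctive query, independent filters.
    filters: list of (pass_prob, [cost on each processor]); prices: price per unit work.
    Returns [(filter_index, processor_index), ...] in evaluation order."""
    choices = []
    for i, (p, costs) in enumerate(filters):
        # Cheapest processor for this filter under the current prices.
        j = min(range(len(costs)), key=lambda k: prices[k] * costs[k])
        priced = prices[j] * costs[j]
        # Classical rank rule: evaluate filters in increasing cost / (1 - pass_prob).
        rank = priced / (1.0 - p) if p < 1.0 else math.inf
        choices.append((rank, i, j))
    choices.sort()
    return [(i, j) for _, i, j in choices]

def plan_loads(filters, plan, num_procs):
    """Expected work per stream item that the plan places on each processor."""
    loads = [0.0] * num_procs
    alive = 1.0  # probability the item has passed every filter so far
    for i, j in plan:
        p, costs = filters[i]
        loads[j] += alive * costs[j]
        alive *= p
    return loads

def max_flow(filters, capacities, eps=0.1):
    """Iteratively route flow along min-cost plans, raising prices on loaded processors.
    Assumes strictly positive costs and capacities. Returns {plan: flow}."""
    m = len(capacities)
    delta = (1 + eps) * ((1 + eps) * m) ** (-1.0 / eps)
    prices = [delta / c for c in capacities]
    flows = {}
    while sum(c * y for c, y in zip(capacities, prices)) < 1.0:
        plan = min_cost_plan(filters, prices)
        loads = plan_loads(filters, plan, m)
        # Push as much flow as the bottleneck processor allows on its own.
        amount = min(c / l for c, l in zip(capacities, loads) if l > 0)
        key = tuple(plan)
        flows[key] = flows.get(key, 0.0) + amount
        prices = [y * (1 + eps * amount * l / c)
                  for y, l, c in zip(prices, loads, capacities)]
    # Scale all flows down so that no processor exceeds its capacity.
    used = [0.0] * m
    for key, f in flows.items():
        for j, l in enumerate(plan_loads(filters, key, m)):
            used[j] += f * l
    scale = max(u / c for u, c in zip(used, capacities))
    return {key: f / scale for key, f in flows.items()}
```

For example, `max_flow([(0.5, [1.0, 1.0])], [1.0, 1.0])` splits the stream across two identical processors. The final scaling step guarantees feasibility by construction; how close the result comes to the optimum depends on `eps` and on the quality of the oracle, which in this sketch is exact only under the independence assumption stated above.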
Index Terms
- A generic flow algorithm for shared filter ordering problems