DOI: 10.1145/1836089.1836095
research-article

Deriving predicate statistics in datalog

Published: 26 July 2010

ABSTRACT

Database query optimizers rely on data statistics when selecting query execution plans. Similar optimization techniques are desirable for deductive databases, but applying them requires collecting data statistics for Datalog predicates, which is difficult because Datalog predicates can be recursive. In this paper, we propose an algorithm, called SDP, that estimates Datalog query sizes efficiently by maintaining statistical dependency information for derived predicates. Base predicate statistics are computed and summarized using dependency matrices. Derived predicate statistics are computed by evaluating rules abstractly, with rule bodies replaced by algebraic expressions over the dependency matrices. Recursive rules are handled by a fixed-point evaluation. Our experimental study shows that: 1) SDP produces better query size estimates than propagating base predicate statistics to derived predicates under the argument independence assumption; 2) the estimates largely preserve the relative order of real query sizes and thus can be used to guide cost-based query optimizers.
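To make the baseline mentioned in the abstract concrete, the following is a minimal sketch of how a size estimate built only from per-argument statistics can drift from the real size of a derived predicate. It is not the SDP algorithm and does not use dependency matrices. It applies the textbook uniform-distribution join estimate to a rule of the form path2(X, Z) :- edge_in(X, Y), edge_out(Y, Z). The predicate names, the function names, and the sample data are all hypothetical and chosen only for illustration.

```python
from collections import Counter

def actual_join_size(r, s):
    """Exact number of (X, Y, Z) results of r(X, Y), s(Y, Z)."""
    # Count how many s-tuples start with each Y value.
    s_counts = Counter(y for y, _ in s)
    # Each r-tuple (x, y) joins with every s-tuple that starts with y.
    return sum(s_counts[y] for _, y in r)

def independence_estimate(r, s):
    """Textbook estimate: |r| * |s| / max(distinct Y in r, distinct Y in s).

    Assumes Y values are uniformly distributed and unrelated to the
    other argument, which is the kind of assumption the abstract
    contrasts SDP against.
    """
    distinct_r = len({y for _, y in r})
    distinct_s = len({y for y, _ in s})
    return len(r) * len(s) / max(distinct_r, distinct_s)

# Hypothetical skewed data: most tuples share the join value Y = 0.
edge_in = [(i, 0) for i in range(8)] + [(8, 1), (9, 2)]
edge_out = [(0, j) for j in range(8)] + [(1, 100), (2, 101)]

print(actual_join_size(edge_in, edge_out))                 # 66
print(round(independence_estimate(edge_in, edge_out), 2))  # 33.33
```

On this skewed input the uniform estimate is about half the true size. When a derived predicate feeds further rules, such errors can compound, which is the motivation the abstract gives for tracking dependency information instead.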

Published in

PPDP '10: Proceedings of the 12th international ACM SIGPLAN symposium on Principles and practice of declarative programming
July 2010
266 pages
ISBN: 9781450301329
DOI: 10.1145/1836089

Copyright © 2010 ACM

Publisher

Association for Computing Machinery, New York, NY, United States

Acceptance Rates

PPDP '10 paper acceptance rate: 21 of 57 submissions, 37%. Overall acceptance rate: 230 of 486 submissions, 47%.
