Abstract
Metaquerying is a data mining technology by which hidden dependencies among several database relations can be discovered. This tool has already been successfully applied to several real-world applications, but only preliminary results about the complexity of metaquerying can be found in the literature. In this article, we define several variants of metaquerying that encompass, as far as we know, all the variants that have been defined in the literature. We study both the combined complexity and the data complexity of these variants. We show that under the combined complexity measure metaquerying is generally intractable (unless P = NP), sometimes lying quite high in the complexity hierarchies (as high as NP^PP), depending on the characteristics of the plausibility index. Nevertheless, we are able to single out some tractable and interesting metaquerying cases, whose combined complexity is LOGCFL-complete. As for the data complexity of metaquerying, we prove that, in general, it is within TC^0, but lies within AC^0 in some simpler cases. Finally, we discuss the implementation of metaqueries by providing algorithms that answer them.
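To make the abstract's terms concrete, here is a minimal Python sketch of a metaquery and a brute-force procedure that answers it. Everything in it is an illustrative assumption, not the article's definitions or algorithms: the toy relations, the template R(X, Z) <- P(X, Y), Q(Y, Z), the helper names `join`, `confidence` and `answer`, and a simplified confidence index (the fraction of distinct head bindings produced by the body join that also occur in the relation assigned to the head). Relation variables are assumed to range freely over all database relations.

```python
from itertools import product

# Hypothetical toy database: relation name -> set of tuples.
DB = {
    "parent": {("ann", "bob"), ("bob", "cid"), ("ann", "dan")},
    "grandparent": {("ann", "cid")},
    "knows": {("ann", "bob"), ("bob", "cid")},
}

# Hypothetical metaquery R(X, Z) <- P(X, Y), Q(Y, Z).
# Each literal is (relation variable, tuple of ordinary variables).
MQ_HEAD = ("R", ("X", "Z"))
MQ_BODY = [("P", ("X", "Y")), ("Q", ("Y", "Z"))]


def join(body, inst, db):
    """Return every variable binding that satisfies all body literals under inst."""
    bindings = [{}]
    for relvar, args in body:
        extended = []
        for binding in bindings:
            for tup in db[inst[relvar]]:
                if len(tup) != len(args):
                    continue  # arity mismatch: this relation cannot match the literal
                new = dict(binding)
                # Bind unbound variables; reject tuples that clash with earlier bindings.
                if all(new.setdefault(var, val) == val for var, val in zip(args, tup)):
                    extended.append(new)
        bindings = extended
    return bindings


def confidence(head, body, inst, db):
    """Simplified index (assumption): share of produced head tuples found in the head relation."""
    relvar, args = head
    produced = {tuple(b[v] for v in args) for b in join(body, inst, db)}
    if not produced:
        return 0.0
    return len(produced & db[inst[relvar]]) / len(produced)


def answer(head, body, db, k):
    """Try every instantiation of the relation variables; keep those with index > k."""
    relvars = list(dict.fromkeys([head[0]] + [rv for rv, _ in body]))
    results = []
    for choice in product(sorted(db), repeat=len(relvars)):
        inst = dict(zip(relvars, choice))
        c = confidence(head, body, inst, db)
        if c > k:
            results.append((inst, c))
    return results


if __name__ == "__main__":
    for inst, c in answer(MQ_HEAD, MQ_BODY, DB, 0.5):
        print(inst, c)
```

On this toy data the sketch reports the four instantiations that map R to `grandparent` and P, Q to `parent` or `knows`, each with index 1.0. The outer loop tries every assignment of relations to relation variables, so the number of instantiations grows exponentially with the size of the metaquery, whereas for a fixed metaquery the work is polynomial in the database. This only hints at why the combined and data complexity measures can differ; it is not a proof of the bounds stated in the abstract.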