On the hardness of learning queries from tree structured data

Liu, Xianmin; Li, Jianzhong

doi:10.1007/s10878-013-9609-9

Published: 10 April 2013

Volume 29, pages 670–684 (2015)

Journal of Combinatorial Optimization

Xianmin Liu¹ &
Jianzhong Li¹

Abstract

This paper studies the problem of learning queries from tree-structured data. Tree-structured data is modeled as a node-labeled tree $T$, and applying a query $q$ to $T$ returns a set $q(T)$ of nodes of $T$. For a tree-node pair $(T,t)$, where $t$ is a node in $T$, $q$ is said to accept the pair if $t\in q(T)$ and to reject it if $t\notin q(T)$. Given a query class $\mathcal{L}$ and two sets of tree-node pairs $E_p$ and $E_n$, the tree query learning problem is to find a query $q\in \mathcal{L}$ such that (1) $q$ rejects all pairs in $E_n$, and (2) the number of pairs in $E_p$ accepted by $q$ is maximized. The paper studies the hardness of this problem for four query classes: $\mathcal{Q}^{/}$, $\mathcal{Q}^{/,*}$, $\mathcal{Q}^{/,//}$ and $\mathcal{Q}^{/,[]}$. For $\mathcal{Q}^{/}$, a PTime algorithm is given. For $\mathcal{Q}^{/,*}$ and $\mathcal{Q}^{/,//}$, the problem is shown to be NP-complete. For $\mathcal{Q}^{/,[]}$, the problem is shown to be NP-hard by considering two constrained fragments of $\mathcal{Q}^{/,[]}$. Moreover, for $\mathcal{Q}^{/,*}$, $\mathcal{Q}^{/,[]}$ and $\mathcal{Q}^{/,//}$, it is shown that no $n^{1-\epsilon}$-approximation algorithm exists for any $\epsilon >0$.
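To make the problem statement concrete, the following is a minimal sketch of the easiest case. It assumes that $\mathcal{Q}^{/}$ consists of XPath-style path queries that use only the child axis and are anchored at the root, so that a query accepts $(T,t)$ exactly when it spells out the labels on the path from the root of $T$ to $t$. The abstract does not define the class, so this reading is an assumption, and the code is not the authors' algorithm. The names `Node`, `label_path` and `learn_child_only_query` are hypothetical. Because a node reaches its tree through parent pointers, a pair $(T,t)$ is represented by $t$ alone.

```python
from collections import Counter


class Node:
    """A node of a node-labeled tree; `parent` is None for the root."""

    def __init__(self, label, parent=None):
        self.label = label
        self.parent = parent
        self.children = []

    def add_child(self, label):
        child = Node(label, parent=self)
        self.children.append(child)
        return child


def label_path(t):
    """Return the labels on the path from the root down to node t."""
    labels = []
    while t is not None:
        labels.append(t.label)
        t = t.parent
    return tuple(reversed(labels))


def learn_child_only_query(positives, negatives):
    """Pick the root-anchored label path accepting the most positive pairs.

    ASSUMPTION: a query is a sequence of labels and accepts (T, t) iff it
    equals the root-to-t label path. Under that reading each example is
    accepted by exactly one query, so counting paths is enough.
    Returns (query_string, accepted_count), or (None, 0) when every path
    occurring among the positives is also the path of a negative example.
    """
    # Condition (1): the query must reject every pair in E_n.
    forbidden = {label_path(t) for t in negatives}
    # Condition (2): maximize the number of accepted pairs in E_p.
    counts = Counter(label_path(t) for t in positives)
    candidates = [(n, p) for p, n in counts.items() if p not in forbidden]
    if not candidates:
        return None, 0
    n, path = max(candidates, key=lambda c: c[0])
    return "/" + "/".join(path), n
```

Under these assumptions the search takes time linear in the total length of the example paths, which is in line with the PTime result stated for $\mathcal{Q}^{/}$. Once wildcards, descendant edges or branching predicates are allowed, a single query can accept examples with different label paths, so this simple counting argument no longer applies.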

Acknowledgments

The work in this paper was partially supported by the National Basic Research (973) Program of China under Grant No. 2012CB316202 and by the National Natural Science Foundation of China under Grants No. 61003046 and No. 6111113089. We would like to thank Dongjing Miao of Harbin Institute of Technology for valuable discussions.

Author information

Authors and Affiliations

Harbin Institute of Technology, Harbin, Heilongjiang, China
Xianmin Liu & Jianzhong Li

Corresponding author

Correspondence to Xianmin Liu.

About this article

Cite this article

Liu, X., Li, J. On the hardness of learning queries from tree structured data. J Comb Optim 29, 670–684 (2015). https://doi.org/10.1007/s10878-013-9609-9

Published: 10 April 2013
Issue Date: April 2015
DOI: https://doi.org/10.1007/s10878-013-9609-9
