On the hardness of learning queries from tree structured data

Liu, Xianmin; Li, Jianzhong

doi:10.1007/s10878-013-9609-9

Published: 10 April 2013

Volume 29, pages 670–684, (2015)

Journal of Combinatorial Optimization

Abstract

This paper studies the problem of learning queries from tree-structured data. Tree-structured data is modeled as a node-labeled tree \(T\), and applying a query \(q\) to \(T\) returns a set \(q(T)\) of nodes of \(T\). For a tree-node pair \((T,t)\), where \(t\) is a node in \(T\), \(q\) is said to accept the pair if \(t\in {q(T)}\) and to reject it if \(t\notin {q(T)}\). For a query class \(\mathcal{L}\) and given sets \(E_p\) and \(E_n\) of tree-node pairs, the tree query learning problem is to find a query \(q\in \mathcal{L}\) such that (1) \(q\) rejects all pairs in \(E_n\), and (2) the number of pairs in \(E_p\) accepted by \(q\) is maximized. The paper studies the hardness of this problem for four query classes: \(\mathcal Q ^{\tiny /}\), \(\mathcal Q ^{\tiny /,*}\), \(\mathcal Q ^{\tiny /,//}\) and \(\mathcal Q ^{\tiny /,[]}\). For \(\mathcal Q ^{\tiny /}\), a PTime algorithm is given. For \(\mathcal Q ^{\tiny /,*}\) and \(\mathcal Q ^{\tiny /,//}\), NP-completeness results are shown. For \(\mathcal Q ^{\tiny /,[]}\), the problem is shown to be NP-hard by considering two constrained fragments of \(\mathcal Q ^{\tiny /,[]}\). In addition, for \(\mathcal Q ^{\tiny /,*}\), \(\mathcal Q ^{\tiny /,[]}\) and \(\mathcal Q ^{\tiny /,//}\), it is shown that no \(n^{1-\epsilon }\)-approximation algorithm exists for any \(\epsilon >0\).
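
To make the problem statement concrete, the following is a minimal Python sketch of the learning task for the child-only class \(\mathcal Q ^{\tiny /}\). It is not the PTime algorithm of the paper, whose details are not given in the abstract. It rests on an assumed semantics: a query in \(\mathcal Q ^{\tiny /}\) is a sequence of labels anchored at the root, and it accepts \((T,t)\) exactly when the labels on the path from the root to \(t\) spell the query. Under that assumption each positive pair is accepted by exactly one query, so counting label paths suffices. The names `Node`, `label_path`, `accepts` and `learn_child_only` are hypothetical and chosen only for illustration.

```python
from collections import Counter
from typing import List, Optional, Tuple

Path = Tuple[str, ...]


class Node:
    """A node of a node-labeled tree. A pair (T, t) is represented by t alone,
    since T can be recovered by following parent pointers to the root."""

    def __init__(self, label: str, parent: Optional["Node"] = None) -> None:
        self.label = label
        self.parent = parent
        self.children: List["Node"] = []

    def add(self, label: str) -> "Node":
        """Attach a new child with the given label and return it."""
        child = Node(label, parent=self)
        self.children.append(child)
        return child


def label_path(t: Node) -> Path:
    """Labels on the path from the root of the tree down to t."""
    labels = []
    node: Optional[Node] = t
    while node is not None:
        labels.append(node.label)
        node = node.parent
    return tuple(reversed(labels))


def accepts(q: Path, t: Node) -> bool:
    """ASSUMED semantics: a child-only query accepts (T, t) iff it equals
    the root-to-t label path."""
    return label_path(t) == q


def learn_child_only(e_p: List[Node], e_n: List[Node]) -> Tuple[Optional[Path], int]:
    """Return a query that rejects every pair in e_n and accepts as many pairs
    in e_p as possible, together with that count; (None, 0) if every candidate
    path is ruled out by a negative pair."""
    forbidden = {label_path(t) for t in e_n}
    counts = Counter(p for p in map(label_path, e_p) if p not in forbidden)
    if not counts:
        return None, 0
    return counts.most_common(1)[0]


# Usage: two positives share the path a/b/c; the path a/b is ruled out by a negative.
t1 = Node("a").add("b").add("c")
t2 = Node("a").add("b").add("c")
t3 = Node("a").add("b")
neg = Node("a").add("b")
print(learn_child_only([t1, t2, t3], [neg]))  # -> (('a', 'b', 'c'), 2)
```

The sketch runs in time linear in the total length of the example paths. It does not extend to the classes with wildcards, descendant axes or predicates, where one query can accept nodes with different label paths and the choice of query is no longer forced by a single positive pair.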

Acknowledgments

The work in this paper was partially supported by the National Basic Research (973) Program of China under Grant No. 2012CB316202, the National Natural Science Foundation of China under Grant No. 61003046 and No. 6111113089. We would like to thank Dongjing Miao of Harbin Institute of Technology for his valuable discussions.

Author information

Authors and Affiliations

Harbin Institute of Technology, Harbin, Heilongjiang, China
Xianmin Liu & Jianzhong Li

Corresponding author

Correspondence to Xianmin Liu.

About this article

Cite this article

Liu, X., Li, J. On the hardness of learning queries from tree structured data. J Comb Optim 29, 670–684 (2015). https://doi.org/10.1007/s10878-013-9609-9

Published: 10 April 2013
Issue Date: April 2015
DOI: https://doi.org/10.1007/s10878-013-9609-9
