Abstract
Highly heterogeneous XML collections are thematic collections whose documents use different structures: parent-child and ancestor-descendant relationships are not preserved across documents, and element names can differ in vocabulary. In this setting, current approaches return answers with low precision. We present an approach that uses similarity measures and semantic inverted indices to improve the precision of query answers without compromising performance.
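To make the setting concrete, the following is a minimal sketch of an inverted index keyed by canonical element names, so that a lookup tolerates both vocabulary discrepancies and differing nesting. It is an illustration under stated assumptions, not the authors' method: the synonym table `CANONICAL`, the helper names, and the sample documents are all hypothetical, and a real system would presumably draw synonyms from a thesaurus and rank candidates with a similarity measure.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical synonym table mapping element names to a canonical term.
# Assumption for illustration only; not taken from the paper.
CANONICAL = {"writer": "author", "author": "author",
             "name": "title", "title": "title"}


def canonical(tag):
    """Return the canonical term for a tag, or the lowercased tag itself."""
    return CANONICAL.get(tag.lower(), tag.lower())


def build_index(docs):
    """Map each canonical term to postings (doc_id, depth, original_tag)."""
    index = defaultdict(list)
    for doc_id, xml_text in docs.items():
        root = ET.fromstring(xml_text)
        stack = [(root, 0)]
        while stack:
            node, depth = stack.pop()
            index[canonical(node.tag)].append((doc_id, depth, node.tag))
            stack.extend((child, depth + 1) for child in node)
    return index


def candidate_docs(index, query_tags):
    """Ids of documents containing every query tag, up to synonymy,
    regardless of how the matching elements are nested."""
    sets = [{doc_id for doc_id, _, _ in index.get(canonical(t), [])}
            for t in query_tags]
    return set.intersection(*sets) if sets else set()


# Three small documents on the same theme with different structures.
DOCS = {
    "d1": "<book><title>XML</title><author>Ann</author></book>",
    "d2": "<writer><book><name>Trees</name></book></writer>",
    "d3": "<book><title>Graphs</title></book>",
}
INDEX = build_index(DOCS)
```

Here `candidate_docs(INDEX, ["book", "author"])` selects both `d1` and `d2`, although in `d2` the author element is named `writer` and is the parent of `book` instead of its child. Such a lookup only filters candidates; deciding how well each candidate matches the query structure would still require a separate scoring step.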
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Sanz, I., Mesiti, M., Guerrini, G., Llavori, R.B. (2006). Highly Heterogeneous XML Collections: How to Retrieve Precise Results? In: Larsen, H.L., Pasi, G., Ortiz-Arroyo, D., Andreasen, T., Christiansen, H. (eds) Flexible Query Answering Systems. FQAS 2006. Lecture Notes in Computer Science, vol 4027. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11766254_20
DOI: https://doi.org/10.1007/11766254_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-34638-8
Online ISBN: 978-3-540-34639-5
eBook Packages: Computer Science (R0)