Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 3818)

Abstract

The world of data has developed from two main points of view: the structured relational data model and the unstructured text model. The two distinct cultures of databases and information retrieval now have a natural meeting place in the Web, with its semi-structured XML model. As web-style searching becomes a ubiquitous tool, the need to integrate these two viewpoints becomes even more important.

This tutorial will provide an overview of the different issues and approaches put forward by the Information Retrieval and the Database communities and survey the DB-IR integration efforts with a focus on techniques applicable to XML retrieval. A variety of application scenarios for DB-IR integration will be covered, including examples of current industrial tools.


Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Consens, M.P., Baeza-Yates, R. (2005). Database and Information Retrieval Techniques for XML. In: Grumbach, S., Sui, L., Vianu, V. (eds) Advances in Computer Science – ASIAN 2005. Data Management on the Web. ASIAN 2005. Lecture Notes in Computer Science, vol 3818. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11596370_4

  • DOI: https://doi.org/10.1007/11596370_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-30767-9

  • Online ISBN: 978-3-540-32249-8

  • eBook Packages: Computer Science, Computer Science (R0)
