Part of the book series: Informatik aktuell (INFORMAT)

Abstract

Query languages for XML, such as XPath or XML-QL, support Boolean retrieval: query results are unordered sets of XML elements that satisfy the regular search patterns of a query. This search paradigm suits highly schematized, "closed" XML document collections such as electronic catalogs. For searching information on the World Wide Web or in "open" environments, such as the intranets of large enterprises, ranked retrieval is preferable: query results are ranked lists of XML elements sorted by descending relevance. Web search engines based on information retrieval concepts, on the other hand, cannot effectively exploit the additional information that arises from the structure of XML documents and from the semantic annotation provided by element names. This paper presents concepts that combine the search capabilities of XML query languages with ranked retrieval. In particular, it discusses how ontologies and specialized index structures can improve both the effectiveness and the efficiency of searching XML data. The presented concepts are used in the ongoing implementation of the query language XXL.
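To make the contrast between Boolean and ranked retrieval concrete, here is a minimal Python sketch over a toy XML collection. It is not the XXL implementation: the sample document, the ONTOLOGY table of element-name similarities, and the scoring rule (tag similarity multiplied by normalized term frequency) are hypothetical assumptions chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical toy collection; not data from the paper.
DOC = """<library>
  <book><title>XML Query Languages</title><author>Smith</author></book>
  <book><heading>Ranked Retrieval for XML and XML Search</heading></book>
  <article><title>Relational Databases</title></article>
  <article><caption>Searching XML on the Web</caption></article>
</library>"""

# Hypothetical ontology: similarity of element names to the query tag, in [0, 1].
ONTOLOGY = {
    ("title", "title"): 1.0,
    ("title", "heading"): 0.8,
    ("title", "caption"): 0.5,
}


def tag_sim(query_tag, tag):
    """Ontology-based similarity between two element names (0.0 if unrelated)."""
    return ONTOLOGY.get((query_tag, tag), 0.0)


def content_score(text, term):
    """Term frequency of `term` among whitespace tokens, normalized by token count."""
    tokens = (text or "").lower().split()
    return tokens.count(term.lower()) / len(tokens) if tokens else 0.0


def boolean_query(root, tag, term):
    """Boolean retrieval: exact tag match plus term containment; an unordered set."""
    return {
        e.text
        for e in root.iter(tag)
        if term.lower() in (e.text or "").lower().split()
    }


def ranked_query(root, tag, term):
    """Ranked retrieval: tag similarity times content relevance, best match first."""
    results = []
    for e in root.iter():
        score = tag_sim(tag, e.tag) * content_score(e.text, term)
        if score > 0:
            results.append((score, e.tag, e.text))
    return sorted(results, key=lambda r: r[0], reverse=True)


if __name__ == "__main__":
    root = ET.fromstring(DOC)
    print(boolean_query(root, "title", "xml"))
    for score, tag, text in ranked_query(root, "title", "xml"):
        print(f"{score:.3f}  <{tag}> {text}")
```

Running the script prints the Boolean result {'XML Query Languages'}, followed by the ranked list 0.333 for <title>, 0.229 for <heading>, and 0.100 for <caption>. The ontology lets the ranked query also return the <heading> and <caption> elements, which the exact-tag Boolean query misses. Multiplying the two scores is just one simple combination function; others are equally plausible.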

Copyright information

© 2001 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Sizov, S., Theobald, A., Weikum, G. (2001). Ähnlichkeitssuche auf XML-Daten. In: Heuer, A., Leymann, F., Priebe, D. (eds) Datenbanksysteme in Büro, Technik und Wissenschaft. Informatik aktuell. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-56687-5_27

  • DOI: https://doi.org/10.1007/978-3-642-56687-5_27

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-41707-1

  • Online ISBN: 978-3-642-56687-5

  • eBook Packages: Springer Book Archive
