PowerDB-XML: A Platform for Data–Centric and Document–Centric XML Processing

  • Conference paper
Database and XML Technologies (XSym 2003)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2824)

Abstract

Relational database systems are well suited as a platform for data-centric XML processing. Data-centric applications process regularly structured XML documents using precise predicates. However, relational approaches fall short when XML applications also require document-centric processing, i.e., processing of less rigidly structured documents using vague predicates in the sense of information retrieval. The PowerDB-XML project at ETH Zurich aims to address this drawback and to cover both types of XML applications on a single platform. In this paper, we investigate the requirements of document-centric XML processing and propose to refine state-of-the-art retrieval models for unstructured flat documents such that they match the flexibility of the XML format. To do so, we rely on so-called query-specific statistics, computed dynamically at query runtime to reflect the query scope. Moreover, we show that document-centric XML processing can be implemented efficiently using relational database systems for storage management and standard SQL. This allows us to combine document-centric processing with data-centric XML-to-database mappings. Our XML engine, named PowerDB-XML, therefore supports the full range of XML applications on the same integrated platform.
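
The idea of query-specific statistics can be illustrated with a small sketch. It is not taken from the paper: the table `term_index`, its columns, the helper names, and the plain tf·idf weighting are all assumptions chosen for illustration, and SQLite stands in for the relational backend. The point it tries to show is that the element count and the document frequencies are computed at query time over only those elements that fall within the query scope, so the same term can receive a different weight under a different scope.

```python
import math
import sqlite3


def build_demo_index():
    """Create a tiny in-memory term index (hypothetical schema, not the paper's)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE term_index "
        "(elem_id INTEGER, elem_path TEXT, term TEXT, tf INTEGER)"
    )
    rows = [
        (1, "/article/abstract", "xml", 2),
        (1, "/article/abstract", "database", 1),
        (2, "/article/abstract", "xml", 1),
        (2, "/article/abstract", "retrieval", 1),
        (3, "/article/body/section", "retrieval", 2),
        (4, "/article/body/section", "database", 1),
    ]
    conn.executemany("INSERT INTO term_index VALUES (?, ?, ?, ?)", rows)
    return conn


def ranked_retrieval(conn, scope_paths, query_terms):
    """Rank elements in the scope by tf*idf, deriving idf from the scope only."""
    scope_paths, query_terms = list(scope_paths), list(query_terms)
    scope_marks = ",".join("?" * len(scope_paths))
    term_marks = ",".join("?" * len(query_terms))

    # N: number of indexed elements inside the query scope.
    (n_elems,) = conn.execute(
        f"SELECT COUNT(DISTINCT elem_id) FROM term_index "
        f"WHERE elem_path IN ({scope_marks})",
        scope_paths,
    ).fetchone()

    # df: per-term element frequency, again restricted to the query scope.
    df = dict(conn.execute(
        f"SELECT term, COUNT(DISTINCT elem_id) FROM term_index "
        f"WHERE elem_path IN ({scope_marks}) AND term IN ({term_marks}) "
        f"GROUP BY term",
        scope_paths + query_terms,
    ))
    idf = {term: math.log(n_elems / count) for term, count in df.items()}

    # Accumulate tf*idf per element over the matching postings.
    scores = {}
    for elem_id, term, tf in conn.execute(
        f"SELECT elem_id, term, tf FROM term_index "
        f"WHERE elem_path IN ({scope_marks}) AND term IN ({term_marks})",
        scope_paths + query_terms,
    ):
        scores[elem_id] = scores.get(elem_id, 0.0) + tf * idf[term]

    # Highest score first; ties broken by element id for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    conn = build_demo_index()
    print(ranked_retrieval(conn, ["/article/abstract"], ["xml", "retrieval"]))
    print(ranked_retrieval(
        conn, ["/article/abstract", "/article/body/section"], ["xml"]))
```

In the demo data, "xml" occurs in every abstract element, so its idf is zero when the scope is restricted to abstracts, while the same term contributes a positive weight once body sections are included in the scope. A collection-wide, precomputed idf could not reflect that difference.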

Copyright information

© 2003 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Grabs, T., Schek, HJ. (2003). PowerDB-XML: A Platform for Data–Centric and Document–Centric XML Processing. In: Bellahsène, Z., Chaudhri, A.B., Rahm, E., Rys, M., Unland, R. (eds) Database and XML Technologies. XSym 2003. Lecture Notes in Computer Science, vol 2824. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-39429-7_7

  • DOI: https://doi.org/10.1007/978-3-540-39429-7_7

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-20055-0

  • Online ISBN: 978-3-540-39429-7

  • eBook Packages: Springer Book Archive
