
A Document Model Based on Relevance Modeling Techniques for Semi-structured Information Warehouses

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3180)

Abstract

During the last decade, data warehouse and OLAP (On-Line Analytical Processing) techniques have helped companies gather, organize and analyze the structured data they produce. At the same time, digital libraries have applied Information Retrieval (IR) mechanisms to query their repositories of unstructured, text-rich documents. In this paper we explain how XML allows these two approaches to converge, making it possible to develop warehouses for semi-structured information. So far, proposals to extend data warehouse technology to semi-structured information have not been able to exploit the textual contents of the documents, mainly because they are not based on a proper document model. In our opinion, such a model must integrate IR and OLAP techniques. We present a set of requirements for semi-structured information warehouses, together with a document model to support their construction. In this model, new Relevance Modeling mechanisms rank the facts described in the text of the documents according to their relevance to an IR-OLAP query. Preliminary evaluations show the usefulness of the document model.
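The abstract does not detail the paper's new Relevance Modeling mechanisms. For orientation only, here is a minimal sketch of the generic relevance-based language model (RM1) of Lavrenko and Croft applied to ranking facts. It is not the authors' method and rests on assumptions of ours: each fact is represented only by the text that describes it, fact ids such as `f1` and the function `rank_facts` are hypothetical, probabilities use Jelinek-Mercer smoothing with an assumed weight of 0.5 and a uniform fact prior, and facts are scored by cross-entropy against the estimated relevance model. The OLAP side (dimensions, measures, aggregation) is ignored entirely.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive tokenizer: split on whitespace, strip basic punctuation, lowercase.
    tokens = (t.strip(".,;:!?()").lower() for t in text.split())
    return [t for t in tokens if t]


def rank_facts(query, facts, lam=0.5):
    """Rank facts by an RM1-style relevance-model score (generic sketch only).

    query: free-text keywords (the IR part of a hypothetical IR-OLAP query).
    facts: dict mapping a hypothetical fact id to the text describing it.
    lam:   Jelinek-Mercer smoothing weight (assumed value; must be < 1).
    """
    docs = {fid: Counter(tokenize(text)) for fid, text in facts.items()}
    collection = Counter()
    for counts in docs.values():
        collection.update(counts)
    coll_len = sum(collection.values())

    def p_w_d(w, counts):
        # Smoothed unigram probability P(w|D) of word w in one fact's text.
        doc_len = sum(counts.values())
        p_doc = counts[w] / doc_len if doc_len else 0.0
        return lam * p_doc + (1 - lam) * collection[w] / coll_len

    # Ignore query words absent from every fact text (they would zero every product).
    q_terms = [q for q in tokenize(query) if q in collection]
    if not q_terms:
        return []

    # Query likelihood P(Q|D) for each fact, assuming a uniform prior P(D).
    q_lik = {fid: math.prod(p_w_d(q, c) for q in q_terms) for fid, c in docs.items()}
    total = sum(q_lik.values())

    # Relevance model: P(w|R) = sum_D P(w|D) P(Q|D) / sum_D P(Q|D).
    p_w_r = {
        w: sum(p_w_d(w, c) * q_lik[fid] for fid, c in docs.items()) / total
        for w in collection
    }

    # Score each fact by cross-entropy sum_w P(w|R) log P(w|D); higher is better.
    scores = {
        fid: sum(p * math.log(p_w_d(w, c)) for w, p in p_w_r.items())
        for fid, c in docs.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    facts = {
        "f1": "Oil prices rose sharply after the OPEC meeting.",
        "f2": "The football team won the league final.",
        "f3": "OPEC agreed to cut oil production next quarter.",
    }
    # f2, which mentions neither query word, ranks last.
    for fid, score in rank_facts("oil OPEC", facts):
        print(fid, round(score, 3))
```

Because the relevance model spreads probability mass from the query words to the words that co-occur with them in the matching facts, it acts as a form of query expansion; that is the usual motivation for relevance models over plain query likelihood.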




Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Pérez, J.M., Berlanga, R., Aramburu, M.J. (2004). A Document Model Based on Relevance Modeling Techniques for Semi-structured Information Warehouses. In: Galindo, F., Takizawa, M., Traunmüller, R. (eds) Database and Expert Systems Applications. DEXA 2004. Lecture Notes in Computer Science, vol 3180. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30075-5_31


  • DOI: https://doi.org/10.1007/978-3-540-30075-5_31

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-22936-0

  • Online ISBN: 978-3-540-30075-5

  • eBook Packages: Springer Book Archive
