A Test Platform for the INEX Heterogeneous Track

  • Conference paper
Advances in XML Information Retrieval (INEX 2004)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3493)

Abstract

This article presents our work within the INEX 2004 Heterogeneous Track. We focused on taming the structural diversity within the INEX heterogeneous bibliographic corpus.

We demonstrate how semantic models and associated inference techniques can be used to solve the problems raised by the structural diversity within a given XML corpus. The first step automatically extracts a set of concepts from each class of INEX heterogeneous documents. A unified set of concepts is then computed, which synthesizes the interesting concepts from the whole corpus. Individual corpora are connected to the unified set of concepts via conceptual mappings. This approach is implemented as an application of the KadoP platform for peer-to-peer warehousing of XML documents. While this work caters to the structural aspects of XML information retrieval, the extensibility of the KadoP system makes it an interesting test platform into which components developed by several INEX participants could be plugged, exploiting the opportunities of peer-to-peer data and service distribution.
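
The three steps named in the abstract (per-collection concept extraction, computation of a unified concept set, and conceptual mappings) can be illustrated with a toy sketch. This is not the KadoP implementation and does not reproduce the authors' semantic models or inference techniques. It assumes, for illustration only, that element tag names stand in for concepts and that a hand-written synonym table drives the unification. The names `SYNONYMS`, `extract_concepts`, and `build_unified_concepts`, and the two sample collections, are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical synonym table (an assumption, not from the paper): maps a
# collection-specific tag name to a unified concept name.
SYNONYMS = {
    "writer": "author",
    "creator": "author",
    "heading": "title",
}


def extract_concepts(xml_text):
    """Return the set of element tag names found in one XML document."""
    root = ET.fromstring(xml_text)
    return {elem.tag for elem in root.iter()}


def build_unified_concepts(collections):
    """Compute a unified concept set and per-collection mappings.

    `collections` maps a collection name to a list of XML strings.
    Returns (unified, mappings), where mappings[name][tag] is the unified
    concept that the collection's tag is mapped to.
    """
    unified = set()
    mappings = {}
    for name, docs in collections.items():
        # Step 1: gather the concepts (here: tag names) of this collection.
        local = set()
        for doc in docs:
            local |= extract_concepts(doc)
        # Step 3: map each local concept to a unified one via the synonym
        # table, falling back to the lower-cased tag name itself.
        mapping = {tag: SYNONYMS.get(tag.lower(), tag.lower()) for tag in local}
        mappings[name] = mapping
        # Step 2: the unified set is the union of all mapping targets.
        unified |= set(mapping.values())
    return unified, mappings


# Two made-up bibliographic collections with different structures.
collections = {
    "collection_a": ["<article><author>A</author><title>T</title></article>"],
    "collection_b": ["<paper><writer>B</writer><heading>U</heading></paper>"],
}
unified, mappings = build_unified_concepts(collections)
```

With such mappings in place, a query phrased against a unified concept such as `author` could be rewritten per collection by inverting each mapping, which is one plausible reading of how the mappings connect individual corpora to the unified set.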

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Abiteboul, S., Manolescu, I., Nguyen, B., Preda, N. (2005). A Test Platform for the INEX Heterogeneous Track. In: Fuhr, N., Lalmas, M., Malik, S., Szlávik, Z. (eds) Advances in XML Information Retrieval. INEX 2004. Lecture Notes in Computer Science, vol 3493. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11424550_29

  • DOI: https://doi.org/10.1007/11424550_29

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-26166-7

  • Online ISBN: 978-3-540-32053-1

  • eBook Packages: Computer Science, Computer Science (R0)
