Abstract
This article presents our work within the INEX 2004 Heterogeneous Track. We focused on taming the structural diversity within the INEX heterogeneous bibliographic corpus.
We demonstrate how semantic models and associated inference techniques can be used to solve the problems raised by the structural diversity within a given XML corpus. The first step automatically extracts a set of concepts from each class of INEX heterogeneous documents. A unified set of concepts is then computed, which synthesizes the interesting concepts from the whole corpus. Individual corpora are connected to the unified set of concepts via conceptual mappings. This approach is implemented as an application of the KadoP platform for peer-to-peer warehousing of XML documents. While this work caters to the structural aspects of XML information retrieval, the extensibility of the KadoP system makes it an interesting test platform into which components developed by several INEX participants could be plugged, exploiting the opportunities offered by peer-to-peer data and service distribution.
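The three steps described above (per-class concept extraction, computation of a unified concept set, and conceptual mappings from each corpus to that set) can be illustrated with a minimal Python sketch. It is not the authors' implementation and does not use KadoP. It assumes that every distinct XML element tag is a concept, and it replaces the actual concept-matching step with a hypothetical hand-written synonym table (`SYNONYMS`). The corpus names and the functions `extract_concepts`, `build_mappings` and `local_tags_for` are likewise illustrative.

```python
import xml.etree.ElementTree as ET

# Hypothetical synonym table standing in for the concept-matching step;
# the technique actually used in the paper is NOT reproduced here.
SYNONYMS = {
    "author": "author", "creator": "author", "writer": "author",
    "title": "title", "name": "title",
    "year": "date", "date": "date",
}


def extract_concepts(xml_text):
    """Step 1 (simplified): treat each distinct element tag as a concept."""
    root = ET.fromstring(xml_text)
    return {elem.tag for elem in root.iter()}


def build_mappings(corpora):
    """Steps 2 and 3: compute a unified concept set and, for each corpus,
    a mapping from its local concepts to unified concepts."""
    unified = set()
    mappings = {}
    for name, docs in corpora.items():
        local = set()
        for doc in docs:
            local |= extract_concepts(doc)
        # Keep only the local concepts that match some unified concept.
        mapping = {c: SYNONYMS[c.lower()] for c in local if c.lower() in SYNONYMS}
        unified |= set(mapping.values())
        mappings[name] = mapping
    return unified, mappings


def local_tags_for(concept, mappings):
    """Translate a unified concept back into each corpus's local tags."""
    return {name: sorted(tag for tag, c in m.items() if c == concept)
            for name, m in mappings.items()}


if __name__ == "__main__":
    corpora = {
        "dblp": ["<article><author>A</author><title>T</title><year>2004</year></article>"],
        "other": ["<record><creator>B</creator><name>U</name></record>"],
    }
    unified, mappings = build_mappings(corpora)
    print(sorted(unified))                      # ['author', 'date', 'title']
    print(local_tags_for("author", mappings))   # {'dblp': ['author'], 'other': ['creator']}
```

The reverse lookup in `local_tags_for` shows why the mappings matter: a query phrased once against the unified concept `author` can be rewritten into the structurally different vocabularies of each corpus.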
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Abiteboul, S., Manolescu, I., Nguyen, B., Preda, N. (2005). A Test Platform for the INEX Heterogeneous Track. In: Fuhr, N., Lalmas, M., Malik, S., Szlávik, Z. (eds) Advances in XML Information Retrieval. INEX 2004. Lecture Notes in Computer Science, vol 3493. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11424550_29
DOI: https://doi.org/10.1007/11424550_29
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-26166-7
Online ISBN: 978-3-540-32053-1
eBook Packages: Computer Science, Computer Science (R0)