Abstract
In this paper we propose a semi-automatic technique for deriving the similarity degree between two portions of heterogeneous, semi-structured information sources (hereafter, sub-sources). The proposed technique consists of two phases: the first selects the most promising pairs of sub-sources, whereas the second computes the similarity degree for each promising pair. In addition, we show that the detection of sub-source similarities is a special case (and a very interesting one for semi-structured information sources) of the more general problem of Scheme Match. Finally, we discuss some possible applications that can benefit from the derived sub-source similarities. A real example case is presented to better illustrate the proposed technique.
Copyright information
© 2001 Springer-Verlag Berlin Heidelberg
Cite this paper
Rosaci, D., Terracina, G., Ursino, D. (2001). Deriving “Sub-source” Similarities from Heterogeneous, Semi-structured Information Sources. In: Batini, C., Giunchiglia, F., Giorgini, P., Mecella, M. (eds) Cooperative Information Systems. CoopIS 2001. Lecture Notes in Computer Science, vol 2172. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44751-2_14
DOI: https://doi.org/10.1007/3-540-44751-2_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42524-3
Online ISBN: 978-3-540-44751-1
eBook Packages: Springer Book Archive