
Deriving “Sub-source” Similarities from Heterogeneous, Semi-structured Information Sources

  • Conference paper
Cooperative Information Systems (CoopIS 2001)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2172)


Abstract

In this paper we propose a semi-automatic technique for deriving the similarity degree between two portions of heterogeneous, semi-structured information sources (hereafter, sub-sources). The proposed technique consists of two phases: the first selects the most promising pairs of sub-sources, whereas the second computes the similarity degree of each promising pair. In addition, we show that detecting sub-source similarities is a special case (and, for semi-structured information sources, a particularly interesting one) of the more general Scheme Match problem. Finally, we discuss some possible applications that can benefit from the derived sub-source similarities. A real example is presented to clarify the proposed technique.
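The abstract does not specify the similarity measures used in either phase, so the following is only a minimal sketch of the general two-phase idea, not the authors' method. It assumes a hypothetical representation in which each sub-source is a set of element names; phase 1 keeps pairs whose name sets have a Jaccard overlap above an assumed threshold, and phase 2 scores a pair by the best one-to-one matching of element names, using `difflib.SequenceMatcher` as a stand-in element-level similarity. All function names and the threshold value are illustrative assumptions.

```python
from difflib import SequenceMatcher
from itertools import combinations, permutations


def name_similarity(a, b):
    """Assumed element-level similarity: string ratio in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def promising_pairs(subsources, threshold=0.2):
    """Phase 1 (hypothetical): keep sub-source pairs whose element-name
    sets overlap enough, measured by Jaccard coefficient."""
    pairs = []
    for (n1, e1), (n2, e2) in combinations(subsources.items(), 2):
        union = e1 | e2
        if not union:
            continue  # two empty sub-sources: nothing to compare
        if len(e1 & e2) / len(union) >= threshold:
            pairs.append((n1, n2))
    return pairs


def similarity_degree(e1, e2):
    """Phase 2 (hypothetical): best one-to-one matching of element names,
    normalised by the size of the larger sub-source, giving a value in [0, 1].

    Brute force over permutations, so only suitable for tiny sub-sources;
    a realistic version would use a maximum weight bipartite matching algorithm.
    """
    small, large = sorted((sorted(e1), sorted(e2)), key=len)
    if not small:
        return 0.0
    best = max(
        sum(name_similarity(a, b) for a, b in zip(small, perm))
        for perm in permutations(large, len(small))
    )
    return best / len(large)
```

For example, with three toy sub-sources (invented for illustration), phase 1 discards the unrelated pair and phase 2 scores the remaining one:

```python
subsources = {
    "bookstore": frozenset({"book", "title", "author", "price"}),
    "library": frozenset({"book", "title", "author", "isbn"}),
    "weather": frozenset({"city", "temperature"}),
}
for a, b in promising_pairs(subsources):
    print(a, b, round(similarity_degree(subsources[a], subsources[b]), 2))
```

Separating a cheap filter from an expensive matching step mirrors the motivation stated in the abstract: the costly similarity computation is only run on pairs that look promising.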

Copyright information

© 2001 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Rosaci, D., Terracina, G., Ursino, D. (2001). Deriving “Sub-source” Similarities from Heterogeneous, Semi-structured Information Sources. In: Batini, C., Giunchiglia, F., Giorgini, P., Mecella, M. (eds) Cooperative Information Systems. CoopIS 2001. Lecture Notes in Computer Science, vol 2172. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44751-2_14

  • DOI: https://doi.org/10.1007/3-540-44751-2_14

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-42524-3

  • Online ISBN: 978-3-540-44751-1

  • eBook Packages: Springer Book Archive
