A graph-based approach for extracting terminological properties from information sources with heterogeneous formats

  • Regular Paper
  • Published in: Knowledge and Information Systems

Abstract

Integrating a large number of information sources with heterogeneous representation formats, and enabling them to cooperate, is a challenging problem. In this context, a central role can be played by knowledge about the semantic relationships holding between concepts that belong to different information sources (intersource properties). In this paper, we propose a semiautomatic approach for extracting two kinds of intersource properties, namely synonymies and homonymies, from heterogeneous information sources. To carry out the extraction task, we introduce both a conceptual model, for representing the sources involved, and a metric, for measuring the strength of the semantic relationships holding among concepts represented within the same source.

Author information

Correspondence to Giorgio Terracina.

About this article

Cite this article

Palopoli, L., Rosaci, D., Terracina, G. et al. A graph-based approach for extracting terminological properties from information sources with heterogeneous formats. Knowl Inf Syst 8, 462–497 (2005). https://doi.org/10.1007/s10115-004-0185-2

  • DOI: https://doi.org/10.1007/s10115-004-0185-2
