Abstract
Integrating a large number of information sources with heterogeneous representation formats, and enabling them to cooperate, is a challenging problem. In this context, a central role can be played by knowledge of the semantic relationships holding between concepts that belong to different information sources (intersource properties). In this paper, we propose a semiautomatic approach for extracting two kinds of intersource properties, namely synonymies and homonymies, from heterogeneous information sources. To carry out the extraction, we introduce both a conceptual model for representing the sources involved and a metric for measuring the strength of the semantic relationships holding among concepts represented within the same source.
Cite this article
Palopoli, L., Rosaci, D., Terracina, G. et al. A graph-based approach for extracting terminological properties from information sources with heterogeneous formats. Knowl Inf Syst 8, 462–497 (2005). https://doi.org/10.1007/s10115-004-0185-2