DOI: 10.1145/1007568.1007612
Article

iMAP: discovering complex semantic matches between database schemas

Published: 13 June 2004

Abstract

Creating semantic matches between disparate data sources is fundamental to numerous data sharing efforts. Manually creating matches is extremely tedious and error-prone. Hence many recent works have focused on automating the matching process. To date, however, virtually all of these works deal only with one-to-one (1-1) matches, such as address = location. They do not consider the important class of more complex matches, such as address = concat(city, state) and room-price = room-rate * (1 + tax-rate).

We describe the iMAP system, which semi-automatically discovers both 1-1 and complex matches. iMAP reformulates schema matching as a search in an often very large or infinite match space. To search effectively, it employs a set of searchers, each discovering specific types of complex matches. To further improve matching accuracy, iMAP exploits a variety of domain knowledge, including past complex matches, domain integrity constraints, and overlap data. Finally, iMAP introduces a novel feature that generates explanations of predicted matches, to provide insights into the matching process and suggest actions to converge on correct matches quickly. We apply iMAP to several real-world domains to match relational tables, and show that it discovers both 1-1 and complex matches with high accuracy.
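The idea of treating matching as a search over candidate expressions can be illustrated with a toy sketch. The code below is not iMAP's implementation. It is a minimal example, assuming a single hypothetical "text searcher" that enumerates concatenations of source columns and scores each candidate by how many target values it reproduces, assuming the two tables describe overlapping entities. The function names (`concat_candidates`, `overlap_score`, `best_concat_match`), the space separator, and the sample data are all assumptions made for illustration.

```python
from itertools import permutations


def concat_candidates(source, max_arity=2, sep=" "):
    """Yield (column_names, derived_values) for every ordered
    concatenation of up to max_arity source columns."""
    names = list(source)
    for arity in range(1, max_arity + 1):
        for combo in permutations(names, arity):
            rows = zip(*(source[name] for name in combo))
            values = [sep.join(str(v) for v in row) for row in rows]
            yield combo, values


def overlap_score(candidate_values, target_values):
    """Fraction of distinct target values that the candidate reproduces exactly."""
    target = set(target_values)
    if not target:
        return 0.0
    return len(target & set(candidate_values)) / len(target)


def best_concat_match(source, target_values, max_arity=2):
    """Return (score, column_names) for the highest-scoring candidate."""
    scored = (
        (overlap_score(values, target_values), combo)
        for combo, values in concat_candidates(source, max_arity)
    )
    return max(scored, key=lambda pair: pair[0])


# Hypothetical sample data: the target "address" column is expected to
# match concat(city, state) from the source table.
source = {
    "city": ["Seattle", "Madison"],
    "state": ["WA", "WI"],
    "zip": ["98105", "53706"],
}
address = ["Seattle WA", "Madison WI"]

score, columns = best_concat_match(source, address)
print(columns, score)
```

Even in this toy setting the candidate space grows quickly with the number of columns and the allowed arity, which is why bounding the search (here with `max_arity`) matters. A numeric match such as room-price = room-rate * (1 + tax-rate) would need a different kind of candidate generator and a tolerance-based score rather than exact string overlap.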



    Published In

    SIGMOD '04: Proceedings of the 2004 ACM SIGMOD international conference on Management of data
    June 2004
    988 pages
    ISBN:1581138598
    DOI:10.1145/1007568
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Qualifiers

    • Article

    Conference

    SIGMOD/PODS04

    Acceptance Rates

    Overall Acceptance Rate 785 of 4,003 submissions, 20%


    Cited By

    • (2024) Semantic Annotation of Relational Schemas Using a Probabilistic Generative Model. Proceedings of the 7th Joint International Conference on Data Science & Management of Data (11th ACM IKDD CODS and 29th COMAD), pp. 127-135. DOI: 10.1145/3632410.3632414. Online publication date: 4-Jan-2024
    • (2024) Automatic semantic modeling of structured data sources with cross-modal retrieval. Pattern Recognition Letters, 177, pp. 7-14. DOI: 10.1016/j.patrec.2023.11.014. Online publication date: Jan-2024
    • (2024) Multi-view representation learning for tabular data integration using inter-feature relationships. Journal of Biomedical Informatics, 151:C. DOI: 10.1016/j.jbi.2024.104602. Online publication date: 1-Mar-2024
    • (2023) Schema Matching using Pre-Trained Language Models. 2023 IEEE 39th International Conference on Data Engineering (ICDE), pp. 1558-1571. DOI: 10.1109/ICDE55515.2023.00123. Online publication date: Apr-2023
    • (2023) Distributional constraint discovery for intelligent auditing. Knowledge and Information Systems, 65:12, pp. 5195-5229. DOI: 10.1007/s10115-023-01929-z. Online publication date: 7-Aug-2023
    • (2022) Automatic Semantic Modeling for Structural Data Source with the Prior Knowledge from Knowledge Base. Mathematics, 10:24, article 4778. DOI: 10.3390/math10244778. Online publication date: 15-Dec-2022
    • (2022) LIDER. Proceedings of the VLDB Endowment, 16:2, pp. 154-166. DOI: 10.14778/3565816.3565819. Online publication date: 1-Oct-2022
    • (2022) CORE. Proceedings of the VLDB Endowment, 15:9, pp. 1951-1964. DOI: 10.14778/3538598.3538615. Online publication date: 27-Jul-2022
    • (2022) Sancus. Proceedings of the VLDB Endowment, 15:9, pp. 1937-1950. DOI: 10.14778/3538598.3538614. Online publication date: 1-May-2022
    • (2022) OLAP and NoSQL: Happily Ever After. Advances in Databases and Information Systems, pp. 35-44. DOI: 10.1007/978-3-031-15740-0_4. Online publication date: 29-Aug-2022
