Abstract
In this paper, we describe and situate the tupelo system for data mapping in relational databases. Automating the discovery of mappings between structured data sources is a long-standing and important problem in data management. Starting from user-provided example instances of the source and target schemas, tupelo approaches mapping discovery as search within the transformation space of these instances, based on a set of mapping operators. Mapping expressions in tupelo incorporate not only data-metadata transformations but also simple and complex semantic transformations, resulting in significantly wider applicability than previous systems. Extensive empirical validation of tupelo on both synthetic and real-world datasets indicates that the approach is both viable and effective.
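To make the "mapping discovery as search" framing concrete, here is a minimal sketch under loudly stated assumptions. States are database instances, edges are applications of mapping operators, and the goal test is equality with the user's target example. The three operators (relation rename, attribute rename, and a simplified unpivot standing in for a data-metadata transformation), the restriction of candidate names to the target's vocabulary, and the use of plain breadth-first search are all illustrative choices, not tupelo's actual operator set or search strategy. All function and variable names are hypothetical.

```python
from collections import deque
from typing import FrozenSet, Iterator, List, Optional, Set, Tuple

# A relation is (name, attribute names, set of rows); each row is a tuple
# aligned with the attribute names. An instance is a set of relations.
Relation = Tuple[str, Tuple[str, ...], FrozenSet[tuple]]
Instance = FrozenSet[Relation]


def vocabulary(target: Instance) -> Tuple[Set[str], Set[str]]:
    """Relation and attribute names that appear in the target instance."""
    rel_names = {name for name, _, _ in target}
    attr_names = {a for _, attrs, _ in target for a in attrs}
    return rel_names, attr_names


def unpivot(rel: Relation) -> Relation:
    """Hypothetical data-metadata operator: non-key column names become data.

    Simplifying assumption: the first attribute is the key, and every other
    column is turned into an (attribute, value) pair per row.
    """
    name, attrs, rows = rel
    rest = attrs[1:]
    new_rows = frozenset(
        (row[0], a, row[i + 1]) for row in rows for i, a in enumerate(rest)
    )
    return (name, (attrs[0], "attribute", "value"), new_rows)


def successors(state: Instance, rel_names: Set[str],
               attr_names: Set[str]) -> Iterator[Tuple[str, Instance]]:
    """Apply each toy operator to each relation (illustrative, not tupelo's)."""
    for rel in state:
        name, attrs, rows = rel
        others = state - {rel}
        for new in rel_names - {name}:
            yield f"rename relation {name} -> {new}", others | {(new, attrs, rows)}
        for i, old in enumerate(attrs):
            # Skip names already used in this relation to avoid duplicates.
            for new in attr_names - set(attrs):
                new_attrs = attrs[:i] + (new,) + attrs[i + 1:]
                yield f"rename {name}.{old} -> {new}", others | {(name, new_attrs, rows)}
        if len(attrs) >= 2:
            yield f"unpivot {name}", others | {unpivot(rel)}


def find_mapping(source: Instance, target: Instance,
                 max_depth: int = 6) -> Optional[List[str]]:
    """Breadth-first search for a shortest operator sequence source -> target."""
    rel_names, attr_names = vocabulary(target)
    frontier = deque([(source, [])])
    seen = {source}
    while frontier:
        state, path = frontier.popleft()
        if state == target:
            return path
        if len(path) >= max_depth:
            continue
        for step, nxt in successors(state, rel_names, attr_names):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, path + [step]))
    return None


# Toy example: quarters are column names in the source but data values in
# the target, so the search must use the data-metadata (unpivot) step.
source = frozenset({("Prices", ("item", "q1", "q2"),
                     frozenset({("apple", 3, 4), ("pear", 5, 6)}))})
target = frozenset({("Sales", ("item", "quarter", "price"),
                     frozenset({("apple", "q1", 3), ("apple", "q2", 4),
                                ("pear", "q1", 5), ("pear", "q2", 6)}))})
print(find_mapping(source, target))
```

Here the shortest mapping has four steps: one unpivot, two attribute renames, and one relation rename (their exact order in the printed path may vary between runs because Python's string hashing affects set iteration order). Breadth-first search is used only because it is the simplest complete strategy; for realistic instances the branching factor grows quickly, and an informed search guided by a similarity heuristic between the current and target instances would likely be needed.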
This paper continues work first explored in poster/demo presentations (IHIS05 and SIGMOD05) and in a short workshop paper [11].
References
Bernstein, P.A., et al.: Interactive Schema Translation with Instance-Level Mappings (System Demo). In: Proc. VLDB Conf., Trondheim, Norway, pp. 1283–1286 (2005)
Bilke, A., Naumann, F.: Schema Matching using Duplicates. In: Proc. IEEE ICDE, Tokyo, Japan, pp. 69–80 (2005)
Bossung, S., et al.: Automated Data Mapping Specification via Schema Heuristics and User Interaction. In: Proc. IEEE/ACM ASE, Linz, Austria, pp. 208–217 (2004)
Carreira, P., Galhardas, H.: Execution of Data Mappers. In: Proc. ACM SIGMOD Workshop IQIS, Paris, France, pp. 2–9 (2004)
Chang, K.C.-C., He, B., Li, C., Patel, M., Zhang, Z.: Structured Databases on the Web: Observations and Implications. SIGMOD Record 33(3), 61–70 (2004)
Dhamankar, R., et al.: iMAP: Discovering Complex Semantic Matches between Database Schemas. In: Proc. ACM SIGMOD, Paris, France, pp. 383–394 (2004)
Doan, A., Domingos, P., Halevy, A.: Learning to Match the Schemas of Databases: A Multistrategy Approach. Machine Learning 50(3), 279–301 (2003)
Doan, A., Noy, N., Halevy, A.: Special Section on Semantic Integration. SIGMOD Record 33(4) (2004)
Embley, D.W., Xu, L., Ding, Y.: Automatic Direct and Indirect Schema Mapping: Experiences and Lessons Learned, vol. 8, pp. 14–19
Euzenat, J., et al.: State of the Art on Ontology Alignment. In: Tech. Report D2.2.3, IST Knowledge Web NoE (2004)
Fletcher, G.H.L., Wyss, C.M.: Mapping Between Data Sources on the Web. In: Proc. IEEE ICDE Workshop WIRI, Tokyo, Japan (2005)
Fletcher, G.H.L., et al.: A Calculus for Data Mapping. In: Proc. COORDINATION Workshop InterDB, Namur, Belgium (2005)
Gottlob, G., et al.: The Lixto Data Extraction Project – Back and Forth between Theory and Practice. In: Proc. ACM PODS, Paris, France, pp. 1–12 (2004)
He, B., et al.: Discovering Complex Matchings Across Web Query Interfaces: A Correlation Mining Approach. In: Proc. ACM KDD (2004)
Ives, Z.G., Halevy, A.Y., Mork, P., Tatarinov, I.: Piazza: Mediation and Integration Infrastructure for Semantic Web Data. J. Web Sem. 1(2), 155–175 (2004)
Kang, J., Naughton, J.F.: On Schema Matching with Opaque Column Names and Data Values. In: Proc. ACM SIGMOD, San Diego, CA, pp. 205–216 (2003)
Kolaitis, P.G.: Schema Mappings, Data Exchange, and Metadata Management. In: Proc. ACM PODS, Baltimore, MD, USA, pp. 61–75 (2005)
Krishnamurthy, R., et al.: Language Features for Interoperability of Databases with Schematic Discrepancies. In: Proc. ACM SIGMOD, Denver, CO, USA, pp. 40–49 (1991)
Lenzerini, M.: Data Integration: A Theoretical Perspective. In: Proc. ACM PODS, Madison, WI, pp. 233–246 (2002)
Levenshtein, V.I.: Binary codes capable of correcting deletions, insertions, and reversals. Doklady Akademii Nauk SSSR 163(4), 845–848 (1965)
Levy, A.Y., Ordille, J.J.: An Experiment in Integrating Internet Information Sources. In: Proc. AAAI Fall Symp. AI Apps. Knowl. Nav. Ret., Cambridge, MA, USA, pp. 92–96 (1995)
Li, W.-S., Clifton, C.: SEMINT: A Tool for Identifying Attribute Correspondences in Heterogeneous Databases Using Neural Networks. Data Knowl. Eng. 33(1), 49–84 (2000)
Litwin, W., Ketabchi, M.A., Krishnamurthy, R.: First Order Normal Form for Relational Databases and Multidatabases. SIGMOD Record 20(4), 74–76 (1991)
Melnik, S.: Generic Model Management. LNCS, vol. 2967. Springer, Heidelberg (2004)
Melnik, S., et al.: Supporting Executable Mappings in Model Management. In: Proc. ACM SIGMOD, Baltimore, MD, USA (2005)
Miller, R.J., Haas, L.M., Hernández, M.A.: Schema Mapping as Query Discovery. In: Proc. VLDB Conf., Cairo, Egypt, pp. 77–88 (2000)
Morishima, A., et al.: A Machine Learning Approach to Rapid Development of XML Mapping Queries. In: Proc. IEEE ICDE, Boston, MA, USA, pp. 276–287 (2004)
Nilsson, N.J.: Artificial Intelligence: A New Synthesis. Morgan Kaufmann, San Francisco (1998)
Noy, N.F., Doan, A., Halevy, A.Y.: Special Issue on Semantic Integration. AI Magazine 26(1) (2005)
Perkowitz, M., Etzioni, O.: Category Translation: Learning to Understand Information on the Internet. In: Proc. IJCAI, Montréal, Canada, pp. 930–938 (1995)
Rahm, E., Bernstein, P.A.: A Survey of Approaches to Automatic Schema Matching. VLDB J. 10(4), 334–350 (2001)
Raman, V., Hellerstein, J.M.: Potter’s Wheel: An Interactive Data Cleaning System. In: Proc. VLDB Conf., Roma, Italy, pp. 381–390 (2001)
Schmid, U., Waltermann, J.: Automatic Synthesis of XSL-Transformations from Example Documents. In: Proc. IASTED AIA, Innsbruck, Austria (2004)
Shvaiko, P., Euzenat, J.: A Survey of Schema-Based Matching Approaches. In: J. Data Semantics IV (2005) (to appear)
Smiljanić, M., van Keulen, M., Jonker, W.: Formalizing the XML schema matching problem as a constraint optimization problem. In: Andersen, K.V., Debenham, J., Wagner, R. (eds.) DEXA 2005. LNCS, vol. 3588, pp. 333–342. Springer, Heidelberg (2005)
Stephens, D.R.: Information Retrieval and Computational Geometry. Dr. Dobb’s Journal 29(12), 42–45 (2004)
Wang, G., Goguen, J.A., Nam, Y.-K., Lin, K.: Critical points for interactive schema matching. In: Yu, J.X., Lin, X., Lu, H., Zhang, Y. (eds.) APWeb 2004. LNCS, vol. 3007, pp. 654–664. Springer, Heidelberg (2004)
Winkler, W.E.: The State of Record Linkage and Current Research Problems. U.S. Bureau of the Census, Statistical Research Division, Technical Report RR99/04 (1999)
Wyss, C.M., Robertson, E.L.: Relational Languages for Metadata Integration. ACM TODS 30(2), 624–660 (2005)
Wyss, C.M., Robertson, E.L.: A Formal Characterization of PIVOT / UNPIVOT. In: Proc. ACM CIKM, Bremen, Germany (2005)
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Fletcher, G.H.L., Wyss, C.M. (2006). Data Mapping as Search. In: Ioannidis, Y., et al. Advances in Database Technology - EDBT 2006. EDBT 2006. Lecture Notes in Computer Science, vol 3896. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11687238_9
DOI: https://doi.org/10.1007/11687238_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-32960-2
Online ISBN: 978-3-540-32961-9
eBook Packages: Computer Science, Computer Science (R0)