Abstract
The Internet has created a critical need for automated tools that help integrate countless databases. Because non-technical end users are often the ultimate repositories of the domain information required to distinguish differences in data types, we posit that an effective solution must combine simple GUI-based data browsing tools with automatic mapping methods that remove the need for technical users. We develop a meta-model of data integration as the basis for absorbing feedback from an end user. The schema integration algorithm draws examples from the data and learns integrating view definitions by asking the user simple yes-or-no questions. The meta-model enables a search mechanism that is guaranteed to converge to a correct integrating view definition without the user having to know a view definition language such as SQL, or even to inspect the final view definition. We show how data catalog statistics, normally used to optimize queries, can be exploited to parameterize the search heuristics and improve the convergence of the learning algorithm.
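The question-driven search described in the abstract can be illustrated with a small sketch. This is not the Sphinx algorithm itself: it assumes, purely for illustration, that the candidate integrating views are single-column equi-joins between two hypothetical tables, and it picks the next question by how evenly an example splits the surviving candidates, a stand-in for the statistics-driven heuristics the abstract mentions. All names (`learn_view`, the tables, the columns) are hypothetical.

```python
from itertools import product

def learn_view(candidates, examples, oracle):
    """Narrow candidate view definitions with yes/no membership questions.

    candidates: dict mapping a name to predicate(example) -> bool
    examples:   examples drawn from the data (here, pairs of rows)
    oracle:     callable(example) -> bool, standing in for the end user
    Returns (sorted names of surviving candidates, questions asked).
    """
    alive = dict(candidates)
    pool = list(examples)
    asked = 0
    while True:
        # Pick the example that splits the surviving candidates most evenly.
        best, best_score = None, 0
        for ex in pool:
            yes = sum(1 for pred in alive.values() if pred(ex))
            score = min(yes, len(alive) - yes)
            if score > best_score:
                best, best_score = ex, score
        if best is None:
            # No remaining example distinguishes the survivors: stop asking.
            break
        answer = oracle(best)  # one yes/no question to the user
        asked += 1
        # Keep only candidates that agree with the user's answer.
        alive = {n: p for n, p in alive.items() if p(best) == answer}
        pool.remove(best)
    return sorted(alive), asked

# Two hypothetical sources describing the same employees.
a_rows = [{"id": 1, "name": "Ann", "dept": "R&D"},
          {"id": 2, "name": "Bob", "dept": "Ops"}]
b_rows = [{"emp_no": 2, "full_name": "Ann", "office": "Ops"},
          {"emp_no": 1, "full_name": "Bob", "office": "R&D"}]

# Assumed hypothesis space: every equi-join on one column from each source.
candidates = {
    (ca, cb): (lambda ex, ca=ca, cb=cb: ex[0][ca] == ex[1][cb])
    for ca in a_rows[0] for cb in b_rows[0]
}
examples = list(product(a_rows, b_rows))

# Simulated user: two rows belong together when the person's name matches.
def user(ex):
    return ex[0]["name"] == ex[1]["full_name"]

views, questions = learn_view(candidates, examples, user)
print(views, questions)  # [('name', 'full_name')] 2
```

In this toy setting two questions are enough to isolate the name-based join from nine candidates, and the simulated user never sees a view definition. In general the loop can end with several candidates that no available example tells apart, so the convergence guarantee claimed in the abstract depends on the paper's meta-model and not on this sketch.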
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Barbançon, F., Miranker, D.P. (2004). Interactive Schema Integration with Sphinx. In: Christiansen, H., Hacid, MS., Andreasen, T., Larsen, H.L. (eds) Flexible Query Answering Systems. FQAS 2004. Lecture Notes in Computer Science, vol 3055. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-25957-2_15
DOI: https://doi.org/10.1007/978-3-540-25957-2_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22160-9
Online ISBN: 978-3-540-25957-2
eBook Packages: Springer Book Archive