Interactive Schema Integration with Sphinx

  • Conference paper
Flexible Query Answering Systems (FQAS 2004)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3055)

Abstract

The Internet has created a critical need for automated tools that facilitate the integration of countless databases. Since non-technical end users are often the ultimate repositories of the domain information required to distinguish differences in data types, we posit that an effective solution must combine simple GUI-based data browsing tools with automatic mapping methods that remove the need for technical users. We develop a meta-model of data integration as the basis for absorbing feedback from an end user. The schema integration algorithm draws examples from the data and learns integrating view definitions by asking the user simple yes-or-no questions. The meta-model enables a search mechanism that is guaranteed to converge to a correct integrating view definition without the user having to know a view definition language such as SQL, or even having to inspect the final view definition. We show how data catalog statistics, normally used to optimize queries, can be exploited to parameterize the search heuristics and improve the convergence of the learning algorithm.
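To make the idea of learning a view definition from yes-or-no answers concrete, here is a minimal sketch in Python. It is not the Sphinx algorithm or its meta-model, which the abstract does not specify. It is a toy candidate-elimination loop under stated assumptions: a small, fixed set of hypothetical candidate views, each modeled as a predicate over rows, and a simulated user (`ask`) who answers whether a given row belongs in the integrated view. The even-split heuristic for choosing the next question is also an assumption made for illustration; it stands in for the catalog-statistics-driven heuristics the abstract mentions.

```python
from typing import Callable, List, Tuple

Row = Tuple[str, str]  # hypothetical row layout: (source, city)
# A hypothetical candidate view: a label plus a row-membership predicate.
Candidate = Tuple[str, Callable[[Row], bool]]


def learn_view(
    candidates: List[Candidate],
    rows: List[Row],
    ask: Callable[[Row], bool],
) -> Tuple[List[str], int]:
    """Eliminate candidate views by asking yes/no membership questions.

    Returns the labels of the surviving candidates and the number of
    questions asked. Assumes the user's answers are consistent.
    """
    alive = list(candidates)
    questions = 0
    while len(alive) > 1:
        # Pick the row on which the surviving candidates disagree most
        # evenly, so that either answer removes as many as possible.
        best_row, best_score = None, 0
        for row in rows:
            yes = sum(1 for _, pred in alive if pred(row))
            score = min(yes, len(alive) - yes)
            if score > best_score:
                best_row, best_score = row, score
        if best_row is None:
            break  # survivors agree on every available row
        answer = ask(best_row)
        questions += 1
        # Keep only the candidates consistent with the user's answer.
        alive = [(name, pred) for name, pred in alive if pred(best_row) == answer]
    return [name for name, _ in alive], questions


# Illustrative data and candidates (all hypothetical).
rows = [("A", "Austin"), ("A", "Dallas"), ("B", "Austin"), ("B", "Dallas")]
candidates = [
    ("all rows", lambda r: True),
    ("source A only", lambda r: r[0] == "A"),
    ("source B only", lambda r: r[0] == "B"),
    ("Austin rows only", lambda r: r[1] == "Austin"),
]

# Simulated user who has "source A only" in mind.
names, questions = learn_view(candidates, rows, ask=lambda r: r[0] == "A")
print(names, questions)
```

Each question is chosen so that the surviving candidates disagree on it, so every answer eliminates at least one candidate and the loop terminates. If the intended view is among the candidates and the answers are consistent, it is never eliminated. The user never sees a view definition, only rows.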

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Barbançon, F., Miranker, D.P. (2004). Interactive Schema Integration with Sphinx. In: Christiansen, H., Hacid, M.-S., Andreasen, T., Larsen, H.L. (eds) Flexible Query Answering Systems. FQAS 2004. Lecture Notes in Computer Science, vol 3055. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-25957-2_15

  • DOI: https://doi.org/10.1007/978-3-540-25957-2_15

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-22160-9

  • Online ISBN: 978-3-540-25957-2

  • eBook Packages: Springer Book Archive
