Pay-as-You-Go Ranking of Schema Mappings Using Query Logs

Maskat, Ruhaila; Paton, Norman W.; Embury, Suzanne M.

doi:10.1007/978-3-642-31040-9_4


Part of the book series: Lecture Notes in Computer Science (LNBI, volume 7348)

Included in the following conference series:

International Conference on Data Integration in the Life Sciences


Abstract

Data integration systems typically use mappings to capture the relationships between the data resources to be integrated and the integrated representations presented to users. Manual development and maintenance of such mappings is time-consuming and thus costly. Pay-as-you-go approaches to data integration support the automatic construction of initial mappings, which are generally of rather poor quality, and their subsequent refinement in the light of user feedback. However, automatic approaches typically generate multiple, overlapping candidate mappings. To present the most relevant set of results for user queries, the mappings have to be ranked. We propose a ranking technique that uses information from query logs to discriminate among candidate mappings. The technique is evaluated in terms of how quickly it produces stable rankings and how well the rankings track query patterns that are skewed towards specific sources.
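The abstract does not state the scoring function, so the following is only a minimal sketch of the general idea of ranking candidate mappings from a query log, not the authors' technique. It assumes a simple frequency-based score; the function `rank_mappings`, the data shapes, and the names `m1`, `S1`, and so on are all hypothetical.

```python
from collections import Counter


def rank_mappings(mappings, query_log):
    """Rank candidate mappings by how often their sources appear in the log.

    mappings:  dict of mapping name -> set of source names (assumed shape)
    query_log: list of queries, each given as the set of sources it touched
    """
    # Count how many logged queries touched each source.
    source_hits = Counter(src for query in query_log for src in query)
    total = sum(source_hits.values()) or 1  # guard against an empty log

    # Score = share of all logged source accesses that fall on the
    # mapping's sources (an assumed, illustrative scoring rule).
    scores = {
        name: sum(source_hits[s] for s in sources) / total
        for name, sources in mappings.items()
    }
    # Highest score first; ties are broken by name for determinism.
    return sorted(scores, key=lambda name: (-scores[name], name))


# Hypothetical candidates and a log skewed towards source S2.
candidates = {"m1": {"S1"}, "m2": {"S2"}, "m3": {"S2", "S3"}}
log = [{"S2"}, {"S2", "S3"}, {"S1"}, {"S2"}]
print(rank_mappings(candidates, log))  # ['m3', 'm2', 'm1']
```

With a log skewed towards one source, mappings that draw on that source rise to the top. Re-running the function as the log grows gives a rough way to observe when the ranking stops changing.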



Author information

Authors and Affiliations

School of Computer Science, University of Manchester, Manchester, M13 9PL, United Kingdom
Ruhaila Maskat, Norman W. Paton & Suzanne M. Embury


Editor information

Editors and Affiliations

US National Library of Medicine, National Institutes of Health, 8600 Rockville Pike, 20894, Bethesda, MD, USA
Olivier Bodenreider & Bastien Rance


About this paper

Cite this paper

Maskat, R., Paton, N.W., Embury, S.M. (2012). Pay-as-You-Go Ranking of Schema Mappings Using Query Logs. In: Bodenreider, O., Rance, B. (eds) Data Integration in the Life Sciences. DILS 2012. Lecture Notes in Computer Science, vol 7348. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31040-9_4


DOI: https://doi.org/10.1007/978-3-642-31040-9_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-31039-3
Online ISBN: 978-3-642-31040-9
eBook Packages: Computer Science, Computer Science (R0)
