Abstract
Data integration systems typically use mappings to capture the relationships between the data resources to be integrated and the integrated representations presented to users. Developing and maintaining such mappings manually is time-consuming and therefore costly. Pay-as-you-go approaches to data integration construct initial mappings automatically; these are generally of rather poor quality and are refined in the light of user feedback. However, automatic approaches typically generate multiple, overlapping candidate mappings. To present the most relevant results for user queries, the mappings have to be ranked. We propose a ranking technique that uses information from query logs to discriminate among candidate mappings. The technique is evaluated in terms of how quickly it produces stable rankings and how well the rankings track query patterns that are skewed towards specific sources.
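The abstract does not state the scoring function, so the following is only an illustrative sketch of how a query log might be used to discriminate among candidate mappings, not the authors' method. It assumes each candidate mapping is described by the set of sources it draws on and each logged query by the set of sources it referenced; the function name `rank_mappings`, the data shapes, and the source names are all hypothetical.

```python
from collections import Counter


def rank_mappings(mappings, query_log):
    """Rank candidate mappings by how often their sources appear in the log.

    mappings:  dict of mapping name -> set of source names (assumed shape)
    query_log: iterable of sets, one per logged query, holding the sources
               that query referenced (assumed shape)
    """
    # Count how many logged queries touched each source.
    source_hits = Counter()
    for sources in query_log:
        source_hits.update(set(sources))

    total = sum(source_hits.values())

    def score(sources):
        # With an empty log there is no evidence, so every mapping scores 0.
        if not sources or total == 0:
            return 0.0
        # Mean relative frequency of the mapping's sources in the log.
        return sum(source_hits[s] for s in sources) / (len(sources) * total)

    # Highest score first; ties are broken by name for a deterministic order.
    return sorted(mappings, key=lambda m: (-score(mappings[m]), m))


# Hypothetical candidates and a log skewed towards source_a.
candidates = {
    "m1": {"source_a"},
    "m2": {"source_b"},
    "m3": {"source_a", "source_b"},
}
log = [{"source_a"}, {"source_a"}, {"source_b"}, {"source_a"}]
ranking = rank_mappings(candidates, log)
```

Under these assumptions, a log skewed towards one source pushes mappings over that source up the ranking, and the ranking can be recomputed as more queries are logged to see how quickly it settles.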
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Maskat, R., Paton, N.W., Embury, S.M. (2012). Pay-as-You-Go Ranking of Schema Mappings Using Query Logs. In: Bodenreider, O., Rance, B. (eds) Data Integration in the Life Sciences. DILS 2012. Lecture Notes in Computer Science, vol 7348. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31040-9_4
DOI: https://doi.org/10.1007/978-3-642-31040-9_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-31039-3
Online ISBN: 978-3-642-31040-9
eBook Packages: Computer Science (R0)