Abstract
The emergence of the deep Web has given new significance to ranking database query results. Earlier approaches to ranking either analyzed frequencies of database values and query logs or relied on user profiles. In contrast, an integrated approach based on a similarity model, which holistically supports user- and query-dependent ranking, has recently been proposed (Telang et al. in IEEE Transactions on Knowledge and Data Engineering (TKDE), 2011). An important component of this framework is a workload of ranking functions, in which each function represents an individual user’s preferences towards the results of a specific query. When a query arrives for which no ranking function exists, the similarity model is employed; it is expected to ensure good ranking quality as long as the workload contains a ranking function for a very similar user-query pair.
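The fallback step of the similarity model can be sketched as follows. This is a minimal illustration under stated assumptions, not the framework of Telang et al.: the user similarity (Jaccard overlap of hypothetical profile sets), the query similarity (Jaccard overlap of equality conditions), their equal-weight average, and all names such as `rank_with_workload` are hypothetical choices made for this example.

```python
from typing import Callable, Dict, List, Set, Tuple

Row = Dict[str, object]                      # one result tuple: attribute -> value
Query = Dict[str, str]                       # conjunctive equality conditions
RankingFunction = Callable[[Row], float]     # higher score = ranked higher
WorkloadEntry = Tuple[str, Query, RankingFunction]


def jaccard(a: Set, b: Set) -> float:
    """Jaccard overlap of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def pair_similarity(user: str, query: Query, other_user: str, other_query: Query,
                    profiles: Dict[str, Set[str]]) -> float:
    """Assumed similarity of two user-query pairs: mean of user and query overlap."""
    user_sim = jaccard(profiles.get(user, set()), profiles.get(other_user, set()))
    query_sim = jaccard(set(query.items()), set(other_query.items()))
    return (user_sim + query_sim) / 2


def rank_with_workload(user: str, query: Query, results: List[Row],
                       workload: List[WorkloadEntry],
                       profiles: Dict[str, Set[str]]) -> List[Row]:
    """Rank results with the function of the most similar user-query pair."""
    if not workload:
        raise ValueError("the workload contains no ranking functions")
    _, _, borrowed_fn = max(
        workload,
        key=lambda entry: pair_similarity(user, query, entry[0], entry[1], profiles),
    )
    return sorted(results, key=borrowed_fn, reverse=True)


profiles = {"alice": {"sedan", "low_price"}, "bob": {"suv", "high_power"},
            "carol": {"sedan", "low_price", "hybrid"}}
workload = [
    ("alice", {"Make": "Honda"}, lambda r: -r["Price"]),   # prefers cheaper cars
    ("bob", {"Make": "Ford"}, lambda r: r["Power"]),       # prefers more power
]
cars = [{"Price": 20000, "Power": 150}, {"Price": 15000, "Power": 120},
        {"Price": 30000, "Power": 300}]
# carol has no function for this query; alice's pair is the closest match.
ranked = rank_with_workload("carol", {"Make": "Honda"}, cars, workload, profiles)
print([r["Price"] for r in ranked])  # [15000, 20000, 30000]
```

Here carol's pair scores 5/6 against alice's and 0 against bob's, so alice's price-based function is borrowed. Any real deployment would substitute the similarity measures the framework actually defines.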
In this paper, we address the problem of determining an appropriate set of user-query pairs to form a workload of ranking functions that supports user- and query-dependent ranking for Web databases. We propose a novel metric, termed workload goodness, that quantifies how “good” a workload is as an absolute value. Finding a workload of optimal goodness is a combinatorially explosive problem; therefore, we propose a heuristic solution and advance three approaches for determining an acceptable workload in both static and dynamic environments. We discuss the effectiveness of our proposal analytically as well as experimentally over two Web databases.
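Because exhaustive search is infeasible, a greedy heuristic is one natural way to assemble a workload of size K. The sketch below is neither the paper's workload-goodness metric nor any of its three approaches, which are not described here; it assumes a hypothetical coverage-style goodness in which every candidate user-query pair is credited with its best similarity to a selected pair, computed from an assumed precomputed similarity matrix `sim`.

```python
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def coverage_goodness(selected: Sequence[int], sim: Matrix) -> float:
    """Hypothetical goodness: every candidate pair i is credited with its
    highest similarity to any selected pair j (sim[i][j] assumed in [0, 1])."""
    if not selected:
        return 0.0
    return sum(max(sim[i][j] for j in selected) for i in range(len(sim)))


def greedy_workload(sim: Matrix, k: int) -> List[int]:
    """Pick k candidate pairs one at a time, each time adding the pair that
    most increases coverage_goodness. Ties go to the lowest index."""
    selected: List[int] = []
    remaining = set(range(len(sim)))
    for _ in range(min(k, len(sim))):
        best = max(sorted(remaining),
                   key=lambda j: coverage_goodness(selected + [j], sim))
        selected.append(best)
        remaining.remove(best)
    return selected


# Two clusters of near-duplicate user-query pairs: {0, 1} and {2, 3}.
sim = [[1.0, 0.9, 0.1, 0.1],
       [0.9, 1.0, 0.1, 0.1],
       [0.1, 0.1, 1.0, 0.9],
       [0.1, 0.1, 0.9, 1.0]]
print(greedy_workload(sim, 2))  # one pair from each cluster
```

With non-negative similarities this coverage function is monotone and submodular, so greedy selection carries the well-known (1 − 1/e) approximation guarantee for it; that guarantee says nothing about the paper's own metric.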
Notes
The concept of workload here differs significantly from the one in traditional databases. In the former, the workload is a collection of ranking functions together with the user-query pairs for which the functions are derived; in the latter, it refers to a log of queries.
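To make the distinction concrete, here is one possible in-memory representation, with hypothetical class and method names: a ranking workload maps each user-query pair to the ranking function derived for it, whereas a traditional query log is only a sequence of query strings. Queries are normalized to a sorted tuple of (attribute, value) conditions so that condition order does not matter; that normalization is an assumption of this sketch.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

RankingFunction = Callable[[dict], float]
QueryKey = Tuple[Tuple[str, str], ...]

# Traditional notion: a workload is just a log of queries.
QueryLog = List[str]


@dataclass
class RankingWorkload:
    """Ranking functions keyed by the (user, query) pair they were derived for."""
    functions: Dict[Tuple[str, QueryKey], RankingFunction] = field(default_factory=dict)

    @staticmethod
    def _key(user: str, query: Dict[str, str]) -> Tuple[str, QueryKey]:
        return user, tuple(sorted(query.items()))

    def add(self, user: str, query: Dict[str, str], fn: RankingFunction) -> None:
        self.functions[self._key(user, query)] = fn

    def lookup(self, user: str, query: Dict[str, str]) -> Optional[RankingFunction]:
        """Exact match only; a miss is where the similarity model would step in."""
        return self.functions.get(self._key(user, query))

    def __len__(self) -> int:
        return len(self.functions)


log: QueryLog = ["SELECT * FROM cars WHERE Make = 'Honda'"]
w = RankingWorkload()
w.add("alice", {"Make": "Honda", "Color": "red"}, lambda row: -row["Price"])
print(len(w), w.lookup("alice", {"Color": "red", "Make": "Honda"}) is not None)  # 1 True
print(w.lookup("bob", {"Make": "Honda"}))  # None
```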
A ranking function is obtained via the learning model proposed in [30], which analyzes a user’s preferences towards the results of a query.
The functional details of the similarity-based ranking framework are elaborated in Sect. 2.
Given that we focus on establishing only $W_K$, we use the terms workload and $W_K$ interchangeably for the rest of the paper.
Typically, the numbers of users and queries on most real Web databases, such as Yahoo! Autos and Google Base, are extremely large, whereas the value of K is much smaller.
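A rough count shows why a small K still leaves an enormous search space. Assuming, for illustration, that a workload may be any set of K pairs drawn from the $|\mathcal{U}| \cdot |\mathcal{Q}|$ possible user-query pairs (with $\mathcal{U}$ and $\mathcal{Q}$ denoting the sets of users and queries), the number of candidate workloads satisfies

$$\binom{|\mathcal{U}|\,|\mathcal{Q}|}{K} \;\ge\; \left(\frac{|\mathcal{U}|\,|\mathcal{Q}|}{K}\right)^{K},$$

so even hypothetical sizes of $|\mathcal{U}| = |\mathcal{Q}| = 10^3$ and $K = 100$ yield at least $(10^6/10^2)^{100} = 10^{400}$ candidates, which rules out exhaustive search.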
The value ‘any’ matches all possible values in the domain of the corresponding attribute. For example, a value of ‘any’ for the Transmission attribute in a Vehicle database retrieves cars with both ‘manual’ and ‘auto’ transmissions.
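The semantics of ‘any’ amount to a wildcard in the selection predicate. The sketch below assumes queries are conjunctions of attribute-equality conditions, as in the Transmission example; the function names and sample rows are hypothetical.

```python
from typing import Dict, List

ANY = "any"  # wildcard value: matches every value in the attribute's domain


def matches(row: Dict[str, str], query: Dict[str, str]) -> bool:
    """True if the row satisfies every condition; 'any' is always satisfied."""
    return all(value == ANY or row.get(attr) == value
               for attr, value in query.items())


def select(rows: List[Dict[str, str]], query: Dict[str, str]) -> List[Dict[str, str]]:
    """Return the rows that satisfy the conjunctive query, in input order."""
    return [row for row in rows if matches(row, query)]


cars = [
    {"Make": "Honda", "Transmission": "manual"},
    {"Make": "Honda", "Transmission": "auto"},
    {"Make": "Ford", "Transmission": "auto"},
]
hondas = select(cars, {"Make": "Honda", "Transmission": "any"})
print([c["Transmission"] for c in hondas])  # ['manual', 'auto']
```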
Without loss of generality, we assume $\{Q_1, Q_2, \ldots, Q_r\}$ are the common queries for $U$ and $U'$, although they can be any queries.
References
[1] Agrawal, R., Rantzau, R., Terzi, E.: Context-sensitive ranking. In: SIGMOD Conference, pp. 383–394 (2006)
[2] Agrawal, S., Chaudhuri, S., Das, G., Gionis, A.: Automated ranking of database query results. In: Conference on Innovative Data Systems Research (CIDR) (2003)
[3] Balabanovic, M., Shoham, Y.: Content-based collaborative recommendation. Commun. ACM 40(3), 66–72 (1997)
[4] Basilico, J., Hofmann, T.: A joint framework for collaborative and content filtering. In: SIGIR, pp. 550–551 (2004)
[5] Basu, C., Hirsh, H., Cohen, W.W.: Recommendation as classification: using social and content-based information in recommendation. In: AAAI/IAAI, pp. 714–720 (1998)
[6] Bergman, M.K.: The deep web: surfacing hidden value. J. Electron. Publ. 7(1) (2001)
[7] Billsus, D., Pazzani, M.J.: Learning collaborative information filters. In: International Conference on Machine Learning (ICML), pp. 46–54 (1998)
[8] Blum, M., Floyd, R.W., Pratt, V., Rivest, R.L., Tarjan, R.E.: Time bounds for selection. J. Comput. Syst. Sci. 7, 448–461 (1973)
[9] Chang, K.C.-C., He, B., Li, C., Patil, M., Zhang, Z.: Structured databases on the web: observations and implications. SIGMOD Rec. 33(3), 61–70 (2004)
[10] Chaudhuri, S., Das, G., Hristidis, V., Weikum, G.: Probabilistic ranking of database query results. In: VLDB, pp. 888–899 (2004)
[11] Chaudhuri, S., Das, G., Hristidis, V., Weikum, G.: Probabilistic information retrieval approach for ranking of database query results. ACM Trans. Database Syst. 31(3), 1134–1168 (2006)
[12] Foltz, P.W., Dumais, S.T.: Personalized information delivery: an analysis of information filtering methods. Commun. ACM 35(12), 51–60 (1992)
[13] Gauch, S., Speretta, M., Chandramouli, A., Micarelli, A.: User profiles for personalized information access. In: The Adaptive Web, pp. 54–89 (2007)
[14] Google: Google Base. http://www.google.com/base
[15] Hofmann, T.: Collaborative filtering via Gaussian probabilistic latent semantic analysis. In: SIGIR, pp. 259–266 (2003)
[16] Hwang, S.-W.: Supporting ranking for data retrieval. Ph.D. thesis, University of Illinois at Urbana-Champaign (2005)
[17] Ilyas, I.F., Soliman, M.A.: Probabilistic Ranking Techniques in Relational Databases. Synthesis Lectures on Data Management. Morgan & Claypool Publishers (2011)
[18] Kanungo, T., Mount, D.: An efficient k-means clustering algorithm: analysis and implementation. IEEE Trans. Pattern Anal. Mach. Intell. 24(7), 881–892 (2002)
[19] Koutrika, G.: Database query personalization. In: EDBT, pp. 147–152 (2005)
[20] Koutrika, G., Ioannidis, Y.E.: Personalization of queries in database systems. In: ICDE, pp. 597–608 (2004)
[21] Koutrika, G., Ioannidis, Y.E.: Constrained optimalities in query personalization. In: SIGMOD Conference, pp. 73–84 (2005)
[22] Li, C., Chang, K.C.-C., Ilyas, I.F., Song, S.: RankSQL: query algebra and optimization for relational top-k queries. In: SIGMOD Conference, pp. 131–142 (2005)
[23] Marian, A., Bruno, N., Gravano, L.: Evaluating top-k queries over web-accessible databases. ACM Trans. Database Syst. 29(2), 319–362 (2004)
[24] Ortega-Binderberger, M., Chakrabarti, K., Mehrotra, S.: An approach to integrating query refinement in SQL. In: EDBT, pp. 15–33 (2002)
[25] Schein, A.I., Popescul, A., Ungar, L.H., Pennock, D.M.: Methods and metrics for cold-start recommendations. In: SIGIR, pp. 253–260 (2002)
[26] Soliman, M.A., Ilyas, I.F., Ben-David, S.: Supporting ranking queries on uncertain and incomplete data. VLDB J. 19(4), 477–501 (2010)
[27] Soliman, M.A., Ilyas, I.F., Martinenghi, D., Tagliasacchi, M.: Ranking with uncertain scoring functions: semantics and sensitivity measures. In: SIGMOD Conference, pp. 805–816 (2011)
[28] Su, W., Wang, J., Huang, Q., Lochovsky, F.: Query result ranking over e-commerce web databases. In: Conference on Information and Knowledge Management (CIKM), pp. 575–584 (2006)
[29] Telang, A., Li, C., Chakravarthy, S.: One size does not fit all: towards user- and query-dependent ranking for web databases. Technical report 6, University of Texas at Arlington (2009)
[30] Telang, A., Li, C., Chakravarthy, S.: One size does not fit all: towards user- and query-dependent ranking for web databases. IEEE Transactions on Knowledge and Data Engineering (TKDE) (2011)
[31] Kießling, W.: Foundations of preferences in database systems. In: VLDB, pp. 311–322 (2002)
[32] Yu, H., Hwang, S.-w., Chang, K.C.-C.: Enabling soft queries for data retrieval. Inf. Syst. 32(4), 560–574 (2007)
[33] Yu, H., Kim, Y., Hwang, S.-w.: RV-SVM: an efficient method for learning ranking SVM. In: PAKDD, pp. 426–438 (2009)
Additional information
Communicated by: Kaushik Chakrabarti.
About this article
Cite this article
Telang, A., Chakravarthy, S. & Li, C. Personalized ranking in web databases: establishing and utilizing an appropriate workload. Distrib Parallel Databases 31, 47–70 (2013). https://doi.org/10.1007/s10619-012-7106-2
DOI: https://doi.org/10.1007/s10619-012-7106-2