Abstract
Modern recommendation systems leverage some form of collaborative, user-generated or crowdsourced collection of information. For instance, services like TripAdvisor, Airbnb and HungryGoWhere rely on user-generated content to describe and classify hotels, vacation rentals and restaurants. Because such information is collected independently, its multiplicity, diversity and varying quality result in uncertainty. Objects, such as the services offered by hotels, vacation rentals and restaurants, therefore have uncertain scores for their various features.
In this context, the ranking of uncertain data becomes a crucial issue. Several data models for uncertain data and several semantics for probabilistic top-k queries have been proposed in the literature. We consider here a model of objects with uncertain scores given as probability distributions, together with the semantics proposed in the state-of-the-art reference work of Soliman, Ilyas and Ben-David.
In this paper, we explore the design space of Metropolis-Hastings Markov chain Monte Carlo algorithms for answering probabilistic top-k queries over a database of objects with uncertain scores. We devise several algorithms that yield better performance than the reference algorithm. We demonstrate the effectiveness and efficiency of these new algorithms through an empirical, comparative evaluation.
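To make the setting concrete, the following is a minimal sketch of one point in such a design space. It is not the algorithm evaluated in the paper. It assumes that each object's score follows a Gaussian truncated to an interval, that scores are independent, and that the query asks for the probability of each ordered top-k prefix. The names `log_density`, `mh_top_k` and the example objects are hypothetical. The chain moves through the space of score vectors with a symmetric single-site random-walk proposal, and the ranking induced by each visited state is counted.

```python
import math
import random
from collections import Counter


def log_density(score, spec):
    """Unnormalised log-density of one object's score.

    ASSUMPTION: the score is Gaussian(mu, sigma) truncated to [lo, hi].
    """
    lo, hi, mu, sigma = spec
    if score < lo or score > hi:
        return -math.inf
    return -0.5 * ((score - mu) / sigma) ** 2


def mh_top_k(objects, k, n_steps=30000, burn_in=3000, step=1.0, seed=0):
    """Estimate the probability of each ordered top-k prefix by Metropolis-Hastings."""
    rng = random.Random(seed)
    names = sorted(objects)
    # Start every score at its mode, clipped into the support.
    scores = {}
    for name in names:
        lo, hi, mu, _ = objects[name]
        scores[name] = min(max(mu, lo), hi)
    counts = Counter()
    for t in range(n_steps):
        # Symmetric random-walk proposal on the score of one random object.
        name = rng.choice(names)
        proposal = scores[name] + rng.uniform(-step, step)
        log_ratio = (log_density(proposal, objects[name])
                     - log_density(scores[name], objects[name]))
        # Metropolis acceptance: with a symmetric proposal only the
        # target ratio matters; out-of-support proposals are rejected.
        if rng.random() < math.exp(min(0.0, log_ratio)):
            scores[name] = proposal
        if t >= burn_in:
            ranking = sorted(names, key=lambda n: scores[n], reverse=True)
            counts[tuple(ranking[:k])] += 1
    total = sum(counts.values())
    return {prefix: c / total for prefix, c in counts.items()}


# Hypothetical objects: (lo, hi, mu, sigma) for each uncertain score.
hotels = {
    "A": (0.0, 10.0, 8.0, 1.0),
    "B": (0.0, 10.0, 5.0, 1.0),
    "C": (0.0, 10.0, 2.0, 1.0),
}
estimates = mh_top_k(hotels, k=2)
best = max(estimates, key=estimates.get)
print(best, round(estimates[best], 3))
```

Because consecutive states of the chain are correlated, the estimates converge more slowly than with independent samples. With independent score distributions as assumed here, direct sampling would also work; the Markov chain formulation is of interest when the target distribution is harder to sample from directly.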
References
Amarilli, A., Amsterdamer, Y., Milo, T.: Uncertainty in crowd data sourcing under structural constraints. In: Han, W.-S., Lee, M.L., Muliantara, A., Sanjaya, N.A., Thalheim, B., Zhou, S. (eds.) DASFAA 2014. LNCS, vol. 8505, pp. 351–359. Springer, Heidelberg (2014). doi:10.1007/978-3-662-43984-5_27
Cormode, G., Li, F., Yi, K.: Semantics of ranking queries for probabilistic data and expected ranks. In: ICDE, pp. 305–316 (2009)
Davidson, S.B., Khanna, S., Milo, T., Roy, S.: Using the crowd for top-k and group-by queries. In: ICDT, pp. 225–236 (2013)
Ge, T., Zdonik, S., Madden, S.: Top-k queries on uncertain data: on score distribution and typical answers. In: SIGMOD, pp. 375–388. ACM (2009)
Hua, M., Pei, J., Zhang, W., Lin, X.: Ranking queries on uncertain data: a probabilistic threshold approach. In: SIGMOD, pp. 673–686. ACM (2008)
Jestes, J., Cormode, G., Li, F., Yi, K.: Semantics of ranking queries for probabilistic data. TKDE 23(12), 1903–1917 (2011)
Li, J., Deshpande, A.: Ranking continuous probabilistic datasets. VLDB 3(1–2), 638–649 (2010)
Li, J., Saha, B., Deshpande, A.: A unified approach to ranking in probabilistic databases. VLDB 2(1), 502–513 (2009)
Newman, M.E.J., Barkema, G.T.: Monte Carlo Methods in Statistical Physics, vol. 13. Clarendon Press, Oxford (1999)
O’Leary, D.P.: Multidimensional integration: partition and conquer. Comput. Sci. Eng. 6(6), 58–66 (2004)
Re, C., Dalvi, N., Suciu, D.: Efficient top-k query evaluation on probabilistic data. In: ICDE, pp. 886–895 (2007)
Soliman, M.A., Ilyas, I.F.: Ranking with uncertain scores. In: ICDE, pp. 317–328 (2009)
Soliman, M.A., Ilyas, I.F., Ben-David, S.: Supporting ranking queries on uncertain and incomplete data. VLDB J. 19(4), 477–501 (2010)
Soliman, M.A., Ilyas, I.F., Chang, K.C.-C.: Top-k query processing in uncertain databases. In: ICDE, pp. 896–905 (2007)
Wang, C., Yuan, L.Y., You, J.-H., Zaiane, O.R., Pei, J.: On pruning for top-k ranking in uncertain databases. VLDB 4(10), 598–609 (2011)
Yi, K., Li, F., Kollios, G., Srivastava, D.: Efficient processing of top-k queries in uncertain databases with x-relations. TKDE 20(12), 1669–1682 (2008)
Zhang, X., Li, G., Feng, J.: Crowdsourced top-k algorithms: an experimental evaluation. VLDB 9(8), 612–623 (2016)
Acknowledgement
This research is funded by research grant R-252-000-622-114 from the Singapore Ministry of Education Academic Research Fund (project 251RES1607, “Janus: Effective, Efficient and Fair Algorithms for Spatio-temporal Crowdsourcing”) and is a collaboration between the National University of Singapore and Télécom ParisTech.
Copyright information
© 2016 Springer International Publishing AG
About this paper
Cite this paper
Liu, Q., Basu, D., Abdessalem, T., Bressan, S. (2016). Top-k Queries Over Uncertain Scores. In: Debruyne, C., et al. On the Move to Meaningful Internet Systems: OTM 2016 Conferences. OTM 2016. Lecture Notes in Computer Science, vol. 10033. Springer, Cham. https://doi.org/10.1007/978-3-319-48472-3_14
DOI: https://doi.org/10.1007/978-3-319-48472-3_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-48471-6
Online ISBN: 978-3-319-48472-3
eBook Packages: Computer Science (R0)