Abstract
Crowdsourced query processing is an emerging technique that tackles computationally challenging problems with human intelligence. The basic idea is to decompose a computationally challenging problem into a set of human-friendly microtasks (e.g., pairwise comparisons) that are distributed to and answered by the crowd; the solution to the problem is then computed (e.g., by aggregation) from the crowdsourced answers to the microtasks. In this work, we revisit crowdsourced processing of top-k queries, aiming to (1) secure the quality of crowdsourced comparisons at a given confidence level and (2) minimize the total monetary cost. To secure the quality of each paired comparison, we employ statistical tools to estimate a confidence interval from the judgments collected from the crowd, which then guides the aggregated judgment. We propose two novel frameworks, SPR and SPR\(^+\), for crowdsourced top-k queries. Both are budget-aware, confidence-aware, and effective in producing high-quality top-k results. SPR requires as input a budget for each paired comparison, whereas SPR\(^+\) requires only a total budget for the whole top-k task. Extensive experiments on four real datasets demonstrate that our proposed methods outperform existing top-k processing techniques by a clear margin.
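To make the confidence-guided aggregation concrete, the sketch below estimates a normal-approximation confidence interval for the mean of crowd votes on one pair (encoded as \(\pm 1\)) and commits to a winner only once the interval excludes 0. This is a minimal illustrative sketch under our own encoding assumptions, not necessarily the exact interval estimator used by SPR or SPR\(^+\).

```python
import math
from statistics import NormalDist

def judge_pair(votes, alpha=0.05, eps=1e-6):
    """Aggregate crowd votes for one paired comparison.

    votes: judgments encoded as +1 (first object preferred) or
    -1 (second object preferred).
    Returns +1 or -1 once the (1 - alpha) confidence interval for
    the mean vote excludes 0 (by a small margin eps), and 0 otherwise,
    meaning more judgments are needed.
    """
    n = len(votes)
    mean = sum(votes) / n
    # Sample variance; with a single vote, fall back to the maximum
    # possible variance of a +/-1 variable.
    var = sum((v - mean) ** 2 for v in votes) / (n - 1) if n > 1 else 1.0
    z = NormalDist().inv_cdf(1 - alpha / 2)  # e.g., 1.96 for alpha = 0.05
    half = z * math.sqrt(var / n)
    if mean - half > eps:
        return +1   # confident: first object wins
    if mean + half < -eps:
        return -1   # confident: second object wins
    return 0        # interval still straddles 0: undecided
```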
Notes
A confidence interval with confidence level \(1 - \alpha\) for an unobserved variable means that the variable falls into the interval with probability \(1 - \alpha\).
A small \(\varepsilon >0\) guarantees that the interval excludes 0.
Note that this adaptive budget allocation does not affect the unit price of a single judgment from the crowd, which is assumed to be fixed regardless of the difficulty of the paired comparison (see the sketch below).
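As an illustration of the adaptive allocation described in the last note, the sketch below (reusing the `judge_pair` helper from the earlier sketch) spends fixed-unit-price judgments on one pair until the confidence interval excludes 0 or the per-pair budget runs out. The `ask_crowd` callable is a hypothetical stand-in for issuing one microtask; this is a simplification, not the SPR allocation policy itself.

```python
import random

def compare_with_budget(ask_crowd, budget, alpha=0.05):
    """Adaptively spend up to `budget` fixed-unit-price judgments on
    a single pair, stopping early once judge_pair (sketched earlier)
    reaches a confident verdict.

    ask_crowd: hypothetical callable returning one +1/-1 judgment.
    Returns (verdict, cost); verdict 0 means the budget was exhausted
    before confidence was reached.
    """
    votes = []
    for _ in range(budget):
        votes.append(ask_crowd())  # each judgment costs one fixed unit
        verdict = judge_pair(votes, alpha)
        if verdict != 0:
            return verdict, len(votes)  # decided early; leftover budget saved
    return 0, len(votes)

# Example: a simulated crowd that prefers the first object 70% of the time.
verdict, cost = compare_with_budget(
    lambda: 1 if random.random() < 0.7 else -1, budget=30)
```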
Acknowledgements
This work was supported by the National Key Research and Development Plan of China (No. 2019YFB2102100), the Key-Area Research and Development Program of Guangdong Province (No. 2020B010164003), the Science and Technology Development Fund, Macau SAR (File nos. SKL-IOTSC-2018-2020 and 0015/2019/AKP), and the University of Macau (File no. MYRG2019-00119-FST).
Cite this article
Li, Y., Wang, H., Kou, N.M. et al. Crowdsourced top-k queries by pairwise preference judgments with confidence and budget control. The VLDB Journal 30, 189–213 (2021). https://doi.org/10.1007/s00778-020-00631-8