Abstract
Crowdsourced query processing is an emerging technique that tackles computationally challenging problems with human intelligence. The basic idea is to decompose a computationally challenging problem into a set of human-friendly microtasks (e.g., pairwise comparisons) that are distributed to and answered by the crowd; the solution to the problem is then computed (e.g., by aggregation) from the crowdsourced answers to the microtasks. In this work, we revisit crowdsourced processing of top-k queries, aiming to (1) secure the quality of crowdsourced comparisons at a given confidence level and (2) minimize the total monetary cost. To secure the quality of each paired comparison, we employ statistical tools to estimate a confidence interval from the judgments collected from the crowd, which then guides the aggregated judgment. We propose two novel frameworks, SPR and SPR\(^+\), for crowdsourced top-k queries. Both are budget-aware, confidence-aware, and effective in producing high-quality top-k results. SPR requires as input a budget for each paired comparison, whereas SPR\(^+\) requires only a total budget for the whole top-k task. Extensive experiments on four real datasets demonstrate that our proposed methods outperform existing top-k processing techniques by a clear margin.
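To make the confidence-guided aggregation concrete, the sketch below estimates a normal-approximation confidence interval for the mean of crowd votes on one pair (encoded as \(\pm 1\)) and commits to a winner only once the interval excludes 0. This is a minimal illustrative sketch under our own encoding assumptions, not necessarily the exact interval estimator used by SPR or SPR\(^+\).

```python
import math
from statistics import NormalDist

def judge_pair(votes, alpha=0.05, eps=1e-6):
    """Aggregate crowd votes for one paired comparison.

    votes: judgments encoded as +1 (first object preferred) or
    -1 (second object preferred).
    Returns +1 or -1 once the (1 - alpha) confidence interval for
    the mean vote excludes 0 (by a small margin eps), and 0 otherwise,
    meaning more judgments are needed.
    """
    n = len(votes)
    mean = sum(votes) / n
    # Sample variance; with a single vote, fall back to the maximum
    # possible variance of a +/-1 variable.
    var = sum((v - mean) ** 2 for v in votes) / (n - 1) if n > 1 else 1.0
    z = NormalDist().inv_cdf(1 - alpha / 2)  # e.g., 1.96 for alpha = 0.05
    half = z * math.sqrt(var / n)
    if mean - half > eps:
        return +1   # confident: first object wins
    if mean + half < -eps:
        return -1   # confident: second object wins
    return 0        # interval still straddles 0: undecided
```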
Notes
A confidence interval with confidence level \(1 - \alpha\) for an unobserved variable means that the variable falls into the interval with probability \(1 - \alpha\).
A small \(\varepsilon >0\) guarantees that the interval excludes 0.
Note that this adaptive budget allocation does not affect the unit price of a single judgment from the crowd, which is assumed to be fixed regardless of the difficulty of the paired comparison (see the sketch below).
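As an illustration of the adaptive allocation described in the last note, the sketch below (reusing the `judge_pair` helper from the earlier sketch) spends fixed-unit-price judgments on one pair until the confidence interval excludes 0 or the per-pair budget runs out. The `ask_crowd` callable is a hypothetical stand-in for issuing one microtask; this is a simplification, not the SPR allocation policy itself.

```python
import random

def compare_with_budget(ask_crowd, budget, alpha=0.05):
    """Adaptively spend up to `budget` fixed-unit-price judgments on
    a single pair, stopping early once judge_pair (sketched earlier)
    reaches a confident verdict.

    ask_crowd: hypothetical callable returning one +1/-1 judgment.
    Returns (verdict, cost); verdict 0 means the budget was exhausted
    before confidence was reached.
    """
    votes = []
    for _ in range(budget):
        votes.append(ask_crowd())  # each judgment costs one fixed unit
        verdict = judge_pair(votes, alpha)
        if verdict != 0:
            return verdict, len(votes)  # decided early; leftover budget saved
    return 0, len(votes)

# Example: a simulated crowd that prefers the first object 70% of the time.
verdict, cost = compare_with_budget(
    lambda: 1 if random.random() < 0.7 else -1, budget=30)
```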
Acknowledgements
This work was supported by the National Key Research and Development Plan of China (No. 2019YFB2102100), the Key-Area Research and Development Program of Guangdong Province (No. 2020B010164003), the Science and Technology Development Fund, Macau SAR (File nos. SKL-IOTSC-2018-2020 and 0015/2019/AKP), and the University of Macau (File no. MYRG2019-00119-FST).
Cite this article
Li, Y., Wang, H., Kou, N.M. et al. Crowdsourced top-k queries by pairwise preference judgments with confidence and budget control. The VLDB Journal 30, 189–213 (2021). https://doi.org/10.1007/s00778-020-00631-8