
Crowdsourcing High-Quality Structured Data

  • Conference paper

Part of the book series: Communications in Computer and Information Science (CCIS, volume 898)

Abstract

One of the most difficult problems faced by consumers of semi-structured and structured data on the Web is how to discover or create the data they need. Conversely, the producers of Web data have no (semi-)automated way to align their data production with consumer needs. In this paper we formalize the problem of a data marketplace, hypothesizing that the value of semi-structured and structured data can be quantified given a set of consumers, and that this quantification applies both to existing data-sets and to data-sets that have yet to be created. Furthermore, we provide an algorithm showing how the production of this data can be crowd-sourced while assuring the consumer a certain level of quality. Using real-world empirical data collected from data producers and consumers, we simulate a crowd-sourced data marketplace with quality guarantees.
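Since the excerpt stops at the abstract, the minimal Python sketch below illustrates the two ideas the abstract names: quantifying a data-set's value for a given set of consumers, and crowd-sourcing its production while enforcing a quality target. Everything here — the function names, the bid-sum value model, the majority-vote quality check, and the per-answer cost — is an illustrative assumption, not the paper's actual formalization or algorithm.

```python
import random

# A toy simulation of a crowd-sourced data marketplace with a quality
# guarantee. All mechanics below are illustrative assumptions, not the
# model or algorithm from the paper itself.

def dataset_value(consumer_bids):
    """Assumed value model: a data-set is worth the total price that its
    set of consumers would pay, whether it exists or must be created."""
    return sum(consumer_bids)

def crowdsource(worker_accuracies, n_rows, quality_target, rng):
    """Produce n_rows of data, collecting redundant answers per row until
    majority agreement reaches quality_target (a stand-in for the paper's
    quality assurance). Returns achieved quality and total paid answers."""
    correct_rows, cost = 0, 0
    for _ in range(n_rows):
        votes = []
        while True:
            accuracy = rng.choice(worker_accuracies)  # pick a random worker
            votes.append(rng.random() < accuracy)     # True = correct answer
            cost += 1                                 # one payment per answer
            majority = max(set(votes), key=votes.count)
            if len(votes) >= 3 and votes.count(majority) / len(votes) >= quality_target:
                break                                 # agreement is high enough
        correct_rows += majority                      # keep the majority answer
    return correct_rows / n_rows, cost

rng = random.Random(42)
workers = [rng.uniform(0.6, 0.95) for _ in range(50)]  # latent worker accuracies
value = dataset_value([9.0, 14.5, 6.25])               # three consumer bids
quality, cost = crowdsource(workers, n_rows=100, quality_target=0.8, rng=rng)
print(f"value={value:.2f}  achieved quality={quality:.2%}  answers paid for={cost}")
```

Under assumptions like these, a marketplace would schedule production only for data-sets whose quantified value exceeds the expected crowd-sourcing cost at the consumer's requested quality level.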


Notes

1.

    http://datahub.io/.

2.

    While Infochimps claims to be larger than The Data Hub (approximately 9,000 data-sets), approximately half of its data-sets are APIs rather than structured data and thus cannot be queried; moreover, because Infochimps does not use crowd-sourcing, no history of users and revisions is available.


Author information

Correspondence to Harry Halpin or Ioanna Lykourentzou.



Copyright information

© 2019 Springer Nature Switzerland AG

About this paper


Cite this paper

Halpin, H., Lykourentzou, I. (2019). Crowdsourcing High-Quality Structured Data. In: Lossio-Ventura, J., Muñante, D., Alatrista-Salas, H. (eds) Information Management and Big Data. SIMBig 2018. Communications in Computer and Information Science, vol 898. Springer, Cham. https://doi.org/10.1007/978-3-030-11680-4_29


  • DOI: https://doi.org/10.1007/978-3-030-11680-4_29

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-11679-8

  • Online ISBN: 978-3-030-11680-4

  • eBook Packages: Computer Science, Computer Science (R0)
