Abstract
One of the most difficult problems faced by consumers of semi-structured and structured data on the Web is how to discover or create the data they need. Producers of Web data, in turn, have no (semi-)automated way to align their data production with consumer needs. In this paper we formalize the problem of a data marketplace, and we hypothesize that the value of semi-structured and structured data can be quantified given a set of consumers, and that this quantification applies both to existing data-sets and to data-sets that have yet to be created. Furthermore, we provide an algorithm that shows how the production of this data can be crowd-sourced while assuring the consumer a certain level of quality. Using real-world empirical data collected from data producers and consumers, we simulate a crowd-sourced data marketplace with quality guarantees.
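The quantification idea in the abstract can be illustrated with a toy sketch. This is not the paper's formalization or algorithm; the function name, the query-based bid model, and all numbers below are illustrative assumptions only: a data-set's value is taken as the sum of what consumers would pay for the queries it answers, and the value is forfeited if the data-set misses a required quality level.

```python
def dataset_value(answered_queries, consumer_bids, quality, min_quality):
    """Toy valuation (illustrative, not the paper's model): a data-set that
    misses the quality guarantee is worth nothing to consumers; otherwise
    its value is the total of consumer bids for the queries it answers."""
    if quality < min_quality:
        return 0.0
    return sum(consumer_bids.get(q, 0.0) for q in answered_queries)

# Hypothetical consumer demand: price each consumer would pay per answered query.
bids = {"city_population": 5.0, "company_revenue": 3.0, "weather_history": 2.0}

# The same scoring applies to an existing data-set and to one the crowd
# could be paid to create; producing the latter is worthwhile when its
# value exceeds the crowd-sourcing cost.
existing = dataset_value(["city_population"], bids,
                         quality=0.9, min_quality=0.8)
proposed = dataset_value(["city_population", "company_revenue"], bids,
                         quality=0.85, min_quality=0.8)
print(existing, proposed)  # 5.0 8.0
```

Under these assumptions a proposed data-set answering two in-demand queries (value 8.0) dominates the existing one (value 5.0) as long as the cost of crowd-sourcing it at quality at least 0.8 stays below the 3.0 difference.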
Notes
- 1.
- 2. While Infochimps claims to be larger (approximately 9,000 data-sets) than The Data Hub, roughly half of its data-sets are APIs rather than structured data and thus cannot be queried; moreover, because Infochimps does not use crowd-sourcing, no history of users and revisions is available.
Copyright information
© 2019 Springer Nature Switzerland AG
Cite this paper
Halpin, H., Lykourentzou, I. (2019). Crowdsourcing High-Quality Structured Data. In: Lossio-Ventura, J., Muñante, D., Alatrista-Salas, H. (eds) Information Management and Big Data. SIMBig 2018. Communications in Computer and Information Science, vol 898. Springer, Cham. https://doi.org/10.1007/978-3-030-11680-4_29
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-11679-8
Online ISBN: 978-3-030-11680-4
eBook Packages: Computer Science, Computer Science (R0)