
Crowdsourcing High-Quality Structured Data

  • Conference paper

Part of the book series: Communications in Computer and Information Science (CCIS, volume 898)

Abstract

One of the most difficult problems faced by consumers of semi-structured and structured data on the Web is how to discover or create the data they need. Conversely, the producers of Web data have no (semi-)automated way to align their data production with consumer needs. In this paper we formalize the problem of a data marketplace, hypothesizing that the value of semi-structured and structured data can be quantified given a set of consumers, and that this quantification applies both to existing data-sets and to data-sets that have yet to be created. Furthermore, we provide an algorithm showing how the production of this data can be crowd-sourced while assuring the consumer a certain level of quality. Using real-world empirical data collected from data producers and consumers, we simulate a crowd-sourced data marketplace with quality guarantees.
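Since the excerpt stops at the abstract, the minimal Python sketch below illustrates the two ideas the abstract names: quantifying a data-set's value for a given set of consumers, and crowd-sourcing its production while enforcing a quality target. Everything here — the function names, the bid-sum value model, the majority-vote quality check, and the per-answer cost — is an illustrative assumption, not the paper's actual formalization or algorithm.

```python
import random

# A toy simulation of a crowd-sourced data marketplace with a quality
# guarantee. All mechanics below are illustrative assumptions, not the
# model or algorithm from the paper itself.

def dataset_value(consumer_bids):
    """Assumed value model: a data-set is worth the total price that its
    set of consumers would pay, whether it exists or must be created."""
    return sum(consumer_bids)

def crowdsource(worker_accuracies, n_rows, quality_target, rng):
    """Produce n_rows of data, collecting redundant answers per row until
    majority agreement reaches quality_target (a stand-in for the paper's
    quality assurance). Returns achieved quality and total paid answers."""
    correct_rows, cost = 0, 0
    for _ in range(n_rows):
        votes = []
        while True:
            accuracy = rng.choice(worker_accuracies)  # pick a random worker
            votes.append(rng.random() < accuracy)     # True = correct answer
            cost += 1                                 # one payment per answer
            majority = max(set(votes), key=votes.count)
            if len(votes) >= 3 and votes.count(majority) / len(votes) >= quality_target:
                break                                 # agreement is high enough
        correct_rows += majority                      # keep the majority answer
    return correct_rows / n_rows, cost

rng = random.Random(42)
workers = [rng.uniform(0.6, 0.95) for _ in range(50)]  # latent worker accuracies
value = dataset_value([9.0, 14.5, 6.25])               # three consumer bids
quality, cost = crowdsource(workers, n_rows=100, quality_target=0.8, rng=rng)
print(f"value={value:.2f}  achieved quality={quality:.2%}  answers paid for={cost}")
```

Under assumptions like these, a marketplace would schedule production only for data-sets whose quantified value exceeds the expected crowd-sourcing cost at the consumer's requested quality level.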


Notes

1.

    http://datahub.io/.

2.

    While Infochimps claims to be larger than The Data Hub (approximately 9,000 data-sets), approximately half of its data-sets are APIs rather than structured data and thus cannot be queried; moreover, because Infochimps does not use crowd-sourcing, no history of users and revisions is available.


Author information

Correspondence to Harry Halpin or Ioanna Lykourentzou.



Copyright information

© 2019 Springer Nature Switzerland AG

About this paper


Cite this paper

Halpin, H., Lykourentzou, I. (2019). Crowdsourcing High-Quality Structured Data. In: Lossio-Ventura, J., Muñante, D., Alatrista-Salas, H. (eds) Information Management and Big Data. SIMBig 2018. Communications in Computer and Information Science, vol 898. Springer, Cham. https://doi.org/10.1007/978-3-030-11680-4_29


  • DOI: https://doi.org/10.1007/978-3-030-11680-4_29

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-11679-8

  • Online ISBN: 978-3-030-11680-4

  • eBook Packages: Computer Science, Computer Science (R0)
