Information Extraction Meets Crowdsourcing: A Promising Couple

Lofi, Christoph; Selke, Joachim; Balke, Wolf-Tilo

doi:10.1007/s13222-012-0092-8

Focus Article (Schwerpunktbeitrag)
Published: 23 May 2012

Datenbank-Spektrum, Volume 12, pages 109–120 (2012)

Abstract

Recent years have brought tremendous advances in automated information extraction. Still, problem scenarios remain in which even state-of-the-art algorithms do not provide a satisfying solution. In these cases, another emerging trend can be exploited to achieve the required extraction quality: explicit crowdsourcing of human intelligence tasks. In this paper, we discuss the synergies between information extraction and crowdsourcing. In particular, we methodically identify and classify the challenges and fallacies that arise when combining both approaches. Furthermore, we argue that harnessing the full potential of either approach requires true hybrid techniques. To demonstrate this point, we showcase such a hybrid technique, which tightly interweaves information extraction with crowdsourcing and machine learning to vastly surpass the abilities of either technique.

Author information

Authors and Affiliations

Institut für Informationssysteme, Technische Universität Braunschweig, Braunschweig, Germany
Christoph Lofi, Joachim Selke & Wolf-Tilo Balke

Corresponding author

Correspondence to Christoph Lofi.

About this article

Cite this article

Lofi, C., Selke, J. & Balke, W.-T. Information Extraction Meets Crowdsourcing: A Promising Couple. Datenbank-Spektrum 12, 109–120 (2012). https://doi.org/10.1007/s13222-012-0092-8

Received: 10 April 2012
Accepted: 12 May 2012
Published: 23 May 2012
Issue Date: July 2012
DOI: https://doi.org/10.1007/s13222-012-0092-8
