DOI: 10.1145/3183713.3183732
research-article

Subjective Knowledge Base Construction Powered By Crowdsourcing and Knowledge Base

Published: 27 May 2018

Abstract

Knowledge base construction (KBC) has recently become a timely and active topic, driven by the growing need for large-scale knowledge bases (KBs) in applications such as semantic search, QA systems, the Google Knowledge Graph, and the IBM Watson QA system. Existing KBs mainly encode facts about the world, e.g., city area and company product, which are regarded as objective knowledge, whereas subjective knowledge, which is frequently mentioned in Web queries, has been neglected. Subjective knowledge has no documented ground truth; instead, the truth relies on people's dominant opinion, which can be solicited from online crowd workers. In our work, we propose a KBC framework for subjective knowledge base construction that takes advantage of knowledge from both the crowd and existing KBs. The framework has two stages: core subjective KB construction and subjective KB enrichment. First, we build a core subjective KB mined from existing KBs, in which every instance has rich objective properties. Then, we populate the core subjective KB with instances extracted from existing KBs, leveraging the crowd to annotate the subjective property of the instances. To optimize the crowd annotation process, we formulate subjective KB enrichment as a cost-aware instance annotation problem and propose two instance annotation algorithms: adaptive instance annotation and batch-mode instance annotation. We evaluate our framework on real knowledge bases and a real crowdsourcing platform. The experimental results show that our framework can derive high-quality subjective knowledge facts from existing KBs and crowdsourcing techniques.
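The abstract's two central ideas, truth as the crowd's dominant opinion and annotation under a cost constraint, can be illustrated with a minimal sketch. It is not the paper's adaptive or batch-mode algorithm, whose details the abstract does not give. The majority vote, the greedy uncertainty-per-cost ranking, and every name below (`Candidate`, `dominant_opinion`, `select_batch`, the `uncertainty` score) are assumptions made for illustration only.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Candidate:
    """A hypothetical instance awaiting a subjective label from the crowd."""
    name: str
    cost: float         # assumed price of asking the crowd about this instance
    uncertainty: float  # assumed score in [0, 1]; higher means less certain


def dominant_opinion(votes):
    """Return the most common label and the share of votes it received."""
    counts = Counter(votes)
    label, n = counts.most_common(1)[0]
    return label, n / len(votes)


def select_batch(candidates, budget):
    """Greedily pick candidates by uncertainty per unit cost within a budget.

    This is a generic heuristic, used here only as a stand-in for a
    cost-aware selection step.
    """
    ranked = sorted(candidates, key=lambda c: c.uncertainty / c.cost, reverse=True)
    chosen, spent = [], 0.0
    for c in ranked:
        if spent + c.cost <= budget:
            chosen.append(c)
            spent += c.cost
    return chosen, spent


# Usage: three workers answer a hypothetical question such as "Is X a big city?"
label, share = dominant_opinion(["yes", "yes", "no"])

# Usage: choose which instances to send to the crowd with a budget of 2.0.
pool = [
    Candidate("A", cost=1.0, uncertainty=0.9),
    Candidate("B", cost=2.0, uncertainty=0.8),
    Candidate("C", cost=1.0, uncertainty=0.5),
]
batch, spent = select_batch(pool, budget=2.0)
```

In this sketch a label is accepted once it holds the largest share of votes, and the selection step spends the budget on the instances whose answers are expected to be most informative per unit of cost. An adaptive variant would re-rank after each crowd answer rather than committing to a whole batch up front.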

Published In

SIGMOD '18: Proceedings of the 2018 International Conference on Management of Data
May 2018
1874 pages
ISBN:9781450347037
DOI:10.1145/3183713
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. crowdsourcing
  2. knowledge base construction
  3. subjective knowledge

Qualifiers

  • Research-article

Funding Sources

  • National Science Foundation of China (NSFC)
  • National Grand Fundamental Research 973 Program of China
  • Hong Kong RGC GRF Project
  • Science and Technology Planning Project of Guangdong Province

Conference

SIGMOD/PODS '18
Acceptance Rates

SIGMOD '18 Paper Acceptance Rate 90 of 461 submissions, 20%;
Overall Acceptance Rate 785 of 4,003 submissions, 20%

Cited By

  • (2024) The Effect of Individual-Level Factors and Task Features on Interface Design for Rule-Verification Crowdsourcing Tasks. International Journal of Human–Computer Interaction, pp. 1-28. DOI: 10.1080/10447318.2024.2332031. Online publication date: 16-Apr-2024
  • (2022) Defining a Knowledge Graph Development Process Through a Systematic Review. ACM Transactions on Software Engineering and Methodology 32:1, pp. 1-40. DOI: 10.1145/3522586. Online publication date: 30-Apr-2022
  • (2021) A Human-in-the-loop Approach to Social Behavioral Targeting. 2021 IEEE 37th International Conference on Data Engineering (ICDE), pp. 277-288. DOI: 10.1109/ICDE51399.2021.00031. Online publication date: Apr-2021
  • (2021) Task Selection Based on Worker Performance Prediction in Gamified Crowdsourcing. Agents and Multi-Agent Systems: Technologies and Applications 2021, pp. 65-75. DOI: 10.1007/978-981-16-2994-5_6. Online publication date: 8-Jun-2021
  • (2020) Querying subjective data. The VLDB Journal — The International Journal on Very Large Data Bases 30:1, pp. 115-140. DOI: 10.1007/s00778-020-00634-5. Online publication date: 8-Sep-2020
  • (2019) Subjective databases. Proceedings of the VLDB Endowment 12:11, pp. 1330-1343. DOI: 10.14778/3342263.3342271. Online publication date: 1-Jul-2019
  • (2019) MedTruth. Proceedings of the 28th ACM International Conference on Information and Knowledge Management, pp. 719-728. DOI: 10.1145/3357384.3357934. Online publication date: 3-Nov-2019
  • (2019) Towards Automatic Mathematical Exercise Solving. Data Science and Engineering 4:3, pp. 179-192. DOI: 10.1007/s41019-019-00098-w. Online publication date: 6-Sep-2019
