Building, maintaining, and using knowledge bases: a report from the trenches

Authors: Omkar Deshpande, Digvijay S. Lamba, Sri Subramaniam, Anand Rajaraman, Venky Harinarayan, AnHai Doan

SIGMOD '13: Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data

Pages 1209 - 1220

https://doi.org/10.1145/2463676.2465297

Published: 22 June 2013

Abstract

A knowledge base (KB) contains a set of concepts, instances, and relationships. Over the past decade, numerous KBs have been built and used to power a growing array of applications. Despite this flurry of activity, however, surprisingly little has been published about the end-to-end process of building, maintaining, and using such KBs in industry. In this paper, we describe such a process. In particular, we describe how we build, update, and curate a large KB at Kosmix, a Bay Area startup, and later at WalmartLabs, a development and research lab of Walmart. We discuss how we use this KB to power a range of applications, including query understanding, Deep Web search, in-context advertising, event monitoring in social media, product search, social gifting, and social mining. Finally, we discuss how the KB team is organized and the lessons learned. Our goal with this paper is to provide a real-world case study and to contribute to the emerging direction of building, maintaining, and using knowledge bases for data management applications.

Index Terms

Building, maintaining, and using knowledge bases: a report from the trenches
1. Information systems
  1. Data management systems
    1. Database management system engines

Published In

SIGMOD '13: Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data

June 2013

1322 pages

ISBN:9781450320375

DOI:10.1145/2463676

General Chairs: Kenneth Ross (Columbia University), Divesh Srivastava (AT&T Research)

Program Chair: Dimitris Papadias (HKUST)

Copyright © 2013 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGMOD: ACM Special Interest Group on Management of Data

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

SIGMOD/PODS'13

Sponsor:

SIGMOD

SIGMOD/PODS'13: International Conference on Management of Data

June 22 - 27, 2013

New York, New York, USA

Acceptance Rates

SIGMOD '13 Paper Acceptance Rate 76 of 372 submissions, 20%;

Overall Acceptance Rate 785 of 4,003 submissions, 20%


Bibliometrics

Total Citations: 62
Total Downloads: 917
Downloads (last 12 months): 31
Downloads (last 6 weeks): 2

Reflects downloads up to 07 Mar 2025

Cited By

Marchesin S, Silvello G (2024). Efficient and Reliable Estimation of Knowledge Graph Accuracy. Proceedings of the VLDB Endowment 17:9, 2392-2403. Online publication date: 1-May-2024.
https://dl.acm.org/doi/10.14778/3665844.3665865
Mannino M, Garcia J, Hazim R, Abouzied A, Papotti P, Serra E, Spezzano F (2024). Data Void Exploits: Tracking & Mitigation Strategies. Proceedings of the 33rd ACM International Conference on Information and Knowledge Management, 1627-1637. Online publication date: 21-Oct-2024.
https://dl.acm.org/doi/10.1145/3627673.3679781
Marchesin S, Silvello G, Alonso O, Serra E, Spezzano F (2024). Veracity Estimation for Entity-Oriented Search with Knowledge Graphs. Proceedings of the 33rd ACM International Conference on Information and Knowledge Management, 1649-1659. Online publication date: 21-Oct-2024.
https://dl.acm.org/doi/10.1145/3627673.3679561
Xin H, Chen L, Shen Y (2024). PKBC: A Product-Specific Knowledge base Taxonomy Framework. Database Systems for Advanced Applications, 375-390. Online publication date: 27-Oct-2024.
https://doi.org/10.1007/978-981-97-5562-2_24
Sarıkoz S (2023). Examining Knowledge Extraction Processes from Heterogeneous Data Sources. Brilliant Engineering 4:1, 1-8. Online publication date: 8-Feb-2023.
https://doi.org/10.36937/ben.2023.4798
Shen W, Yin Y, Yang Y, Han J, Wang J, Yuan X (2022). Toward Tweet Entity Linking With Heterogeneous Information Networks. IEEE Transactions on Knowledge and Data Engineering 34:12, 6003-6017. Online publication date: 1-Dec-2022.
https://doi.org/10.1109/TKDE.2021.3068093
Lu J, Wen G, Lu R, Wang Y, Zhang S (2022). Networked Knowledge and Complex Networks: An Engineering View. IEEE/CAA Journal of Automatica Sinica 9:8, 1366-1383. Online publication date: Aug-2022.
https://doi.org/10.1109/JAS.2022.105737
Kouagou N, Heindorf S, Demir C, Ngomo A (2022). Learning Concept Lengths Accelerates Concept Learning in ALC. The Semantic Web, 236-252. Online publication date: 31-May-2022.
https://doi.org/10.1007/978-3-031-06981-9_14
Loster M, Mottin D, Papotti P, Ehmüller J, Feldmann B, Naumann F (2021). Few-Shot Knowledge Validation using Rules. Proceedings of the Web Conference 2021, 3314-3324. Online publication date: 19-Apr-2021.
https://dl.acm.org/doi/10.1145/3442381.3450040
Jo H, Kim J, Porras P, Yegneswaran V, Shin S (2021). GapFinder: Finding Inconsistency of Security Information From Unstructured Text. IEEE Transactions on Information Forensics and Security 16, 86-99. Online publication date: 2021.
https://doi.org/10.1109/TIFS.2020.3003570
