DOI: 10.1145/2463676.2465297
research-article

Building, maintaining, and using knowledge bases: a report from the trenches

Published: 22 June 2013

Abstract

A knowledge base (KB) contains a set of concepts, instances, and relationships. Over the past decade, numerous KBs have been built and used to power a growing array of applications. Despite this flurry of activity, surprisingly little has been published about the end-to-end process of building, maintaining, and using such KBs in industry. In this paper we describe such a process. In particular, we describe how we build, update, and curate a large KB at Kosmix, a Bay Area startup, and later at WalmartLabs, a development and research lab of Walmart. We discuss how we use this KB to power a range of applications, including query understanding, Deep Web search, in-context advertising, event monitoring in social media, product search, social gifting, and social mining. Finally, we discuss how the KB team is organized, and the lessons learned. Our goal with this paper is to provide a real-world case study, and to contribute to the emerging direction of building, maintaining, and using knowledge bases for data management applications.
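The abstract defines a KB as a set of concepts, instances, and relationships. A small data structure can make that definition concrete. The Python sketch below is a hypothetical, in-memory illustration only. The class name `KnowledgeBase`, its methods, and the example entities are assumptions. They are not the representation the authors used at Kosmix or WalmartLabs.

```python
from __future__ import annotations

from dataclasses import dataclass, field


# Hypothetical in-memory KB for illustration only; this is not the
# schema used at Kosmix or WalmartLabs.
@dataclass
class KnowledgeBase:
    # Concept names, e.g. "Company" or "Product".
    concepts: set[str] = field(default_factory=set)
    # Instance name -> set of concepts the instance belongs to.
    instances: dict[str, set[str]] = field(default_factory=dict)
    # Relationships stored as (subject, relation, object) triples.
    relationships: set[tuple[str, str, str]] = field(default_factory=set)

    def add_concept(self, concept: str) -> None:
        self.concepts.add(concept)

    def add_instance(self, name: str, concept: str) -> None:
        # Reject instances of concepts that have not been declared.
        if concept not in self.concepts:
            raise KeyError(f"unknown concept: {concept}")
        self.instances.setdefault(name, set()).add(concept)

    def relate(self, subject: str, relation: str, obj: str) -> None:
        self.relationships.add((subject, relation, obj))

    def related(self, subject: str, relation: str) -> set[str]:
        # All objects linked to `subject` by `relation` (linear scan).
        return {o for (s, r, o) in self.relationships
                if s == subject and r == relation}


kb = KnowledgeBase()
kb.add_concept("Company")
kb.add_concept("Product")
kb.add_instance("Apple Inc.", "Company")
kb.add_instance("iPhone", "Product")
kb.relate("iPhone", "manufactured_by", "Apple Inc.")
print(kb.related("iPhone", "manufactured_by"))  # {'Apple Inc.'}
```

The sketch stores relationships as subject-relation-object triples, the same shape RDF uses. A KB at industrial scale would need indexes on subject and relation rather than the linear scan in `related`.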




    Published In

    SIGMOD '13: Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data
    June 2013
    1322 pages
    ISBN:9781450320375
    DOI:10.1145/2463676
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. data integration
    2. human curation
    3. information extraction
    4. knowledge base
    5. social media
    6. taxonomy
    7. wikipedia

    Qualifiers

    • Research-article

    Conference

    SIGMOD/PODS'13

    Acceptance Rates

    SIGMOD '13 Paper Acceptance Rate 76 of 372 submissions, 20%;
    Overall Acceptance Rate 785 of 4,003 submissions, 20%

    Article Metrics

    • Downloads (last 12 months): 31
    • Downloads (last 6 weeks): 2
    Reflects downloads up to 07 Mar 2025

    Cited By

    • (2024) Efficient and Reliable Estimation of Knowledge Graph Accuracy. Proceedings of the VLDB Endowment 17:9 (2392-2403). DOI: 10.14778/3665844.3665865. Online publication date: 1-May-2024
    • (2024) Data Void Exploits: Tracking & Mitigation Strategies. Proceedings of the 33rd ACM International Conference on Information and Knowledge Management (1627-1637). DOI: 10.1145/3627673.3679781. Online publication date: 21-Oct-2024
    • (2024) Veracity Estimation for Entity-Oriented Search with Knowledge Graphs. Proceedings of the 33rd ACM International Conference on Information and Knowledge Management (1649-1659). DOI: 10.1145/3627673.3679561. Online publication date: 21-Oct-2024
    • (2024) PKBC: A Product-Specific Knowledge base Taxonomy Framework. Database Systems for Advanced Applications (375-390). DOI: 10.1007/978-981-97-5562-2_24. Online publication date: 27-Oct-2024
    • (2023) Examining Knowledge Extraction Processes from Heterogeneous Data Sources. Brilliant Engineering 4:1 (1-8). DOI: 10.36937/ben.2023.4798. Online publication date: 8-Feb-2023
    • (2022) Toward Tweet Entity Linking With Heterogeneous Information Networks. IEEE Transactions on Knowledge and Data Engineering 34:12 (6003-6017). DOI: 10.1109/TKDE.2021.3068093. Online publication date: 1-Dec-2022
    • (2022) Networked Knowledge and Complex Networks: An Engineering View. IEEE/CAA Journal of Automatica Sinica 9:8 (1366-1383). DOI: 10.1109/JAS.2022.105737. Online publication date: Aug-2022
    • (2022) Learning Concept Lengths Accelerates Concept Learning in ALC. The Semantic Web (236-252). DOI: 10.1007/978-3-031-06981-9_14. Online publication date: 31-May-2022
    • (2021) Few-Shot Knowledge Validation using Rules. Proceedings of the Web Conference 2021 (3314-3324). DOI: 10.1145/3442381.3450040. Online publication date: 19-Apr-2021
    • (2021) GapFinder: Finding Inconsistency of Security Information From Unstructured Text. IEEE Transactions on Information Forensics and Security 16 (86-99). DOI: 10.1109/TIFS.2020.3003570. Online publication date: 2021
