poster

Using natural language to integrate, evaluate, and optimize extracted knowledge bases

Authors:
Doug Downey, Northwestern University, Evanston, IL, USA
Chandra Sekhar Bhagavatula, Northwestern University, Evanston, IL, USA
Alexander Yates, Temple University, Philadelphia, PA, USA

AKBC '13: Proceedings of the 2013 workshop on Automated knowledge base construction, October 2013, Pages 61–66, https://doi.org/10.1145/2509558.2509569

Published: 27 October 2013

ABSTRACT

Web Information Extraction (WIE) systems extract billions of unique facts, but integrating the assertions into a coherent knowledge base and evaluating across different WIE techniques remain challenging. We propose a framework that uses natural language to integrate and evaluate extracted knowledge bases (KBs). In the framework, KBs are integrated by exchanging probability distributions over natural language, and evaluated by how well the output distributions predict held-out text. We describe the advantages of the approach and detail the remaining research challenges.
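
As a rough illustration of the evaluation idea in the abstract (scoring a KB by how well its output distribution predicts held-out text), the following toy sketch may help. Everything in it is an assumption made for illustration and is not taken from the paper: each KB is stood in for by a smoothed unigram distribution estimated from a few sentences, "integration" is approximated by linear interpolation of two such distributions, and the score is per-token perplexity on held-out sentences. The sentences and the names `unigram_distribution`, `mix`, and `perplexity` are hypothetical.

```python
import math
from collections import Counter


def unigram_distribution(sentences, vocab, alpha=1.0):
    """Add-alpha smoothed unigram distribution over a fixed vocabulary."""
    counts = Counter(tok for s in sentences for tok in s.split())
    total = sum(counts[w] for w in vocab) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def mix(p, q, weight=0.5):
    """Linear interpolation of two distributions over the same vocabulary."""
    return {w: weight * p[w] + (1.0 - weight) * q[w] for w in p}


def perplexity(dist, heldout_sentences):
    """Per-token perplexity of held-out text under a unigram distribution."""
    tokens = [tok for s in heldout_sentences for tok in s.split()]
    log_prob = sum(math.log(dist[tok]) for tok in tokens)
    return math.exp(-log_prob / len(tokens))


# Hypothetical text standing in for what two extracted KBs "say".
kb_a_text = ["paris is the capital of france", "berlin is the capital of germany"]
kb_b_text = ["france borders germany", "germany borders poland"]
# Hypothetical held-out text used only for evaluation.
heldout = ["paris is in france", "poland borders germany"]

# Shared vocabulary so that every held-out token has nonzero probability.
vocab = sorted({tok for s in kb_a_text + kb_b_text + heldout for tok in s.split()})

dist_a = unigram_distribution(kb_a_text, vocab)
dist_b = unigram_distribution(kb_b_text, vocab)
dist_mix = mix(dist_a, dist_b)

ppl_a = perplexity(dist_a, heldout)
ppl_b = perplexity(dist_b, heldout)
ppl_mix = perplexity(dist_mix, heldout)

print(f"KB A: {ppl_a:.2f}  KB B: {ppl_b:.2f}  mixture: {ppl_mix:.2f}")
```

Lower perplexity means the distribution predicts the held-out text better. Because the logarithm is concave, the interpolated distribution in this sketch can never score worse than the weaker of its two inputs, which is one reason a shared probabilistic interface is a convenient way to compare and combine otherwise incompatible sources.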

Index Terms

Using natural language to integrate, evaluate, and optimize extracted knowledge bases
1. Information systems
  1. Information retrieval
  2. World Wide Web
    1. Web applications
    2. Web services

Published in
AKBC '13: Proceedings of the 2013 workshop on Automated knowledge base construction
October 2013
124 pages
ISBN:9781450324113
DOI:10.1145/2509558
Program Chairs:
Fabian M. Suchanek, Max Planck Institute for Informatics, Germany
Sebastian Riedel, University College London, UK
Sameer Singh, University of Massachusetts Amherst, USA
Partha Pratim Talukdar, Carnegie Mellon University, USA
Copyright © 2013 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
knowledge extraction
knowledge integration
language modeling
Qualifiers
- poster
Acceptance Rates
AKBC '13 Paper Acceptance Rate: 9 of 19 submissions, 47%
Overall Acceptance Rate: 9 of 19 submissions, 47%