A Semi-automatic System for Knowledge Base Population

Goldstein-Stewart, Jade; Winder, Ransom K.

doi:10.1007/978-3-642-19032-2_21

Part of the book series: Communications in Computer and Information Science (CCIS, volume 128)

Included in the following conference series:

International Joint Conference on Knowledge Discovery, Knowledge Engineering, and Knowledge Management

Abstract

The typical method for transferring key information from unstructured text to knowledge bases is laborious manual entry, but automated information extraction is not yet accurate enough to replace it. A viable alternative is a user interface that lets users correct and validate assertions proposed by the automated extractor before they are entered into the knowledge base. In this paper, we discuss our system for semi-automatic database population and how it addresses issues arising in content extraction and knowledge base population. The major contributions are detailing the challenges in building a semi-automated tool, classifying expected extraction errors, identifying the gaps in current extraction technology with regard to databasing, and designing and developing the FEEDE system, which supports human correction of automated content extractors in order to speed up data entry into knowledge bases.

Author information

Authors and Affiliations

U.S. Department of Defense, Washington, U.S.A.
Jade Goldstein-Stewart
The MITRE Corporation, Annapolis Junction, MD, U.S.A.
Ransom K. Winder

Editor information

Editors and Affiliations

IST - Technical University of Lisbon, Av.Rovisco Pais, 1, 1049-001, Lisbon, Portugal
Ana Fred
Delft University of Technology, Mekelweg 4, 2628 CD, Delft, The Netherlands
Jan L. G. Dietz
Informatics Research Centre, Henley Business School, University of Reading, RG6 6UD, Reading, UK
Kecheng Liu
Department of Systems and Informatics, Polytechnic Institute of Setúbal – INSTICC, Rua do Vale de Chaves - Estefanilha, 2910-761, Setúbal, Portugal
Joaquim Filipe

About this paper

Cite this paper

Goldstein-Stewart, J., Winder, R.K. (2011). A Semi-automatic System for Knowledge Base Population. In: Fred, A., Dietz, J.L.G., Liu, K., Filipe, J. (eds) Knowledge Discovery, Knowledge Engineering and Knowledge Management. IC3K 2009. Communications in Computer and Information Science, vol 128. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-19032-2_21

DOI: https://doi.org/10.1007/978-3-642-19032-2_21
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-19031-5
Online ISBN: 978-3-642-19032-2
eBook Packages: Computer Science, Computer Science (R0)
