Abstract
The typical method for transferring key information from unstructured text to knowledge bases is laborious manual entry, but automated information extraction still performs at accuracies too low to replace it. A viable alternative is a user interface that allows users to correct and validate assertions proposed by the automated extractor before they are entered into the knowledge base. In this paper, we describe our system for semi-automatic database population and how it addresses issues arising in content extraction and knowledge base population. The major contributions are a detailed account of the challenges in building a semi-automated tool, a classification of expected extraction errors, an identification of the gaps in current extraction technology with regard to databasing, and the design and development of the FEEDE system, which supports human correction of automated content extractors in order to speed up data entry into knowledge bases.
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Goldstein-Stewart, J., Winder, R.K. (2011). A Semi-automatic System for Knowledge Base Population. In: Fred, A., Dietz, J.L.G., Liu, K., Filipe, J. (eds) Knowledge Discovery, Knowledge Engineering and Knowledge Management. IC3K 2009. Communications in Computer and Information Science, vol 128. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-19032-2_21
DOI: https://doi.org/10.1007/978-3-642-19032-2_21
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-19031-5
Online ISBN: 978-3-642-19032-2
eBook Packages: Computer Science, Computer Science (R0)