Conditional random fields for entity extraction and ontological text coding

Computational and Mathematical Organization Theory

Abstract

Previous research suggests that one field with a strong yet unsatisfied need for automatically extracting instances of various entity classes from texts is the analysis of socio-technical systems (Feldstein in Media in Transition MiT5, 2007; Hampe et al. in Netzwerkanalyse und Netzwerktheorie, 2007; Weil et al. in Proceedings of the 2006 Command and Control Research and Technology Symposium, 2006; Diesner and Carley in XXV Sunbelt Social Network Conference, 2005). Traditional as well as non-traditional and customized sets of entity classes and the relationships between them are often specified in ontologies or taxonomies. We present a Conditional Random Fields (CRF)-based approach to distilling a set of entities that are defined in an ontology originating from organization science. CRF, a supervised sequential machine learning technique, facilitates the derivation of relational data from corpora by locating and classifying instances of various entity classes. The classified entities can be used as nodes for the construction of socio-technical networks. We find the outcome sufficiently accurate (82.7 percent accuracy of locating and classifying entities) for future application in the described problem domain. We propose using the presented methodology as a crucial step in the process of advanced modeling and analysis of complex and dynamic networks.
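As an illustration of the step from classified tokens to network nodes, the following is a minimal sketch and not the authors' implementation. It assumes that a trained sequence labeler (such as a CRF) emits one label per token in the common BIO scheme, which the abstract does not specify. The function name `bio_to_entities` and the entity class names in the example are hypothetical.

```python
from typing import List, Optional, Tuple


def bio_to_entities(tokens: List[str], labels: List[str]) -> List[Tuple[str, str]]:
    """Collapse per-token BIO labels into (entity_text, entity_class) pairs.

    Assumption: labels look like "B-<class>", "I-<class>", or "O".
    A stray "I-" label that does not continue the open entity is treated as "O".
    """
    entities: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_class: Optional[str] = None

    for token, label in zip(tokens, labels):
        if label.startswith("B-"):
            # A new entity begins; close the one that was open, if any.
            if current_tokens:
                entities.append((" ".join(current_tokens), current_class))
            current_tokens = [token]
            current_class = label[2:]
        elif label.startswith("I-") and current_class == label[2:]:
            # Continuation of the entity that is currently open.
            current_tokens.append(token)
        else:
            # Outside any entity: close the open entity, if any.
            if current_tokens:
                entities.append((" ".join(current_tokens), current_class))
            current_tokens, current_class = [], None

    # Close an entity that runs to the end of the sequence.
    if current_tokens:
        entities.append((" ".join(current_tokens), current_class))
    return entities


# Hypothetical example sentence and labels (class names are placeholders).
tokens = ["Jane", "Doe", "joined", "Acme", "Corp", "in", "Boston"]
labels = ["B-agent", "I-agent", "O", "B-organization", "I-organization", "O", "B-location"]
nodes = bio_to_entities(tokens, labels)
```

Each resulting pair could serve as a typed node in a socio-technical network; how edges between nodes are derived is outside the scope of this sketch.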

References

  • Bikel DM, Schwartz R, Weischedel RM (1999) An algorithm that learns what’s in a name. Mach Learn 34(1–3):211–231

  • Borthwick A, Sterling J, Agichtein E, Grishman R (1998) Exploiting diverse knowledge sources via maximum entropy in named entity recognition. In: Sixth workshop on very large corpora, Association for Computational Linguistics, Montreal, QC, Canada, August 1998, pp 152–160

  • Carley KM (2002) Smart agents and organizations of the future. In: Lievrouw L, Livingstone S (eds) The handbook of new media. Sage, Thousand Oaks, pp 206–220

  • Carley KM, Frantz T, Diesner J (2006) Social and knowledge networks from large scale databases. In: 56th annual conference of the international communication association (ICA), Dresden, Germany, June 2006

  • CoNLL-2003 (2003) In: Proceedings of seventh conference on natural language learning (CoNLL-2003), Edmonton, Canada, May–June 2003

  • Diesner J, Carley KM (2005) Revealing and comparing the organizational structure of covert networks with network text analysis. In: XXV Sunbelt social network conference, Redondo Beach, CA, February 2005

  • Diesner J, Carley KM (2006) Revealing social structure from texts: meta-matrix text analysis as a novel method for network text analysis. In: Narayanan VK, Armstrong DJ (eds) Causal mapping for information systems and technology research: approaches, advances, and illustrations. Idea Group, Harrisburg, pp 81–108

  • Dietterich TG (2002) Machine learning for sequential data: a review. In: Joint IAPR international workshops SSPR 2002 and SPR 2002, Windsor, ON, Canada, August 2002

  • Feldstein A (2007) Brand communities in a world of knowledge-based products and common property. Media in Transition MiT5, Cambridge, MA, April 2007

  • Freitag D (1997) Using grammatical inference to improve precision in information extraction. In: Fourteenth international conference on machine learning, workshop on automata induction, grammatical inference, and language acquisition, Nashville, TN

  • Hampe P, Hatzel I, Höhnsch J, Ueschner P (2007) Forschungsgruppe Transparentes Parlament, Ein neues Paradigma in den Sozialwissenschaften. Netzwerkanalyse und Netzwerktheorie, Frankfurt, Germany, September 2007

  • Krackhardt D, Carley KM (1998) A PCANS model of structure in organization. In: Proceedings of the 1998 international symposium on command and control research and technology, Monterey, CA, June 1998, pp 113–119

  • Lafferty J, McCallum A, Pereira F (2001) Conditional random fields: probabilistic models for segmenting and labeling sequence data. In: Eighteenth international conference on machine learning (ICML-01), San Francisco, pp 282–289

  • Linguistic Data Consortium (LDC) Automatic Content Extraction (ACE) (2007) http://projects.ldc.upenn.edu/ace/ DARPA. Accessed 20 March 2007

  • McCallum A (2005) Information extraction: distilling structured data from unstructured text. ACM Queue 3(9):48–57

  • Message Understanding Conferences (MUC) 6 (2006) Named entity task definition 1995. http://cs.nyu.edu/cs/faculty/grishman/NEtask20.book_1.html. Accessed 11 March 2007

  • Sarawagi S (n.d.) CRF project page. http://crf.sourceforge.net/. Accessed 20 March 2007

  • Sha F, Pereira F (2003) Shallow parsing with conditional random fields. In: Proceedings of human language technology, HLT-NAACL 2003, pp 213–220

  • Weil SA, Carley KM, Diesner J, Freeman J, Cooke NJ (2006) Measuring situational awareness through analysis of communications: a preliminary exercise. In: Proceedings of the 2006 command and control research and technology symposium, San Diego, CA

  • Weischedel R, Brunstein A (2005) BBN pronoun coreference and entity type corpus. Linguistic Data Consortium, Philadelphia, LDC2005T33

Author information

Corresponding author

Correspondence to Jana Diesner.

About this article

Cite this article

Diesner, J., Carley, K.M. Conditional random fields for entity extraction and ontological text coding. Comput Math Organiz Theor 14, 248–262 (2008). https://doi.org/10.1007/s10588-008-9029-z

  • DOI: https://doi.org/10.1007/s10588-008-9029-z
