Name identification and extraction with formal concept analysis

  • Original Article
  • International Journal of Machine Learning and Cybernetics

Abstract

One application of formal concept analysis (FCA) is the extraction of structured information from textual documents. Typically, one defines a set of attributes that characterize the objects of interest, and standard FCA algorithms then extract those objects. In this paper, we describe how FCA identifies and extracts personal names as units of thought, in a way similar to the decoding of text sequences by the Viterbi algorithm used with hidden Markov models. We further show how FCA mimics the thought process that goes into a rule-based information extraction system. We then observe that the formal approach of FCA, combined with established computational techniques such as the bottom-up intersection algorithm, avoids the difficulties associated with hand coding and maintaining rule-based systems.
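
To make the idea of attributes characterizing objects concrete, the following minimal sketch builds a toy formal context and enumerates its concepts by repeatedly intersecting object rows, in the spirit of a bottom-up intersection approach. It is an illustration only and not the paper's implementation: the tokens, the attribute names (capitalized, title, first_name, surname, lowercase), and the function names are all assumptions made for this example.

```python
# Hypothetical formal context: objects are tokens, attributes are toy features.
# None of these names or values come from the paper.
CONTEXT = {
    "Dr.": {"capitalized", "title"},
    "Kazem": {"capitalized", "first_name"},
    "Taghva": {"capitalized", "surname"},
    "wrote": {"lowercase"},
}


def all_intents(context):
    """Return every concept intent by closing the object rows under intersection."""
    # The full attribute set is the intent of the empty extent.
    intents = {frozenset().union(*context.values())}
    for row in context.values():
        row = frozenset(row)
        # Intersecting the new row with every known intent keeps the family closed.
        intents |= {row & known for known in intents}
    return intents


def extent(context, intent):
    """Objects that carry every attribute in the intent."""
    return frozenset(obj for obj, attrs in context.items() if intent <= attrs)


def concepts(context):
    """All formal concepts as (extent, intent) pairs."""
    return [(extent(context, i), i) for i in all_intents(context)]


if __name__ == "__main__":
    ordered = sorted(concepts(CONTEXT), key=lambda c: (-len(c[0]), sorted(c[1])))
    for ext, intent in ordered:
        print(sorted(ext), sorted(intent))
```

In this toy context, the concept whose intent is just "capitalized" groups the three name-like tokens together, which hints at how a lattice of concepts can stand in for hand-written rules. A realistic attribute set would have to be chosen far more carefully than this one.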

Acknowledgments

The author would like to thank the anonymous reviewers for their contributions to this paper.

Author information

Corresponding author

Correspondence to Kazem Taghva.

About this article

Cite this article

Taghva, K. Name identification and extraction with formal concept analysis. Int. J. Mach. Learn. & Cyber. 8, 171–178 (2017). https://doi.org/10.1007/s13042-016-0514-2
