Abstract
One application of formal concept analysis (FCA) is the extraction of structured information from textual documents. Typically, one defines a set of attributes that characterize the objects of interest, and standard FCA algorithms then extract the objects so defined. In this paper, we describe how FCA identifies and extracts personal names as units of thought, in a manner similar to the decoding of text sequences by the Viterbi algorithm as used with hidden Markov models. We further show how FCA mimics the thought process that goes into a rule-based information extraction system. We then observe that the formal approach of FCA, combined with established computational techniques such as the bottom-up intersection algorithm, avoids the difficulties associated with hand coding and maintaining rule-based systems.
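To make the idea of characterizing objects by attributes concrete, the following is a minimal sketch of how formal concepts can be computed by repeatedly intersecting object intents. It is an illustration under stated assumptions, not the paper's implementation: the tokens, the attribute names (`capitalized`, `in_first_name_list`, `in_surname_list`), and the helper functions are all hypothetical, and the intersection-based closure shown here is only in the spirit of the bottom-up intersection algorithm mentioned above.

```python
# Hypothetical formal context: tokens (objects) mapped to the attributes they have.
# Neither the tokens nor the attribute names come from the paper.
context = {
    "John": {"capitalized", "in_first_name_list"},
    "Smith": {"capitalized", "in_surname_list"},
    "Nevada": {"capitalized"},
    "the": set(),
}
attributes = {"capitalized", "in_first_name_list", "in_surname_list"}


def all_intents(context, attributes):
    """Return every concept intent by closing the object intents under intersection."""
    # The full attribute set is the intent of the empty extent.
    intents = {frozenset(attributes)}
    for obj_attrs in context.values():
        row = frozenset(obj_attrs)
        # Intersect the new object's attributes with every intent found so far.
        intents |= {row & intent for intent in intents}
    return intents


def extent(context, intent):
    """Return all objects that carry every attribute in the intent."""
    return frozenset(obj for obj, attrs in context.items() if intent <= attrs)


def concepts(context, attributes):
    """Return (extent, intent) pairs, ordered from the smallest extent to the largest."""
    pairs = [(extent(context, i), i) for i in all_intents(context, attributes)]
    return sorted(pairs, key=lambda c: (len(c[0]), sorted(c[0])))


lattice = concepts(context, attributes)
for ext, intent in lattice:
    print(sorted(ext), sorted(intent))
```

In this toy context, the concept whose intent is `capitalized` together with `in_first_name_list` groups the tokens that look like first names. A name extractor built along these lines would select the concepts whose intents match the attributes expected of a personal name, rather than encoding those conditions as hand-written rules.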
Acknowledgments
The author would like to thank the anonymous reviewers for their contributions to this paper.
Cite this article
Taghva, K. Name identification and extraction with formal concept analysis. Int. J. Mach. Learn. & Cyber. 8, 171–178 (2017). https://doi.org/10.1007/s13042-016-0514-2
DOI: https://doi.org/10.1007/s13042-016-0514-2