Unsupervised Learning of Pattern Templates from Unannotated Corpora for Proper Noun Extraction

Kang, Seung-Shik; Woo, Chong-Woo

doi:10.1007/3-540-39205-X_103

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2639)

Included in the following conference series:

International Workshop on Rough Sets, Fuzzy Sets, Data Mining, and Granular-Soft Computing

Abstract

This paper describes an approach to extracting proper nouns from very large text corpora without using a lexicon or a cue-word dictionary. First, we train patterns for extracting proper nouns by applying an initial set of proper names to an unannotated corpus, that is, a corpus that does not yet have any tags. We then repeatedly apply the resulting pattern templates to the corpus to extract new proper nouns, continuing for a certain period.
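The loop described above can be illustrated with a minimal sketch. The abstract does not specify what a pattern template looks like, so this example assumes the simplest possible form: the pair of tokens immediately to the left and right of a known name. The function names (`learn_patterns`, `apply_patterns`, `bootstrap`), the `min_count` threshold, the fixed `rounds` limit, and the toy English corpus are all hypothetical choices made for illustration, not the authors' method.

```python
from collections import Counter


def _pad(tokens):
    # Sentence-boundary markers so that first and last tokens have a context.
    return ["<s>"] + list(tokens) + ["</s>"]


def learn_patterns(sentences, names, min_count=2):
    """Collect (left, right) token pairs that surround known names.

    Assumption: a "pattern template" is just this context pair, kept only
    if it occurs around known names at least `min_count` times.
    """
    counts = Counter()
    for tokens in sentences:
        padded = _pad(tokens)
        for i in range(1, len(padded) - 1):
            if padded[i] in names:
                counts[(padded[i - 1], padded[i + 1])] += 1
    return {pattern for pattern, c in counts.items() if c >= min_count}


def apply_patterns(sentences, patterns):
    """Return every token whose (left, right) context matches a pattern."""
    found = set()
    for tokens in sentences:
        padded = _pad(tokens)
        for i in range(1, len(padded) - 1):
            if (padded[i - 1], padded[i + 1]) in patterns:
                found.add(padded[i])
    return found


def bootstrap(sentences, seeds, rounds=3, min_count=2):
    """Alternate between learning patterns and extracting names.

    Stops after `rounds` iterations, or earlier if a round adds nothing.
    """
    names = set(seeds)
    for _ in range(rounds):
        patterns = learn_patterns(sentences, names, min_count)
        new_names = apply_patterns(sentences, patterns) - names
        if not new_names:
            break
        names |= new_names
    return names


# Toy, untagged corpus (hypothetical data).
corpus = [
    "mr kim said hello".split(),
    "mr lee said goodbye".split(),
    "mr park said nothing".split(),
    "the dog said woof".split(),
    "professor park visited seoul".split(),
    "professor choi visited busan".split(),
    "professor kim visited tokyo".split(),
]
seeds = {"kim", "lee"}

extracted = bootstrap(corpus, seeds)
print(sorted(extracted))
```

In this toy run, the seeds first yield the context `("mr", "said")`, which extracts `park`. With `park` known, the context `("professor", "visited")` reaches the threshold in the next round and extracts `choi`. A real system would need a scoring step to keep noisy patterns from drifting, which this sketch omits.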

This work was supported by the Korea Science and Engineering Foundation (KOSEF) through the Advanced Information Technology Research Center (AITrc).

Author information

Authors and Affiliations

School of Computer Science, Kookmin University & AITrc, Seoul, 136-702, Korea
Seung-Shik Kang & Chong-Woo Woo

Editor information

Editors and Affiliations

Institute of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing, 400065, P.R. China
Guoyin Wang
Department of Computer Science, Nanchang University, Nanchang, 330029, P.R. China
Qing Liu
Department of Computer Science, University of Regina, Regina, Saskatchewan, S4S 0A2, Canada
Yiyu Yao
Institute of Mathematics, Warsaw University, Banacha 2, 02-097, Warsaw, Poland
Andrzej Skowron

About this paper

Cite this paper

Kang, SS., Woo, CW. (2003). Unsupervised Learning of Pattern Templates from Unannotated Corpora for Proper Noun Extraction. In: Wang, G., Liu, Q., Yao, Y., Skowron, A. (eds) Rough Sets, Fuzzy Sets, Data Mining, and Granular Computing. RSFDGrC 2003. Lecture Notes in Computer Science, vol 2639. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-39205-X_103

DOI: https://doi.org/10.1007/3-540-39205-X_103
Published: 30 April 2003
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-14040-5
Online ISBN: 978-3-540-39205-7
eBook Packages: Springer Book Archive
