DOI: 10.1145/1008992.1009042

Focused named entity recognition using machine learning

Published: 25 July 2004

Abstract

In this paper we study the problem of finding the most topical named entities among all entities in a document, which we refer to as focused named entity recognition. We show that these focused named entities are useful for many natural language processing applications, such as document summarization, search result ranking, and entity detection and tracking. We propose a statistical model for focused named entity recognition by converting it into a classification problem. We then study the impact of various linguistic features and compare a number of classification algorithms. From experiments on an annotated Chinese news corpus, we demonstrate that the proposed method can achieve near human-level accuracy.
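To make the "classification problem" framing concrete, here is a minimal sketch in Python. It treats every candidate entity in a document as one instance with a binary label (focused or not) and trains a small Bernoulli naive Bayes classifier on it. The three features (`in_title`, `in_first_sentence`, `repeated`), the toy documents, and the entity names are all assumptions made for illustration; the abstract does not list the paper's actual features, and this is not the authors' implementation.

```python
import math
from collections import Counter


def entity_features(entity, title, sentences):
    """Binary features for one candidate entity (hypothetical, illustrative choices)."""
    mentions = sum(s.count(entity) for s in sentences)
    return {
        "in_title": entity in title,
        "in_first_sentence": bool(sentences) and entity in sentences[0],
        "repeated": mentions >= 2,
    }


class BernoulliNB:
    """Tiny Bernoulli naive Bayes with add-one smoothing over binary features."""

    def fit(self, rows, labels):
        self.names = sorted(rows[0])
        self.class_count = Counter(labels)
        self.total = len(labels)
        # true_count[c][f] = number of class-c rows in which feature f is True
        self.true_count = {c: Counter() for c in self.class_count}
        for row, label in zip(rows, labels):
            for name in self.names:
                if row[name]:
                    self.true_count[label][name] += 1
        return self

    def log_score(self, row, label):
        n = self.class_count[label]
        score = math.log(n / self.total)  # log prior
        for name in self.names:
            p_true = (self.true_count[label][name] + 1) / (n + 2)
            score += math.log(p_true if row[name] else 1.0 - p_true)
        return score

    def predict(self, row):
        return max(self.class_count, key=lambda c: self.log_score(row, c))


# Toy training document: each entity is one labelled instance (True = focused).
train_title = "Acme buys Globex"
train_sentences = [
    "Acme said on Monday it will buy Globex.",
    "Acme shares rose after the news.",
    "Analysts at Initech called the price fair.",
]
train_rows = [
    entity_features(e, train_title, train_sentences)
    for e in ("Acme", "Globex", "Initech")
]
train_labels = [True, True, False]
model = BernoulliNB().fit(train_rows, train_labels)

# Unseen toy document: classify each of its entities independently.
test_title = "Hooli sues Pied Piper"
test_sentences = [
    "Hooli filed a lawsuit against Pied Piper.",
    "Hooli claims its patents were infringed.",
    "A spokesperson for Raviga declined to comment.",
]
hooli_is_focused = model.predict(entity_features("Hooli", test_title, test_sentences))
raviga_is_focused = model.predict(entity_features("Raviga", test_title, test_sentences))
```

Because each entity is scored independently, any binary classifier could replace the naive Bayes model here without changing the feature extraction step, which is what makes comparing several classification algorithms straightforward under this framing.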


Published In

SIGIR '04: Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval
July 2004
624 pages
ISBN:1581138814
DOI:10.1145/1008992

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. decision tree
  2. information retrieval
  3. naive bayes
  4. robust risk minimization
  5. text summarization
  6. topic identification

Qualifiers

  • Article

Conference

SIGIR04

Acceptance Rates

Overall Acceptance Rate 792 of 3,983 submissions, 20%

Article Metrics

  • Downloads (last 12 months): 19
  • Downloads (last 6 weeks): 1

Reflects downloads up to 17 Feb 2025.

Cited By

  • (2024) Named Entity Recognition in Aviation Products Domain Based on BERT. IEEE Access 12 (189710-189721). DOI: 10.1109/ACCESS.2024.3516390. Online publication date: 2024.
  • (2021) "FabNER": information extraction from manufacturing process science domain literature using named entity recognition. Journal of Intelligent Manufacturing 33:8 (2393-2407). DOI: 10.1007/s10845-021-01807-x. Online publication date: 24-Jun-2021.
  • (2020) Datasets and Performance Metrics for Greek Named Entity Recognition. 11th Hellenic Conference on Artificial Intelligence (160-167). DOI: 10.1145/3411408.3411437. Online publication date: 2-Sep-2020.
  • (2019) An Improved Word Representation for Deep Learning Based NER in Indian Languages. Information 10:6 (186). DOI: 10.3390/info10060186. Online publication date: 30-May-2019.
  • (2017) The Effect of Corpora Size on Performance of Named Entity Recognition. Highlighting the Importance of Big Data Management and Analysis for Various Applications (93-105). DOI: 10.1007/978-3-319-60255-4_8. Online publication date: 23-Aug-2017.
  • (2016) Recognition of Chemical Entities using Pattern Matching and Functional Group Classification. International Journal of Intelligent Information Technologies 12:4 (21-44). DOI: 10.4018/IJIIT.2016100102. Online publication date: 1-Oct-2016.
  • (2015) A Logic-Based Approach to Named-Entity Disambiguation in the Web of Data. AI*IA 2015 Advances in Artificial Intelligence (367-380). DOI: 10.1007/978-3-319-24309-2_28. Online publication date: 17-Oct-2015.
  • (2014) AMRITA_CEN@FIRE-2014. Proceedings of the 6th Annual Meeting of the Forum for Information Retrieval Evaluation (103-111). DOI: 10.1145/2824864.2824882. Online publication date: 5-Dec-2014.
  • (2014) Crime analysis and prediction using data mining. 2014 First International Conference on Networks & Soft Computing (ICNSC2014) (406-412). DOI: 10.1109/CNSC.2014.6906719. Online publication date: Aug-2014.
  • (2013) Topic-Oriented words as features for named entity recognition. Proceedings of the 14th international conference on Computational Linguistics and Intelligent Text Processing - Volume Part I (304-316). DOI: 10.1007/978-3-642-37247-6_25. Online publication date: 24-Mar-2013.
