Article

Focused named entity recognition using machine learning

Authors:

Tong Zhang

SIGIR '04: Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval

Pages 281 - 288

https://doi.org/10.1145/1008992.1009042

Published: 25 July 2004

Abstract

In this paper we study the problem of finding the most topical named entities among all entities in a document, which we refer to as focused named entity recognition. We show that these focused named entities are useful for many natural language processing applications, such as document summarization, search result ranking, and entity detection and tracking. We propose a statistical model for focused named entity recognition by converting it into a classification problem. We then study the impact of various linguistic features and compare a number of classification algorithms. From experiments on an annotated Chinese news corpus, we demonstrate that the proposed method can achieve near human-level accuracy.
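The abstract recasts focused named entity recognition as a classification problem over the entities found in a document. The following is a minimal Python sketch of that framing only, assuming entity mentions have already been produced by an upstream NER tagger. The hypothetical `Document` type, the three features (share of entity mentions, position of the first mention, match against the title), the invented toy data, and the choice of logistic regression are illustrative assumptions, not the features or classifiers used in the paper.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass
class Document:
    title: str
    num_tokens: int
    # (entity string, token offset) pairs; assumed to come from an upstream NER tagger
    mentions: List[Tuple[str, int]]


def entity_features(doc: Document) -> Dict[str, List[float]]:
    """Turn each distinct entity into one classification example (hypothetical features)."""
    counts: Dict[str, int] = {}
    first_pos: Dict[str, int] = {}
    for entity, pos in doc.mentions:
        counts[entity] = counts.get(entity, 0) + 1
        first_pos[entity] = min(first_pos.get(entity, pos), pos)
    total = sum(counts.values())
    n = max(doc.num_tokens, 1)
    return {
        entity: [
            1.0,                                  # bias
            counts[entity] / total,               # share of all entity mentions
            1.0 - first_pos[entity] / n,          # earlier first mention -> larger value
            1.0 if entity in doc.title else 0.0,  # entity string appears in the title
        ]
        for entity in counts
    }


def sigmoid(z: float) -> float:
    # Branch on the sign of z so math.exp never overflows.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dot(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def train_logistic(X: List[List[float]], y: List[int],
                   lr: float = 0.5, epochs: int = 500, l2: float = 1e-3) -> List[float]:
    """Batch gradient descent on L2-regularized logistic loss."""
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = sigmoid(dot(w, xi)) - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj
        w = [wj - lr * (gj / len(X) + l2 * wj) for wj, gj in zip(w, grad)]
    return w


def focused_entities(doc: Document, w: Sequence[float], threshold: float = 0.5) -> List[str]:
    """Entities whose predicted probability of being focused reaches the threshold."""
    feats = entity_features(doc)
    return sorted(e for e, x in feats.items() if sigmoid(dot(w, x)) >= threshold)


# Toy training data (invented for illustration): entity -> 1 if focused, else 0.
train = [
    (Document("Beijing hosts trade talks", 20,
              [("Beijing", 0), ("Beijing", 5), ("Beijing", 12), ("Shanghai", 15)]),
     {"Beijing": 1, "Shanghai": 0}),
    (Document("Apple reports earnings", 20,
              [("Apple", 1), ("Apple", 8), ("Apple", 14), ("Google", 18)]),
     {"Apple": 1, "Google": 0}),
]
X: List[List[float]] = []
y: List[int] = []
for doc, labels in train:
    for entity, x in entity_features(doc).items():
        X.append(x)
        y.append(labels[entity])

weights = train_logistic(X, y)
test_doc = Document("Tokyo summit opens", 30,
                    [("Tokyo", 2), ("Tokyo", 10), ("Tokyo", 20), ("Osaka", 25)])
focused = focused_entities(test_doc, weights)
print(focused)
```

Because each distinct entity becomes one labeled example, any binary classifier could be swapped in for `train_logistic`; on the toy input above the script prints `['Tokyo']`.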

Index Terms

Focused named entity recognition using machine learning
1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
      1. Language resources
2. Information systems
  1. Information retrieval
    1. Document representation
      1. Content analysis and feature selection

Information & Contributors

Information

Published In

ACM Conferences

SIGIR '04: Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval

July 2004

624 pages

ISBN:1581138814

DOI:10.1145/1008992

General Chair:
Mark Sanderson, University of Sheffield (UK)

Program Chairs:
Kalervo Järvelin, University of Tampere (Finland)
James Allan, University of Massachusetts (USA)
Peter Bruza, Distributed Systems Technology Centre (Australia)

Copyright © 2004 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 25 July 2004

Qualifiers

Article

Conference

SIGIR04

Sponsor:

SIGIR04: The 27th ACM/SIGIR International Symposium on Information Retrieval 2004

July 25 - 29, 2004

Sheffield, United Kingdom

Acceptance Rates

Overall Acceptance Rate 792 of 3,983 submissions, 20%

Bibliometrics & Citations

Article Metrics

Total Citations: 28
Total Downloads: 1,007
Downloads (Last 12 months): 19
Downloads (Last 6 weeks): 1

Reflects downloads up to 17 Feb 2025

Citations

Cited By

Yang M, Namoano B, Farsi M, Ahmet Erkoyuncu J (2024). Named Entity Recognition in Aviation Products Domain Based on BERT. IEEE Access 12 (189710-189721). Online publication date: 2024.
https://doi.org/10.1109/ACCESS.2024.3516390
Kumar A, Starly B (2021). "FabNER": information extraction from manufacturing process science domain literature using named entity recognition. Journal of Intelligent Manufacturing 33:8 (2393-2407). Online publication date: 24-Jun-2021.
https://doi.org/10.1007/s10845-021-01807-x
Bartziokas N, Mavropoulos T, Kotropoulos C (2020). Datasets and Performance Metrics for Greek Named Entity Recognition. 11th Hellenic Conference on Artificial Intelligence (160-167). Online publication date: 2-Sep-2020.
https://dl.acm.org/doi/10.1145/3411408.3411437
A P A, K M, Mary Idicula S (2019). An Improved Word Representation for Deep Learning Based NER in Indian Languages. Information 10:6 (186). Online publication date: 30-May-2019.
https://doi.org/10.3390/info10060186
Liaghat Z (2017). The Effect of Corpora Size on Performance of Named Entity Recognition. Highlighting the Importance of Big Data Management and Analysis for Various Applications (93-105). Online publication date: 23-Aug-2017.
https://doi.org/10.1007/978-3-319-60255-4_8
Geetha T, Hema R (2016). Recognition of Chemical Entities using Pattern Matching and Functional Group Classification. International Journal of Intelligent Information Technologies 12:4 (21-44). Online publication date: 1-Oct-2016.
https://dl.acm.org/doi/10.4018/IJIIT.2016100102
Giannini S, Colucci S, Donini F, Di Sciascio E (2015). A Logic-Based Approach to Named-Entity Disambiguation in the Web of Data. AI*IA 2015 Advances in Artificial Intelligence (367-380). Online publication date: 17-Oct-2015.
https://doi.org/10.1007/978-3-319-24309-2_28
Abinaya N, John N, Ganesh B, Kumar A, Soman K (2014). AMRITA_CEN@FIRE-2014. Proceedings of the 6th Annual Meeting of the Forum for Information Retrieval Evaluation (103-111). Online publication date: 5-Dec-2014.
https://dl.acm.org/doi/10.1145/2824864.2824882
Sathyadevan S, S D, S. S (2014). Crime analysis and prediction using data mining. 2014 First International Conference on Networks & Soft Computing (ICNSC2014) (406-412). Online publication date: Aug-2014.
https://doi.org/10.1109/CNSC.2014.6906719
Zhang Z, Cohn T, Ciravegna F (2013). Topic-Oriented words as features for named entity recognition. Proceedings of the 14th international conference on Computational Linguistics and Intelligent Text Processing - Volume Part I (304-316). Online publication date: 24-Mar-2013.
https://dl.acm.org/doi/10.1007/978-3-642-37247-6_25
