research-article

SEED: a framework for extracting social events from press news

Authors:

Salvatore Orlando,

Francesco Pizzolon,

Gabriele Tolomei

WWW '13 Companion: Proceedings of the 22nd International Conference on World Wide Web

Pages 1285 - 1294

https://doi.org/10.1145/2487788.2488163

Published: 13 May 2013

Abstract

Every day, people exchange a huge amount of data over the Internet. Most of this data consists of unstructured text, which often contains references to structured information (e.g., person names and contact records). In this work, we propose a novel solution for discovering social events from real press news edited by humans. Our method is divided into two steps, each addressing a specific Information Extraction (IE) task. First, we automatically recognize four classes of named entities in press news: DATE, LOCATION, PLACE, and ARTIST. Second, we detect social events by extracting ternary relations between such entities, also exploiting evidence from external sources (i.e., the Web). We evaluate both stages of our solution on a real-world dataset. Experimental results highlight the quality of our first-step Named-Entity Recognition (NER) approach, which performs consistently with state-of-the-art solutions. Finally, we show how to precisely select true events from the list of all candidate events (i.e., all the ternary relations) produced by our second-step Relation Extraction (RE) method. In particular, we find that true social events can be detected when enough evidence for them appears in the result lists of Web search engines.
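The second step described in the abstract (build all candidate ternary relations, then keep those with enough Web evidence) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The abstract does not say which three of the four entity classes form a relation, so the `classes` argument, the `evidence` callable (a stand-in for a Web search hit count), the `min_hits` threshold, and all sample data below are hypothetical.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

Entity = Tuple[str, str]          # (entity class, surface text), e.g. ("ARTIST", "Artist A")
Candidate = Tuple[str, str, str]  # one entity text per chosen class


def candidate_events(entities: List[Entity],
                     classes: Tuple[str, str, str]) -> List[Candidate]:
    """Build every ternary combination with one entity per chosen class."""
    by_class: Dict[str, List[str]] = {c: [] for c in classes}
    for cls, text in entities:
        # Ignore classes outside the chosen triple and skip repeated mentions.
        if cls in by_class and text not in by_class[cls]:
            by_class[cls].append(text)
    return list(product(*(by_class[c] for c in classes)))


def select_events(candidates: List[Candidate],
                  evidence: Callable[[Candidate], int],
                  min_hits: int) -> List[Candidate]:
    """Keep candidates whose external evidence count reaches the threshold."""
    return [c for c in candidates if evidence(c) >= min_hits]


# Hypothetical output of the first (NER) step for one news item.
entities = [
    ("ARTIST", "Artist A"),
    ("ARTIST", "Artist B"),
    ("PLACE", "Club X"),
    ("DATE", "2013-05-13"),
    ("LOCATION", "City Y"),
]

# Assumption: an event is an (ARTIST, PLACE, DATE) triple.
candidates = candidate_events(entities, ("ARTIST", "PLACE", "DATE"))

# Hypothetical evidence counts; a real system would query a search engine.
fake_hits = {("Artist A", "Club X", "2013-05-13"): 12}
events = select_events(candidates, lambda c: fake_hits.get(c, 0), min_hits=5)
print(events)
```

Passing the evidence source as a callable keeps the candidate generation independent of any particular search engine, so the threshold and the evidence measure can be varied separately.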

Cited By

(2017) A crowdsourced system for user studies in information extraction. International Journal of Knowledge Engineering and Soft Data Paradigms, 6(1):44-51. DOI: 10.1504/IJKESDP.2017.089506. Online publication date: 1-Jan-2017.
https://dl.acm.org/doi/10.1504/IJKESDP.2017.089506

Jung Y, Stratos K, Carloni L, Gangemi A, Leonardi S, Panconesi A (2015) LN-Annote. Proceedings of the 24th International Conference on World Wide Web, pages 538-548. DOI: 10.1145/2736277.2741633. Online publication date: 18-May-2015.
https://dl.acm.org/doi/10.1145/2736277.2741633

Yang X, Zhang T, Xu C, Yang M (2015) Boosted Multifeature Learning for Cross-Domain Transfer. ACM Transactions on Multimedia Computing, Communications, and Applications, 11(3):1-18. DOI: 10.1145/2700286. Online publication date: 5-Feb-2015.
https://dl.acm.org/doi/10.1145/2700286

Yang X, Zhang T, Xu C (2015) Cross-Domain Feature Learning in Multimedia. IEEE Transactions on Multimedia, 17(1):64-78. DOI: 10.1109/TMM.2014.2375793. Online publication date: Jan-2015.
https://doi.org/10.1109/TMM.2014.2375793

Index Terms

SEED: a framework for extracting social events from press news
1. Applied computing
  1. Document management and text processing
2. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
      1. Language resources

Published In

WWW '13 Companion: Proceedings of the 22nd International Conference on World Wide Web

May 2013

1636 pages

ISBN:9781450320382

DOI:10.1145/2487788

General Chairs:
Daniel Schwabe, PUC-Rio - Brazil
Virgílio Almeida, UFMG - Brazil
Hartmut Glaser, CGI.br - Brazil

Program Chairs:
Ricardo Baeza-Yates, Yahoo! Labs - Spain & Chile
Sue Moon, KAIST - South Korea

Copyright © 2013. Copyright is held by the International World Wide Web Conference Committee (IW3C2).

Sponsors

NICBR: Núcleo de Informação e Coordenação do Ponto BR
CGIBR: Comitê Gestor da Internet no Brasil

In-Cooperation

SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

WWW '13

WWW '13: 22nd International World Wide Web Conference

May 13 - 17, 2013

Rio de Janeiro, Brazil

Acceptance Rates

WWW '13 Companion paper acceptance rate: 831 of 1,250 submissions (66%)

Overall acceptance rate: 1,899 of 8,196 submissions (23%)
