DOI: 10.1145/2487788.2488163
research-article

SEED: a framework for extracting social events from press news

Published: 13 May 2013

Abstract

Every day, people exchange a huge amount of data over the Internet. Most of this data consists of unstructured text, which often contains references to structured information (e.g., person names, contact records). In this work, we propose a novel solution for discovering social events from press news written by human editors. Concretely, our method is divided into two steps, each addressing a specific Information Extraction (IE) task. First, we automatically recognize four classes of named entities in press news: DATE, LOCATION, PLACE, and ARTIST. Second, we detect social events by extracting ternary relations between such entities, also exploiting evidence from external sources (i.e., the Web). We evaluate both stages of the proposed solution on a real-world dataset. Experimental results highlight the quality of our first-step Named-Entity Recognition (NER) approach, which performs consistently with state-of-the-art solutions. Finally, we show how to precisely select true events from the list of all candidate events (i.e., all the ternary relations) produced by our second-step Relation Extraction (RE) method. In particular, we find that true social events can be detected if enough evidence for them is found in the result lists of Web search engines.
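
The two-step idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say which three entity classes form an event, how Web evidence is measured, or what threshold is used, so the slot choice (ARTIST, PLACE, DATE), the `evidence_count` callback, the `min_hits` value, and all sample data are assumptions made purely for illustration.

```python
from itertools import product
from typing import Callable, Iterable, List, Sequence, Tuple

Entity = Tuple[str, str]     # (class label, surface text)
Candidate = Tuple[str, ...]  # one entity text per slot, in slot order


def candidate_events(
    entities: Iterable[Entity],
    slots: Sequence[str] = ("ARTIST", "PLACE", "DATE"),  # assumed slot choice
) -> List[Candidate]:
    """Build every combination of recognized entities, one per slot."""
    by_label = {label: [] for label in slots}
    for label, text in entities:
        if label in by_label and text not in by_label[label]:
            by_label[label].append(text)  # keep first-seen order, drop repeats
    return list(product(*(by_label[label] for label in slots)))


def select_events(
    candidates: Iterable[Candidate],
    evidence_count: Callable[[Candidate], int],  # hypothetical evidence lookup
    min_hits: int = 3,                           # assumed threshold
) -> List[Candidate]:
    """Keep candidates whose external evidence count reaches the threshold."""
    return [c for c in candidates if evidence_count(c) >= min_hits]


# Toy input standing in for step-one NER output (invented for illustration).
entities = [
    ("ARTIST", "Artist A"),
    ("ARTIST", "Artist B"),
    ("PLACE", "City Arena"),
    ("LOCATION", "Rome"),
    ("DATE", "13 May 2013"),
]

# Stub replacing a real search-engine query: fixed hit counts per candidate.
fake_hits = {("Artist A", "City Arena", "13 May 2013"): 5}

candidates = candidate_events(entities)
events = select_events(candidates, lambda c: fake_hits.get(c, 0))
print(candidates)
print(events)
```

Passing the evidence lookup as a callback keeps the sketch independent of any particular search engine; a real system would replace the stub with its own query and scoring logic.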

Published In

WWW '13 Companion: Proceedings of the 22nd International Conference on World Wide Web
May 2013
1636 pages
ISBN: 9781450320382
DOI: 10.1145/2487788

Sponsors

  • NICBR: Núcleo de Informação e Coordenação do Ponto BR
  • CGIBR: Comitê Gestor da Internet no Brasil

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. information extraction
  2. named-entity recognition
  3. relation extraction
  4. social event discovery

Qualifiers

  • Research-article

Conference

WWW '13: 22nd International World Wide Web Conference
May 13 - 17, 2013
Rio de Janeiro, Brazil

Acceptance Rates

WWW '13 Companion paper acceptance rate: 831 of 1,250 submissions (66%)
Overall acceptance rate: 1,899 of 8,196 submissions (23%)

Article Metrics

  • Downloads (last 12 months): 3
  • Downloads (last 6 weeks): 0
Reflects downloads up to 15 Feb 2025.

Cited By

  • (2017) A crowdsourced system for user studies in information extraction. International Journal of Knowledge Engineering and Soft Data Paradigms 6:1 (44-51). DOI: 10.1504/IJKESDP.2017.089506. Online publication date: 1-Jan-2017
  • (2015) LN-Annote. Proceedings of the 24th International Conference on World Wide Web (538-548). DOI: 10.1145/2736277.2741633. Online publication date: 18-May-2015
  • (2015) Boosted Multifeature Learning for Cross-Domain Transfer. ACM Transactions on Multimedia Computing, Communications, and Applications 11:3 (1-18). DOI: 10.1145/2700286. Online publication date: 5-Feb-2015
  • (2015) Cross-Domain Feature Learning in Multimedia. IEEE Transactions on Multimedia 17:1 (64-78). DOI: 10.1109/TMM.2014.2375793. Online publication date: Jan-2015
