DOI: 10.1145/3331184.3331300
Short paper

A Lightweight Representation of News Events on Social Media

Published: 18 July 2019

Abstract

The sheer amount of newsworthy information published by users on social media platforms makes efficient and effective methods for filtering and organizing content necessary. In this scenario, off-the-shelf methods fail to process large amounts of data, a problem usually approached by adding more computational resources. Simple data aggregations can help cope with space and time constraints, while also improving the effectiveness of certain applications, such as topic detection or summarization. We propose a lightweight representation of newsworthy social media data. The proposed representation leverages microblog features, such as redundancy and re-sharing capabilities, by using surrogate texts from shared URLs and word embeddings. Our representation achieves clustering results comparable to those obtained using the complete data, while reducing running time and memory requirements. This is useful when dealing with noisy, raw, user-generated social media data.
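The abstract does not spell out the full pipeline, so the following is only a minimal sketch under stated assumptions of the general idea: tweets that share a URL are pooled into one surrogate document, each surrogate is represented by the average of its word vectors, and the surrogates are then clustered. The `(text, url)` input format, the toy `TOY_VECTORS` dictionary (standing in for pretrained word embeddings), the whitespace tokenizer, the similarity threshold, and every function name here are hypothetical. Single-pass "leader" clustering on cosine similarity is a stand-in chosen for brevity, not necessarily the clustering method used in the paper.

```python
from collections import defaultdict
import math

# Hypothetical toy embeddings standing in for pretrained word vectors;
# real vectors would have hundreds of dimensions.
TOY_VECTORS = {
    "earthquake": [0.9, 0.1],
    "quake": [1.0, 0.0],
    "election": [0.0, 1.0],
    "vote": [0.1, 0.9],
}


def group_by_url(tweets):
    """Pool tweet texts by shared URL; tweets without a URL are dropped."""
    groups = defaultdict(list)
    for text, url in tweets:
        if url is not None:
            groups[url].append(text)
    return dict(groups)


def mean_embedding(texts, vectors, dim):
    """Average the vectors of all in-vocabulary tokens in the pooled texts."""
    total = [0.0] * dim
    count = 0
    for text in texts:
        for token in text.lower().split():  # naive whitespace tokenizer (assumption)
            vec = vectors.get(token)
            if vec is not None:
                total = [t + v for t, v in zip(total, vec)]
                count += 1
    # A surrogate with no known tokens keeps an all-zero vector.
    return [t / count for t in total] if count else total


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def leader_cluster(reps, threshold):
    """Single-pass clustering: each item joins the first cluster whose
    leader (first member) is at least `threshold` similar, else starts one."""
    clusters = []  # list of (leader_vector, member_keys)
    for key, vec in reps.items():
        for leader, members in clusters:
            if cosine(leader, vec) >= threshold:
                members.append(key)
                break
        else:
            clusters.append((vec, [key]))
    return [members for _, members in clusters]


if __name__ == "__main__":
    tweets = [
        ("Earthquake hits city", "u1"),
        ("quake damage reported", "u1"),
        ("election results in", "u2"),
        ("vote count update", "u2"),
        ("another quake", "u3"),
        ("no link here", None),
    ]
    surrogates = group_by_url(tweets)
    reps = {url: mean_embedding(texts, TOY_VECTORS, 2) for url, texts in surrogates.items()}
    print(leader_cluster(reps, threshold=0.8))  # prints [['u1', 'u3'], ['u2']]
```

Because each URL contributes a single dense vector no matter how many tweets share it, the number of items passed to the clustering step grows with the number of distinct URLs rather than the number of tweets, which is one plausible way such an aggregation reduces running time and memory.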

Cited By

  • (2023) Cross-lingual Text Clustering in a Large System. In Proceedings of the 2023 7th International Conference on Natural Language Processing and Information Retrieval, 1-11. DOI: 10.1145/3639233.3639356. Online publication date: 15-Dec-2023.
  • (2021) Improved Topic Modeling in Twitter Through Community Pooling. In String Processing and Information Retrieval, 209-216. DOI: 10.1007/978-3-030-86692-1_17. Online publication date: 27-Sep-2021.

    Published In

    SIGIR'19: Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval
    July 2019
    1512 pages
    ISBN:9781450361729
    DOI:10.1145/3331184
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. anchor text
    2. clustering
    3. data mining
    4. document models
    5. graphs
    6. modeling
    7. news events
    8. social media
    9. summarization
    10. topic detection
    11. word embeddings

    Funding Sources

    • CONICYT PCHA/Doctorado Nacional 2015
    • Millennium Institute for Foundational Research on Data

    Conference

    SIGIR '19

    Acceptance Rates

    SIGIR'19 Paper Acceptance Rate 84 of 426 submissions, 20%;
    Overall Acceptance Rate 792 of 3,983 submissions, 20%

    Article Metrics

    • Downloads (last 12 months): 5
    • Downloads (last 6 weeks): 0
    Reflects downloads up to 17 Feb 2025.
