Abstract
Many online news outlets, forums, and blogs provide a rich stream of publications and user comments. This body of data is a valuable source of information for researchers, journalists, and policymakers. However, the ever-increasing rates of publication and user engagement make it difficult to analyze this data without automated tools. This work presents MultiLayerET, a method to unify the representation of entities and topics in articles and comments. In MultiLayerET, articles’ content and associated comments are parsed into a multilayer graph consisting of heterogeneous nodes representing named entities and news topics. Nodes in this graph are connected by attributed edges that record weight (the strength of the connection between the two nodes), time (the contemporaneity of the two nodes' co-occurrence), and sentiment (the aggregate opinion of an entity toward a topic). These attributes support the analysis of articles and their comments. We infer the edges connecting two nodes using information mined from the textual data. The multilayer representation has an advantage over a single-layer representation because it integrates articles and comments via shared topics and entities, providing richer signals about emerging events. MultiLayerET can be applied to different downstream tasks, such as detecting media bias and misinformation. To explore the efficacy of the proposed method, we apply MultiLayerET to a body of data gathered from six representative online news outlets. We show that with MultiLayerET, the classification F1 score of a media bias prediction model improves by \(36\%\), and that of a state-of-the-art fake news detection model improves by \(4\%\).
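To make the data structure described above concrete, the following is a minimal sketch of a multilayer graph with attributed edges. It is an illustration under stated assumptions, not the authors' implementation: the class and field names (`Node`, `Edge`, `MultilayerGraph`, `shared_nodes`), the two layer names, the sentiment range, and the example data are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Node:
    """A heterogeneous node: a named entity or a news topic."""
    kind: str   # "entity" or "topic"
    label: str


@dataclass
class Edge:
    """Edge attributes, following the three signals named in the abstract."""
    weight: float     # strength of the connection between the two nodes
    time: date        # when the two nodes co-occurred
    sentiment: float  # aggregate opinion; assumed here to lie in [-1, 1]


class MultilayerGraph:
    """Toy multilayer graph: one edge set per layer, nodes shared across layers."""

    def __init__(self):
        # layer name -> {frozenset({u, v}): Edge}
        self.layers = defaultdict(dict)

    def add_edge(self, layer, u, v, edge):
        # Undirected edge keyed by the unordered node pair.
        self.layers[layer][frozenset((u, v))] = edge

    def nodes(self, layer):
        return {n for pair in self.layers[layer] for n in pair}

    def shared_nodes(self, layer_a, layer_b):
        """Nodes present in both layers; these tie articles to comments."""
        return self.nodes(layer_a) & self.nodes(layer_b)


# Hypothetical example data, not taken from the paper's corpus.
senator = Node("entity", "Senator X")
tax = Node("topic", "tax reform")
union = Node("entity", "Union Y")

g = MultilayerGraph()
g.add_edge("articles", senator, tax, Edge(0.8, date(2022, 3, 1), 0.4))
g.add_edge("comments", senator, tax, Edge(0.5, date(2022, 3, 2), -0.6))
g.add_edge("comments", union, tax, Edge(0.3, date(2022, 3, 2), 0.1))

shared = g.shared_nodes("articles", "comments")
```

In this sketch the article layer and the comment layer each hold their own edge for the same node pair, so the two layers can disagree on weight or sentiment while remaining linked through the shared entity and topic nodes.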
Acknowledgements
This research was supported in part by the U.S. NSF awards 2026513 and 1838145, the ARL subaward 555080-78055 under Prime Contract No. W911NF2220001, and the Temple University Office of the Vice President for Research 2022 Catalytic Collaborative Research Initiative Program, AI & ML Focus Area. In addition, this research includes calculations carried out on HPC resources supported in part by the U.S. NSF through major research instrumentation grant number 1625061 and by the U.S. Army Research Laboratory under contract number W911NF-16-2-0189.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Alshehri, J., Stanojevic, M., Khan, P., Rapp, B., Dragut, E., Obradovic, Z. (2023). MultiLayerET: A Unified Representation of Entities and Topics Using Multilayer Graphs. In: Amini, MR., Canu, S., Fischer, A., Guns, T., Kralj Novak, P., Tsoumakas, G. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2022. Lecture Notes in Computer Science, vol 13714. Springer, Cham. https://doi.org/10.1007/978-3-031-26390-3_39
DOI: https://doi.org/10.1007/978-3-031-26390-3_39
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-26389-7
Online ISBN: 978-3-031-26390-3
eBook Packages: Computer Science, Computer Science (R0)