DOI: 10.1145/2505515.2505612
research-article

Mining causal topics in text data: iterative topic modeling with time series feedback

Published: 27 October 2013

Abstract

Many applications require analyzing textual topics in conjunction with external time series variables such as stock prices. We develop a novel general text mining framework for discovering such causal topics from text. Our framework naturally combines any given probabilistic topic model with time-series causal analysis to discover topics that are both coherent semantically and correlated with time series data. We iteratively refine topics, increasing the correlation of discovered topics with the time series. Time series data provides feedback at each iteration by imposing prior distributions on parameters. Experimental results show that the proposed framework is effective.
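The feedback step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a lagged Pearson correlation as a simple stand-in for the time-series causal analysis, and the names `pearson`, `lagged_correlation`, `word_priors`, the `threshold` parameter, and the toy data are all hypothetical. It shows only how per-word time series from one topic might be scored against an external series and turned into a prior for the next topic-modeling iteration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def lagged_correlation(word_series, target_series, lag=1):
    """Correlate word counts at time t with the external series at time t + lag.

    Assumption: this stands in for a proper causality test on the two series.
    """
    if lag > 0:
        return pearson(word_series[:-lag], target_series[lag:])
    return pearson(word_series, target_series)


def word_priors(topic_words, word_counts, target, lag=1, threshold=0.5):
    """Build a normalized prior over the words of one topic.

    Words whose absolute lagged correlation falls below `threshold`
    (a hypothetical cutoff) are dropped; the rest are weighted by |r|.
    """
    scores = {}
    for word in topic_words:
        r = lagged_correlation(word_counts[word], target, lag)
        if abs(r) >= threshold:
            scores[word] = abs(r)
    total = sum(scores.values())
    if total == 0:
        return {}
    return {word: score / total for word, score in scores.items()}


# Toy data (hypothetical): daily counts of two words and an external series.
counts = {"oil": [1, 2, 3, 4, 5], "rain": [3, 1, 3, 1, 3]}
external = [0, 1, 2, 3, 4]
priors = word_priors(["oil", "rain"], counts, external, lag=1)
print(priors)
```

In a full loop, a prior like this would be passed back to the topic model, the topics re-estimated, and the scoring repeated, so that each iteration pulls the topics toward words that track the external series.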

Published In

CIKM '13: Proceedings of the 22nd ACM international conference on Information & Knowledge Management
October 2013
2612 pages
ISBN:9781450322638
DOI:10.1145/2505515
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. causal topic mining
  2. iterative topic mining
  3. time series

Qualifiers

  • Research-article

Conference

CIKM'13: 22nd ACM International Conference on Information and Knowledge Management
October 27 - November 1, 2013
San Francisco, California, USA

Acceptance Rates

CIKM '13 Paper Acceptance Rate 143 of 848 submissions, 17%;
Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

Article Metrics

  • Downloads (last 12 months): 16
  • Downloads (last 6 weeks): 2
Reflects downloads up to 02 Mar 2025

Cited By
  • (2024) MixEHR-SurG. Journal of Biomedical Informatics 153:C. DOI: 10.1016/j.jbi.2024.104638. Online publication date: 17-Jul-2024
  • (2023) A fine-grained causality extraction model incorporating relative location coding. Applied Intelligence 53:22 (27163-27176). DOI: 10.1007/s10489-023-04970-1. Online publication date: 2-Sep-2023
  • (2022) STTM: an efficient approach to estimating news impact on stock movement direction. PeerJ Computer Science 8 (e1156). DOI: 10.7717/peerj-cs.1156. Online publication date: 16-Dec-2022
  • (2022) Event Detection in Financial Markets. 2022 IEEE Intl Conf on Dependable, Autonomic and Secure Computing, Intl Conf on Pervasive Intelligence and Computing, Intl Conf on Cloud and Big Data Computing, Intl Conf on Cyber Science and Technology Congress (DASC/PiCom/CBDCom/CyberSciTech) (1-8). DOI: 10.1109/DASC/PiCom/CBDCom/Cy55231.2022.9927759. Online publication date: 12-Sep-2022
  • (2022) Time Series Impact Through Topic Modeling. IEEE Access 10 (97327-97347). DOI: 10.1109/ACCESS.2022.3202960. Online publication date: 2022
  • (2022) A survey of the extraction and applications of causal relations. Natural Language Engineering 28:3 (361-400). DOI: 10.1017/S135132492100036X. Online publication date: 20-Jan-2022
  • (2022) A survey on extraction of causal relations from natural language text. Knowledge and Information Systems 64:5 (1161-1186). DOI: 10.1007/s10115-022-01665-w. Online publication date: 1-May-2022
  • (2020) Keyword Template Based Semi-supervised Topic Modelling in Tweets. International Conference on Innovative Computing and Communications (659-666). DOI: 10.1007/978-981-15-5148-2_58. Online publication date: 31-Jul-2020
  • (2018) Efficient Crowd Exploration of Large Networks. Proceedings of the ACM on Human-Computer Interaction 2:CSCW (1-25). DOI: 10.1145/3274293. Online publication date: 1-Nov-2018
  • (2017) A search index-enhanced feature model for news recommendation. Journal of Information Science 43:3 (328-341). DOI: 10.1177/0165551516639801. Online publication date: 1-Jun-2017