Visual Analysis of Topical Evolution in Unstructured Text: Design and Evaluation of TopicFlow

Chapter in: Applications of Social Media and Social Network Analysis

Part of the book series: Lecture Notes in Social Networks (LNSN)

Abstract

Topic models are regularly used to provide directed exploration and a high-level overview of a corpus of unstructured text. In many cases, it is important to analyze the evolution of topics over a time range. In this work, we present an application of statistical topic modeling and alignment (binned topic models) to group related documents into automatically generated topics and align the topics across a time range. Additionally, we present TopicFlow, an interactive tool to visualize the evolution of these topics. The tool was developed using an iterative design process based on feedback from expert reviewers. We demonstrate the utility of the tool with a detailed analysis of a corpus of data collected over the period of an academic conference, and we demonstrate the effectiveness of this visualization for reasoning about large data through a usability study with 18 participants.

Notes

  1. This work is an extension of our prior work [13], in which we originally introduced TopicFlow as a Twitter analysis tool. A video demonstrating this work is available at https://www.youtube.com/watch?v=qqIlvMOQaOE&feature=youtu.be

  2. For TopicFlow, the number of topics is adjustable, with a default of 15 to balance granularity and comprehensibility of the resulting topics.

  3. For this implementation, the LDA algorithm runs for 100 iterations with \(\alpha = 0.5\) and \(\beta = 0.5\).

  4. For example, Twitter-specific stop words include {rt, retweet, etc.} and Spanish stop words include {el, la, tu, etc.}.

  5. \(\cos(A, B) = \frac{A \cdot B}{\left\| A \right\| \left\| B \right\|}\).

  6. For prototyping and evaluation purposes, the threshold was set between 0.15 and 0.25 depending on the dataset.

  7. A prototype of the TopicFlow tool is available for demo at http://www.cs.umd.edu/~maliks/topicflow/TopicFlow.html.

  8. Twitter’s open API and the fact that tweets are rich with metadata, specifically time stamps, make it an appropriate data source for prototyping and testing.
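
Notes 5 and 6 give a cosine similarity formula and a threshold range of 0.15 to 0.25. The sketch below shows one way these two pieces could combine to link topics between two adjacent time bins. It is an illustration under stated assumptions, not the authors' implementation: the function names `cosine_similarity` and `align_topics`, the representation of each topic as a word-weight vector over a shared vocabulary, the inclusive `>=` comparison, and the toy values are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """cos(A, B) = (A . B) / (||A|| * ||B||); returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def align_topics(prev_topics, next_topics, threshold=0.2):
    """Return (i, j, similarity) for each topic pair whose similarity meets the threshold.

    Assumption: the default of 0.2 is only a midpoint of the 0.15 to 0.25 range in
    note 6, and the comparison is inclusive.
    """
    links = []
    for i, a in enumerate(prev_topics):
        for j, b in enumerate(next_topics):
            sim = cosine_similarity(a, b)
            if sim >= threshold:
                links.append((i, j, sim))
    return links


# Toy word-weight vectors over a shared four-word vocabulary (made-up values).
bin_1 = [[0.7, 0.2, 0.1, 0.0], [0.0, 0.1, 0.2, 0.7]]
bin_2 = [[0.6, 0.3, 0.1, 0.0], [0.0, 0.0, 0.5, 0.5]]

links = align_topics(bin_1, bin_2, threshold=0.2)
for i, j, sim in links:
    print(f"bin 1 topic {i} -> bin 2 topic {j}: similarity {sim:.3f}")
```

With these toy vectors, only the pairs (0, 0) and (1, 1) meet the threshold. Raising the threshold yields fewer, stronger links between bins, and lowering it yields more, weaker ones, which is presumably why note 6 reports tuning it per dataset.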

References

  1. Blei DM, Lafferty JD (2006) Dynamic topic models. In: Proceedings of 23rd international conference on machine learning. ACM Press, New York, pp 113–120

  2. Blei DM, Ng AY, Jordan MI (2003) Latent Dirichlet allocation. J Mach Learn Res 3:993–1022

  3. Bostock M (2012) Data driven documents (d3). http://d3js.org

  4. Cui W, Liu S, Tan L, Shi C, Song Y, Gao Z, Qu H, Tong X (2011) TextFlow: towards better understanding of evolving topics in text. IEEE Trans Vis Comput Graph 17(12):2412–2421

  5. Hart S, Staveland L (1988) Development of NASA-TLX (Task Load Index): results of empirical and theoretical research. Hum Mental Workload 1:139–183

  6. Havre S, Hetzler B, Nowell L (2000) ThemeRiver: visualizing theme changes over time. In: Proceedings of IEEE symposium on information visualization, pp 115–123

  7. Hu Y, Boyd-Graber J, Satinoff B, Smith A (2013) Interactive topic modeling. Mach Learn J 95:423–469

  8. Kleinberg J (2003) Bursty and hierarchical structure in streams. Data Min Knowl Discov 7:373–397

  9. Kullback S, Leibler RA (1951) On information and sufficiency. Ann Math Stat 22:49–86

  10. Leskovec J, Backstrom L, Kleinberg J (2009) Meme-tracking and the dynamics of the news cycle. In: Proceedings of 15th ACM SIGKDD international conference on knowledge discovery and data mining, pp 497–506

  11. Lin J (1991) Divergence measures based on the Shannon entropy. IEEE Trans Inf Theory 37(1):145–151

  12. Liu Y, Niculescu-Mizil A, Gryc W (2009) Topic-link LDA: joint models of topic and author community. In: Proceedings of 26th annual international conference on machine learning. ACM Press, New York, pp 665–672

  13. Malik S, Smith A, Hawes T, Dunne C, Papadatos P, Li J, Shneiderman B (2013) TopicFlow: visualizing topic alignment of Twitter data over time. In: The 2013 IEEE/ACM international conference on advances in social networks analysis and mining

  14. Mimno D, McCallum A (2007) Organizing the OCA: learning faceted subjects from a library of digital books. In: Proceedings of the 7th ACM/IEEE-CS joint conference on digital libraries. ACM Press, New York, pp 376–385

  15. Nikulin M (2001) In: Hazewinkel M (ed) Encyclopaedia of mathematics: an updated and annotated translation of the Soviet "Mathematical encyclopaedia". Reidel; sold and distributed in the U.S.A. and Canada by Kluwer Academic, Boston

  16. O’Brien WL (2012) Preliminary investigation of the use of Sankey diagrams to enhance building performance simulation-supported design. In: Proceedings of 2012 symposium on simulation for architecture and urban design. Society for Computer Simulation International, San Diego, pp 15:1–15:8

  17. Ramage D, Hall D, Nallapati R, Manning CD (2009) Labeled LDA: a supervised topic model for credit attribution in multi-labeled corpora. In: Proceedings of 2009 conference on empirical methods in natural language processing, vol 1. Association for Computational Linguistics, New York, pp 248–256

  18. Shuyo N (2011) LDA implementation. https://github.com/shuyo/iir/blob/master/lda/lda.py

  19. Sopan A, Rey P, Butler B, Shneiderman B (2012) Monitoring academic conferences: real-time visualization and retrospective analysis of backchannel conversations. In: ASE international conference on social informatics, pp 63–69

  20. Tan PN, Steinbach M, Kumar V (2005) Introduction to data mining, 1st edn. Addison Wesley, New York

  21. Teh YW, Jordan MI, Beal MJ, Blei DM (2006) Hierarchical Dirichlet processes. J Am Stat Assoc 101:1566–1581

  22. Wang X, McCallum A (2006) Topics over time: a non-Markov continuous-time model of topical trends. In: Proceedings of 12th ACM SIGKDD international conference on knowledge discovery and data mining, pp 424–433

  23. Wilbur WJ, Sirotkin K (1992) The automatic identification of stop words. J Inf Sci 18(1):45–55

  24. Zhai K, Boyd-Graber J, Asadi N, Alkhouja M (2012) Mr. LDA: a flexible large scale topic modeling package using variational inference in MapReduce. In: ACM international conference on world wide web

Acknowledgments

We would like to thank Timothy Hawes, Cody Dunne, Marc Smith, Jimmy Lin, Jordan Boyd-Graber, Catherine Plaisant, Peter David, and Jim Nolan for their input throughout the design and implementation of this project and thoughtful reviews of this paper. Additionally, we would like to thank Jianyu (Leo) Li and Panagis Papadatos for their assistance in designing, developing, and evaluating the initial version of the tool.

Author information

Corresponding author

Correspondence to Alison Smith.

Copyright information

© 2015 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Smith, A., Malik, S., Shneiderman, B. (2015). Visual Analysis of Topical Evolution in Unstructured Text: Design and Evaluation of TopicFlow. In: Kazienko, P., Chawla, N. (eds) Applications of Social Media and Social Network Analysis. Lecture Notes in Social Networks. Springer, Cham. https://doi.org/10.1007/978-3-319-19003-7_9

  • DOI: https://doi.org/10.1007/978-3-319-19003-7_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-19002-0

  • Online ISBN: 978-3-319-19003-7

  • eBook Packages: Computer Science, Computer Science (R0)
