ABSTRACT
With the advent of digital humanities and computational social sciences, machine learning techniques like topic modeling are increasingly employed by social scientists and humanities scholars. This raises the question of what visualization needs these researchers have when confronted with such complex systems. In this paper, we investigate visualization needs in the context of the topic modeling algorithm Latent Dirichlet Allocation and the 950,000 articles of the New York Times corpus. We presented visualizations of how the topics in the newspaper changed over time to seven participants, who completed three tasks with three visualization types. Qualitative interviews with the participants supported our assumptions that visualizations for these tasks need to be visually appealing and intuitively interpretable, and need to minimize mental effort.
Index Terms
Visualization Needs in Computational Social Sciences