ABSTRACT
In this half-day tutorial, we will show 1) how the HathiTrust Research Center (HTRC) Data Capsule can be used for non-consumptive research over a collection of texts and 2) how integrated tools for latent Dirichlet allocation (LDA) topic modeling and visualization can be used to drive the formulation of new research questions. Participants will be given an account in the HTRC Data Capsule and taught how to use the workset manager to create a corpus. They will then use the Data Capsule virtual machine's (VM's) secure mode to download the texts and analyze their contents.
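For readers new to LDA, the following is a minimal sketch of how a topic model can be fit with collapsed Gibbs sampling, which is one common inference method for LDA. It is not the HTRC tooling used in the tutorial. The function names `lda_gibbs` and `top_words`, the toy corpus, and the hyperparameter values (`alpha=0.1`, `beta=0.01`) are illustrative assumptions.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA to tokenized documents with collapsed Gibbs sampling.

    Hypothetical helper for illustration. Returns (n_dk, n_kw, vocab), where
    n_dk[d][k] counts tokens in document d assigned to topic k and
    n_kw[k][w] counts occurrences of vocabulary word w assigned to topic k.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics

    n_dk = [[0] * K for _ in docs]        # document-topic counts
    n_kw = [[0] * V for _ in range(K)]    # topic-word counts
    n_k = [0] * K                         # total tokens per topic
    assignments = []                      # topic assignment of every token

    # Initialize each token with a uniformly random topic.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(K)
            n_dk[d][k] += 1
            n_kw[k][word_id[w]] += 1
            n_k[k] += 1
            z_doc.append(k)
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid = word_id[w]
                k = assignments[d][i]
                # Remove the current token from the counts.
                n_dk[d][k] -= 1
                n_kw[k][wid] -= 1
                n_k[k] -= 1
                # Full conditional p(z = t | all other assignments), unnormalized.
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][wid] + beta) / (n_k[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                # Add the token back under its newly sampled topic.
                n_dk[d][k] += 1
                n_kw[k][wid] += 1
                n_k[k] += 1
                assignments[d][i] = k

    return n_dk, n_kw, vocab


def top_words(n_kw, vocab, n=3):
    """Return the n highest-count words for each topic."""
    return [
        [vocab[i] for i in sorted(range(len(vocab)), key=lambda i: -row[i])[:n]]
        for row in n_kw
    ]


# Toy corpus (hypothetical): two nautical and two legal "documents".
docs = [
    "whale ship sea captain whale".split(),
    "sea ship voyage whale harpoon".split(),
    "court judge law trial judge".split(),
    "law trial jury court verdict".split(),
]
n_dk, n_kw, vocab = lda_gibbs(docs, num_topics=2)
for k, words in enumerate(top_words(n_kw, vocab)):
    print(f"topic {k}: {words}")
```

Each topic is represented by its word counts, and inspecting the highest-count words per topic, as `top_words` does here, is the kind of exploration that topic visualization tools support over much larger worksets.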
Index Terms
- Topic Exploration with the HTRC Data Capsule for Non-Consumptive Research