Abstract
With the rapid development of social media services such as Twitter and Sina Weibo, short texts are becoming increasingly prevalent. However, inferring topics from short texts remains challenging for many content analysis tasks because word co-occurrence patterns in such texts are sparse. In this paper, we propose a classification model named the sentimental biterm topic model (SBTM) and apply it to sentiment classification over short texts. To alleviate the sparsity problem, the similarity between words and documents is first estimated by singular value decomposition. Then, the most similar words are added to each short document in the corpus. Extensive evaluations on sentiment detection over short texts validate the effectiveness of the proposed method.
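The expansion step described above can be illustrated with a minimal sketch. It is not the implementation used in the paper: the abstract does not specify the similarity measure, the rank, or the number of words added, so the function name `expand_short_docs`, the parameters `rank` and `top_n`, and the use of the rank-k reconstruction of a term-document count matrix as the word-document similarity are all assumptions made here for illustration.

```python
import numpy as np


def expand_short_docs(docs, rank=2, top_n=1):
    """Append to each tokenized document the top_n absent words that are
    most similar to it under a truncated SVD (hypothetical helper)."""
    vocab = sorted({w for d in docs for w in d})
    index = {w: i for i, w in enumerate(vocab)}

    # Term-document count matrix: rows are words, columns are documents.
    X = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for w in d:
            X[index[w], j] += 1.0

    # Truncated SVD; the rank-k reconstruction smooths the sparse counts.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    k = min(rank, len(s))
    # Assumption: entry (i, j) of the reconstruction is used as the
    # similarity between word i and document j.
    scores = (U[:, :k] * s[:k]) @ Vt[:k, :]

    expanded = []
    for j, d in enumerate(docs):
        present = set(d)
        candidates = [w for w in vocab if w not in present]
        candidates.sort(key=lambda w: scores[index[w], j], reverse=True)
        expanded.append(list(d) + candidates[:top_n])
    return expanded


# Toy corpus of tokenized short texts (invented for this example).
docs = [
    ["happy", "great", "day"],
    ["great", "movie", "happy"],
    ["sad", "bad", "day"],
    ["bad", "movie", "sad"],
]
for doc in expand_short_docs(docs, rank=2, top_n=1):
    print(doc)
```

Each expanded document keeps its original words and gains `top_n` additional words, which gives a downstream topic model more co-occurrence evidence per document. In practice the count matrix would likely be weighted (for example by TF-IDF) and the rank tuned on held-out data; both choices are left open here.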
Acknowledgements
The authors are thankful to the anonymous reviewers for their constructive comments and suggestions on an earlier version of this paper. This research has been supported by the National Natural Science Foundation of China (61502545, 61472453, U1401256, U1501252), a grant from the Research Grants Council of the Hong Kong Special Administrative Region, China (UGC/FDS11/E06/14), and the Fundamental Research Funds for the Central Universities.
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Pang, J., Li, X., Xie, H., Rao, Y. (2016). SBTM: Topic Modeling over Short Texts. In: Gao, H., Kim, J., Sakurai, Y. (eds) Database Systems for Advanced Applications. DASFAA 2016. Lecture Notes in Computer Science, vol 9645. Springer, Cham. https://doi.org/10.1007/978-3-319-32055-7_4
DOI: https://doi.org/10.1007/978-3-319-32055-7_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-32054-0
Online ISBN: 978-3-319-32055-7
eBook Packages: Computer Science, Computer Science (R0)