Abstract
Classifying short texts such as snippets, search queries, micro-blogs and product reviews is challenging. This is mainly because short texts carry insufficient word co-occurrence information and have a very sparse document-term representation. To address this problem, we propose a novel multi-view classification method that combines the original document-term representation with a new graph-based feature representation. The proposed method uses all documents to construct a neighbour graph in which documents are linked through the co-occurring words they share. Multi-Dimensional Scaling (MDS) is then applied to extract a low-dimensional feature representation from the graph, and this representation is used to augment the original text features for learning. Experiments on several benchmark datasets show that the proposed multi-view classifier, trained on the augmented feature representation, achieves significant performance gains over the baseline methods.
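The pipeline described above (neighbour graph, then MDS, then feature augmentation) can be sketched with NumPy alone. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not specify the edge weighting, the graph distance fed to MDS, or the classifier. Here we assume two documents are linked when they share at least one word. We also assume that graph distance is the shortest-path hop count and that the MDS step is classical (Torgerson) MDS. The function names `neighbour_graph`, `hop_distances`, `classical_mds` and `augment_features` are hypothetical.

```python
import numpy as np


def neighbour_graph(X: np.ndarray) -> np.ndarray:
    """Adjacency matrix linking documents that share at least one word.

    X is a document-term matrix (n_docs x n_terms). Edge weight = number of
    shared words (an assumption for illustration; only A > 0 is used below).
    """
    B = (X > 0).astype(float)
    A = B @ B.T               # A[i, j] = number of words docs i and j share
    np.fill_diagonal(A, 0.0)  # no self-loops
    return A


def hop_distances(A: np.ndarray) -> np.ndarray:
    """All-pairs shortest-path hop counts (Floyd-Warshall on unweighted edges)."""
    n = A.shape[0]
    D = np.where(A > 0, 1.0, np.inf)
    np.fill_diagonal(D, 0.0)
    for k in range(n):
        # Relax every pair (i, j) through intermediate node k.
        D = np.minimum(D, D[:, [k]] + D[[k], :])
    # Assumption: disconnected pairs get one more than the largest finite distance.
    finite_max = D[np.isfinite(D)].max()
    D[~np.isfinite(D)] = finite_max + 1.0
    return D


def classical_mds(D: np.ndarray, k: int) -> np.ndarray:
    """Classical (Torgerson) MDS: embed a distance matrix into k dimensions."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ (D ** 2) @ J              # double-centred squared distances
    vals, vecs = np.linalg.eigh(B)           # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:k]       # indices of the k largest
    vals = np.clip(vals[order], 0.0, None)   # discard negative (non-Euclidean) parts
    return vecs[:, order] * np.sqrt(vals)


def augment_features(X: np.ndarray, k: int = 2) -> np.ndarray:
    """Concatenate the original text view with the graph-based MDS view."""
    Z = classical_mds(hop_distances(neighbour_graph(X)), k)
    return np.hstack([X, Z])


if __name__ == "__main__":
    # Four toy documents over a five-word vocabulary; consecutive documents
    # share one word, so the neighbour graph is a path 0-1-2-3.
    X = np.array([
        [1, 1, 0, 0, 0],
        [0, 1, 1, 0, 0],
        [0, 0, 1, 1, 0],
        [0, 0, 0, 1, 1],
    ])
    F = augment_features(X, k=2)
    print(F.shape)  # (4, 7): 5 word features + 2 graph features
```

The augmented matrix `F` could then be passed to any standard classifier, such as a linear SVM. Classical MDS is used here because it has a closed-form eigendecomposition solution. Iterative (stress-minimising) MDS variants would be an equally plausible reading of the abstract.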
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Long, G., Jiang, J. (2013). Graph Based Feature Augmentation for Short and Sparse Text Classification. In: Motoda, H., Wu, Z., Cao, L., Zaiane, O., Yao, M., Wang, W. (eds) Advanced Data Mining and Applications. ADMA 2013. Lecture Notes in Computer Science, vol 8346. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-53914-5_39
DOI: https://doi.org/10.1007/978-3-642-53914-5_39
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-53913-8
Online ISBN: 978-3-642-53914-5
eBook Packages: Computer Science, Computer Science (R0)