Graph Based Feature Augmentation for Short and Sparse Text Classification

Long, Guodong; Jiang, Jing

doi:10.1007/978-3-642-53914-5_39

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8346)

Included in the following conference series:

International Conference on Advanced Data Mining and Applications

Abstract

Classifying short texts, such as snippets, search queries, micro-blogs and product reviews, is challenging mainly because short texts carry insufficient co-occurrence information between words and have a very sparse document-term representation. To address this problem, we propose a novel multi-view classification method that combines the original document-term representation with a new graph-based feature representation. Our method uses all documents to construct a neighbour graph from their shared co-occurring words. Multi-Dimensional Scaling (MDS) is then applied to extract a low-dimensional feature representation from the graph, which is used to augment the original text features for learning. Experiments on several benchmark datasets show that the proposed multi-view classifier, trained on the augmented feature representation, obtains a significant performance gain over the baseline methods.
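
The pipeline outlined in the abstract (neighbour graph, MDS embedding, feature augmentation) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the edge weighting, the similarity-to-distance conversion, the MDS variant, or how the two views are combined. The sketch assumes edge weights equal to the number of shared words, a hypothetical conversion d = 1 / (1 + w), classical (Torgerson) MDS, and plain column concatenation. The function names neighbour_graph, classical_mds and augment are illustrative only.

```python
import numpy as np


def neighbour_graph(X):
    """Weight each document pair by the number of distinct words they share.

    X is a (documents x terms) count matrix. Assumption: the abstract only
    says the graph is built from shared words; this weighting is illustrative.
    """
    B = (X > 0).astype(float)      # binary term presence
    W = B @ B.T                    # W[i, j] = number of shared words
    np.fill_diagonal(W, 0.0)       # no self-loops
    return W


def classical_mds(D, k):
    """Classical (Torgerson) MDS: embed a distance matrix D into k dimensions."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    G = -0.5 * J @ (D ** 2) @ J                # double-centred Gram matrix
    vals, vecs = np.linalg.eigh(G)             # eigenvalues in ascending order
    top = np.argsort(vals)[::-1][:k]           # indices of the k largest
    # Clip tiny negative eigenvalues caused by non-Euclidean distances.
    return vecs[:, top] * np.sqrt(np.clip(vals[top], 0.0, None))


def augment(X, k=2):
    """Append k graph-derived MDS features to the original term features."""
    W = neighbour_graph(X)
    D = 1.0 / (1.0 + W)            # assumed conversion: more shared words, smaller distance
    np.fill_diagonal(D, 0.0)
    Z = classical_mds(D, k)
    return np.hstack([X, Z])       # assumed combination: simple concatenation
```

Calling augment(X, k=2) on a 4 x 5 document-term matrix returns a 4 x 7 matrix whose first five columns are the original term counts. The augmented matrix could then be passed to any standard classifier; the multi-view learning step itself is not sketched here.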

Author information

Authors and Affiliations

Centre for Quantum Computation & Intelligent Systems, University of Technology, Sydney, Australia
Guodong Long & Jing Jiang

Editor information

Editors and Affiliations

US Air Force Office of Scientific Research, 106-0032, Tokyo, Japan
Hiroshi Motoda
School of Computer Science and Technology, Zhejiang University, 310027, Hangzhou, China
Zhaohui Wu
Faculty of Engineering and Information Technology, University of Technology, Chippendale, 2008, Sydney, NSW, Australia
Longbing Cao
Department of Computing Science, University of Alberta, T6G 2E8, Edmonton, Canada
Osmar Zaiane
College of Computer Science and Technology, Zhejiang University, Hangzhou, China
Min Yao
School of Computer Science, Fudan University, 200433, Shanghai, China
Wei Wang

Cite this paper

Long, G., Jiang, J. (2013). Graph Based Feature Augmentation for Short and Sparse Text Classification. In: Motoda, H., Wu, Z., Cao, L., Zaiane, O., Yao, M., Wang, W. (eds) Advanced Data Mining and Applications. ADMA 2013. Lecture Notes in Computer Science, vol 8346. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-53914-5_39

DOI: https://doi.org/10.1007/978-3-642-53914-5_39
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-53913-8
Online ISBN: 978-3-642-53914-5
eBook Packages: Computer Science, Computer Science (R0)
