Improving Short Text Classification Using Public Search Engines

Meng, Wang; Lanfen, Lin; Jing, Wang; Penghua, Yu; Jiaolong, Liu; Fei, Xie

doi:10.1007/978-3-642-39515-4_14

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8032)

Included in the following conference series:

International Symposium on Integrated Uncertainty in Knowledge Modelling and Decision Making

Abstract

In Web 2.0 applications, many user-provided texts are as short as 3 to 10 words. Classifying these short texts well helps readers find the messages they need more quickly. In this paper, we propose a method that expands short texts with the help of public search engines in two steps. First, we submit the short text as a query to a public search engine and crawl the result pages. Second, we treat the text of the result pages as background knowledge for the original short text and extract the feature vector from it. By choosing a suitable number of result pages, we can obtain enough text for feature vector extraction and thereby address the data sparseness problem. We conducted experiments under different settings, and the empirical results indicate that this enriched representation of short texts can substantially improve classification performance.
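The two-step expansion described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the search engine, the crawler, or the weighting scheme, so the sketch assumes a caller-supplied `fetch_result_pages` function (hypothetical, standing in for the search-and-crawl step) and plain normalized term frequencies as the feature vector. The names `tokenize`, `feature_vector`, and `fake_search` are likewise illustrative only.

```python
import re
from collections import Counter
from typing import Callable, Dict, List


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def feature_vector(
    short_text: str,
    fetch_result_pages: Callable[[str], List[str]],
    num_pages: int = 3,
) -> Dict[str, float]:
    """Build a term-frequency vector from the first `num_pages` result pages.

    `fetch_result_pages` is an assumed, caller-supplied function that returns
    the plain text of the result pages for a query, in ranked order.
    """
    # Step 1: query the (stand-in) search engine and keep the top pages.
    pages = fetch_result_pages(short_text)[:num_pages]
    # Step 2: treat the page text as background knowledge and count its terms.
    counts = Counter(tok for page in pages for tok in tokenize(page))
    total = sum(counts.values())
    if total == 0:
        return {}
    # Normalize counts so the vector sums to 1 (an assumed weighting).
    return {term: n / total for term, n in counts.items()}


def fake_search(query: str) -> List[str]:
    """Offline stand-in for a search engine; returns canned page texts."""
    canned = {
        "jaguar top speed": [
            "The jaguar is a big cat and a fast runner.",
            "Jaguar cars: top speed figures for each model.",
            "Big cat speed comparison: cheetah, jaguar, lion.",
        ]
    }
    return canned.get(query, [])


if __name__ == "__main__":
    vec = feature_vector("jaguar top speed", fake_search, num_pages=2)
    for term, weight in sorted(vec.items(), key=lambda kv: -kv[1])[:5]:
        print(f"{term}\t{weight:.3f}")
```

Raising `num_pages` pulls in more terms for the same three-word query, which is the lever the abstract refers to for easing data sparseness. The resulting dictionaries could then be passed to any standard text classifier; that step is outside this sketch.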

This work is supported by the Doctoral Program of Higher Education of China (No. 2011010111006S), the National Science and Technology Support Program of China (No. 2011BAGOSB04, No. 2012BAD3SBOl), and the Zhejiang Key Science and Technology Innovation Team Plan of China (No. 2009RSOOIS).

Author information

Authors and Affiliations

College of Computer Science and Technology, Zhejiang University, Hangzhou, China
Wang Meng, Lin Lanfen, Wang Jing, Yu Penghua, Liu Jiaolong & Xie Fei

Editor information

Editors and Affiliations

School of Automation Science and Electrical Engineering, Beihang University, 37 Xueyuan Road, 100191, Beijing, China
Zengchang Qin
School of Knowledge Science, Japan Advanced Institute of Science and Technology, 1-1 Asahidai Nomi, 923-1292, Ishikawa, Japan
Van-Nam Huynh

About this paper

Cite this paper

Meng, W., Lanfen, L., Jing, W., Penghua, Y., Jiaolong, L., Fei, X. (2013). Improving Short Text Classification Using Public Search Engines. In: Qin, Z., Huynh, VN. (eds) Integrated Uncertainty in Knowledge Modelling and Decision Making. IUKM 2013. Lecture Notes in Computer Science, vol 8032. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-39515-4_14

DOI: https://doi.org/10.1007/978-3-642-39515-4_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-39514-7
Online ISBN: 978-3-642-39515-4
eBook Packages: Computer Science, Computer Science (R0)
