Effectively Classifying Short Texts via Improved Lexical Category and Semantic Features

  • Conference paper
Intelligent Computing Theories and Application (ICIC 2016)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9771)

Abstract

Classifying short texts is challenging because of their severe sparseness and high dimensionality, which are typical characteristics of short text. In this paper, we propose a novel approach that classifies short texts using both lexical and semantic features. First, a term dictionary is constructed by selecting, as lexical features, the words most representative of each category. Then, the optimal topic distribution is extracted from a background knowledge repository via Latent Dirichlet Allocation (LDA). A new feature representation for each short text is then constructed from these lexical and semantic features. Experimental results show that our method significantly improves the quality of short text classification.
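The abstract outlines a three-step pipeline (category term dictionary, LDA topic distribution, combined feature) but does not state the selection criterion or how the features are combined. The Python sketch below illustrates one plausible reading under explicit assumptions, not the authors' method: chi-square scoring stands in for the unspecified term-selection criterion, the topic-word matrix `phi` is assumed to have already been learned by LDA on a background corpus, a smoothed EM "fold-in" update replaces Gibbs-sampling inference of the short text's topic distribution, and the final feature is a simple concatenation. All function names (`build_term_dictionary`, `infer_topics`, `short_text_features`) and the toy data are hypothetical.

```python
from collections import Counter


def chi_square(a, b, c, d):
    """Chi-square statistic for a term/category 2x2 contingency table.

    a: in-category docs containing the term   b: out-of-category docs containing it
    c: in-category docs without the term      d: out-of-category docs without it
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


def build_term_dictionary(docs, labels, top_k):
    """Hypothetical helper: keep, per category, the top_k terms with the highest
    chi-square score among terms positively associated with that category."""
    doc_sets = [set(doc) for doc in docs]
    vocab = sorted(set().union(*doc_sets))
    dictionary = {}
    for cat in sorted(set(labels)):
        in_cat = [s for s, y in zip(doc_sets, labels) if y == cat]
        out_cat = [s for s, y in zip(doc_sets, labels) if y != cat]
        scored = []
        for term in vocab:
            a = sum(term in s for s in in_cat)
            b = sum(term in s for s in out_cat)
            c, d = len(in_cat) - a, len(out_cat) - b
            if a * d > b * c:  # chi-square is symmetric; drop negatively associated terms
                scored.append((chi_square(a, b, c, d), term))
        scored.sort(key=lambda pair: (-pair[0], pair[1]))  # best score first, ties alphabetical
        dictionary[cat] = [term for _, term in scored[:top_k]]
    return dictionary


def infer_topics(words, phi, vocab_index, alpha=0.1, iters=50):
    """Fold-in estimate of a short text's topic distribution theta, holding the
    topic-word matrix phi (K x V) fixed. phi is assumed to come from LDA trained on
    a background corpus; this smoothed EM update stands in for Gibbs sampling."""
    k = len(phi)
    theta = [1.0 / k] * k
    counts = Counter(w for w in words if w in vocab_index)  # out-of-vocabulary words ignored
    if not counts:
        return theta  # nothing known about this text: uniform topics
    for _ in range(iters):
        expected = [0.0] * k
        for word, n_w in counts.items():
            v = vocab_index[word]
            weights = [theta[t] * phi[t][v] for t in range(k)]
            z = sum(weights)
            if z == 0:
                continue
            for t in range(k):
                expected[t] += n_w * weights[t] / z  # E-step: responsibility of topic t
        total = sum(expected) + k * alpha
        theta = [(expected[t] + alpha) / total for t in range(k)]  # M-step with smoothing
    return theta


def short_text_features(words, dictionary, phi, vocab_index):
    """Concatenate binary dictionary-term indicators with the inferred topic vector."""
    terms = sorted({t for cat_terms in dictionary.values() for t in cat_terms})
    present = set(words)
    lexical = [1.0 if t in present else 0.0 for t in terms]
    return lexical + infer_topics(words, phi, vocab_index)


if __name__ == "__main__":
    docs = [["football", "match", "goal"], ["goal", "team", "match"],
            ["stock", "market", "price"], ["market", "bank", "price"]]
    labels = ["sports", "sports", "finance", "finance"]
    dictionary = build_term_dictionary(docs, labels, top_k=2)
    print(dictionary)  # {'finance': ['market', 'price'], 'sports': ['goal', 'match']}
    vocab_index = {"goal": 0, "match": 1, "market": 2, "price": 3}
    phi = [[0.45, 0.45, 0.05, 0.05],   # toy "sports" topic
           [0.05, 0.05, 0.45, 0.45]]   # toy "finance" topic
    print(short_text_features(["goal", "match", "tonight"], dictionary, phi, vocab_index))
```

In the toy run, the feature vector has four dictionary indicators followed by two topic proportions; the unknown word "tonight" contributes nothing to either part, which is exactly the sparsity problem the background topic model is meant to soften for longer vocabularies.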

Acknowledgement

This work is supported by the National Natural Science Foundation of China (No. 61363058), the Youth Science and Technology Support Program of Gansu Province (145RJZA232, 145RJYA259), the 2016 Undergraduate Innovation Capacity Enhancement Program, and the 2016 Annual Public Record Open Space Fund Project (1505JTCA007).

Author information

Corresponding author: Huifang Ma.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Ma, H., Zhou, R., Liu, F., Lu, X. (2016). Effectively Classifying Short Texts via Improved Lexical Category and Semantic Features. In: Huang, DS., Bevilacqua, V., Premaratne, P. (eds) Intelligent Computing Theories and Application. ICIC 2016. Lecture Notes in Computer Science, vol 9771. Springer, Cham. https://doi.org/10.1007/978-3-319-42291-6_16

  • DOI: https://doi.org/10.1007/978-3-319-42291-6_16

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-42290-9

  • Online ISBN: 978-3-319-42291-6

  • eBook Packages: Computer Science, Computer Science (R0)
