A Semantic Representation Enhancement Method for Chinese News Headline Classification

  • Conference paper
Natural Language Processing and Chinese Computing (NLPCC 2017)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10619)

Abstract

Short texts such as news headlines have attracted increasing research interest. Because short text is inherently sparse, current text classification methods perform poorly on news headlines. To overcome this problem, this paper proposes a novel method that enhances the semantic representation of headlines. First, we expand the word features with keywords extracted from the most similar news. Second, we pre-train word embeddings on a news-domain corpus to enhance the word representation. Finally, we adopt the fastText classifier, which uses a linear method to classify text quickly and accurately, for news headline classification. On the Chinese news headline categorization task of NLPCC 2017, the proposed method achieved an F-measure of 83.1%, ranking first among 33 teams.
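The feature-expansion step can be illustrated with a minimal sketch. The abstract does not specify the similarity measure or the keyword extractor, so the version below assumes bag-of-words cosine similarity and raw term frequency, and it draws keywords from the single most similar article. The function names `cosine` and `expand_headline` and the parameter `num_keywords` are hypothetical, and the tokens are assumed to be already segmented.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def expand_headline(headline_tokens, news_corpus, num_keywords=3):
    """Append keywords from the most similar article to a headline.

    Assumptions (not taken from the paper): similarity is cosine over
    raw token counts, and keywords are the most frequent tokens of the
    retrieved article that do not already occur in the headline.
    """
    head_vec = Counter(headline_tokens)
    # Retrieve the article whose token counts are closest to the headline.
    best = max(
        news_corpus,
        key=lambda doc: cosine(head_vec, Counter(doc)),
        default=None,
    )
    if best is None:
        return list(headline_tokens)
    # Rank the article's remaining tokens by frequency.
    seen = set(headline_tokens)
    counts = Counter(t for t in best if t not in seen)
    keywords = [t for t, _ in counts.most_common(num_keywords)]
    return list(headline_tokens) + keywords


if __name__ == "__main__":
    headline = ["apple", "releases", "phone"]
    corpus = [
        ["apple", "phone", "launch", "launch", "price",
         "launch", "price", "camera"],
        ["football", "match", "goal", "goal", "team"],
    ]
    print(expand_headline(headline, corpus, num_keywords=2))
```

The expanded token list would then be passed to the classifier in place of the original headline. A production version would likely replace raw term frequency with a weighting such as TF-IDF and filter stop words before ranking.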

Acknowledgements

First, we would like to thank Jintao Tang and Ting Wang for their valuable suggestions on the initial version of this paper, which greatly improved it. Second, we are grateful to the anonymous reviewers for their hard work and kind comments, which will help us improve our future work. This work was supported by the National Natural Science Foundation of China (No. 61602490).

Corresponding author

Correspondence to Wei Luo.

Copyright information

© 2018 Springer International Publishing AG

About this paper

Cite this paper

Yin, Z., Tang, J., Ru, C., Luo, W., Luo, Z., Ma, X. (2018). A Semantic Representation Enhancement Method for Chinese News Headline Classification. In: Huang, X., Jiang, J., Zhao, D., Feng, Y., Hong, Y. (eds) Natural Language Processing and Chinese Computing. NLPCC 2017. Lecture Notes in Computer Science, vol 10619. Springer, Cham. https://doi.org/10.1007/978-3-319-73618-1_27

  • DOI: https://doi.org/10.1007/978-3-319-73618-1_27

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-73617-4

  • Online ISBN: 978-3-319-73618-1

  • eBook Packages: Computer Science, Computer Science (R0)
