A Semantic Representation Enhancement Method for Chinese News Headline Classification

Yin, Zhongbo; Tang, Jintao; Ru, Chengsen; Luo, Wei; Luo, Zhunchen; Ma, Xiaolei

doi:10.1007/978-3-319-73618-1_27

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10619)

Included in the following conference series:

National CCF Conference on Natural Language Processing and Chinese Computing

Abstract

Research interest in short texts such as news headlines has grown in recent years. Because short texts are inherently sparse, current text classification methods perform poorly when applied to news headlines. To overcome this problem, this paper proposes a novel method that enhances the semantic representation of headlines. First, we expand the word features of each headline with keywords extracted from the most similar news. Second, we pre-train the word embeddings on a news-domain corpus to enhance the word representation. Finally, we adopt the fastText classifier, which uses a linear model to classify text quickly and accurately, for news headline classification. On the NLPCC 2017 Chinese news headline categorization task, the proposed method achieved an F-measure of 83.1%, ranking first among 33 teams.
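
The feature expansion step can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how similarity or keywords are computed, so the sketch assumes cosine similarity over term counts and a smoothed TF-IDF ranking, and it assumes the text is already segmented into tokens. The function names `expand_headline` and `to_fasttext_line` and the toy corpus are hypothetical. The `__label__` prefix is the input convention of fastText's supervised mode.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def expand_headline(headline, news_corpus, k=2):
    """Append up to k keywords from the most similar news item (assumed scheme)."""
    if not news_corpus:
        return list(headline)
    head_vec = Counter(headline)
    # Assumption: "most similar" means highest cosine similarity on term counts.
    best = max(news_corpus, key=lambda doc: cosine(head_vec, Counter(doc)))
    n_docs = len(news_corpus)
    doc_freq = Counter(t for doc in news_corpus for t in set(doc))
    tf = Counter(best)
    seen = set(headline)
    # Assumption: keywords are ranked by term frequency times a smoothed IDF.
    scores = {
        t: tf[t] * (math.log((1 + n_docs) / (1 + doc_freq[t])) + 1.0)
        for t in tf
        if t not in seen
    }
    # Break score ties alphabetically so the result is deterministic.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return list(headline) + ranked[:k]


def to_fasttext_line(label, tokens):
    """Format one training example in fastText's supervised input format."""
    return f"__label__{label} " + " ".join(tokens)


# Toy, pre-segmented corpus (hypothetical data, English tokens for readability).
news = [
    ["stock", "market", "falls", "investors", "sell", "shares", "shares"],
    ["team", "wins", "final", "match", "goal", "goal", "striker"],
    ["market", "rain", "weather"],
]
expanded = expand_headline(["market", "falls"], news, k=2)
line = to_fasttext_line("finance", expanded)
print(line)  # __label__finance market falls shares investors
```

Lines in this format could then be written to a file and passed to a fastText supervised trainer. For Chinese headlines, a word segmenter would have to run before this step.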

Acknowledgements

Firstly, we would like to thank Jintao Tang and Ting Wang for their valuable suggestions on the initial version of this paper, which helped greatly to improve it. Secondly, we express our gratitude to the anonymous reviewers for their hard work and kind comments, which will further improve our work in the future. This work was supported by the National Natural Science Foundation of China (No. 61602490).

Author information

Authors and Affiliations

China Defense Science and Technology Information Center, Beijing, 100142, China
Zhongbo Yin, Wei Luo & Zhunchen Luo
National University of Defense Technology, Changsha, 410073, China
Jintao Tang, Chengsen Ru & Xiaolei Ma

Corresponding author

Correspondence to Wei Luo.

Editor information

Editors and Affiliations

Fudan University, Shanghai, China
Xuanjing Huang
Singapore Management University, Singapore, Singapore
Jing Jiang
Peking University, Beijing, China
Dongyan Zhao
Peking University, Beijing, China
Yansong Feng
Soochow University, Suzhou, China
Yu Hong

About this paper

Cite this paper

Yin, Z., Tang, J., Ru, C., Luo, W., Luo, Z., Ma, X. (2018). A Semantic Representation Enhancement Method for Chinese News Headline Classification. In: Huang, X., Jiang, J., Zhao, D., Feng, Y., Hong, Y. (eds) Natural Language Processing and Chinese Computing. NLPCC 2017. Lecture Notes in Computer Science, vol 10619. Springer, Cham. https://doi.org/10.1007/978-3-319-73618-1_27

DOI: https://doi.org/10.1007/978-3-319-73618-1_27
Published: 05 January 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-73617-4
Online ISBN: 978-3-319-73618-1
eBook Packages: Computer Science, Computer Science (R0)
