Abstract
Recently, there has been increasing research interest in short texts such as news headlines. Because short texts are inherently sparse, current text classification methods perform poorly when applied to news headlines. To overcome this problem, this paper proposes a novel method that enhances the semantic representation of headlines. First, we expand the word features of each headline by adding keywords extracted from the most similar news. Second, we pre-train word embeddings on a news-domain corpus to enhance the word representations. Finally, we adopt the fastText classifier, a linear model that classifies text quickly and with high accuracy, for news headline classification. On the NLPCC 2017 shared task of Chinese news headline categorization, the proposed method achieved an F-measure of 83.1%, ranking first among 33 teams.
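The headline-expansion step described above can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not specify how the most similar news or its keywords are found, so TF-IDF weighting (with scikit-learn-style smoothed IDF) and cosine similarity are assumptions, and the helper names `tfidf`, `cosine`, `expand_headline` and `to_fasttext_line` are hypothetical. Tokens are assumed to be already segmented (for Chinese text, for example with Jieba); English words stand in for segmented Chinese words. The last helper writes one example in fastText's supervised input format, where each label carries the `__label__` prefix.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one sparse TF-IDF dict per tokenized document (assumed weighting)."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # document frequency: count each term once per document
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        total = len(doc) or 1  # avoid dividing by zero on an empty document
        # Smoothed IDF: terms occurring in every document keep a small positive weight.
        vectors.append({t: (c / total) * (math.log((1 + n) / (1 + df[t])) + 1)
                        for t, c in tf.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def expand_headline(headline, news_docs, k=3):
    """Append the top-k keywords of the news document most similar to the headline."""
    vectors = tfidf(news_docs + [headline])  # the headline shares the IDF statistics
    query, doc_vectors = vectors[-1], vectors[:-1]
    best = max(range(len(news_docs)), key=lambda i: cosine(query, doc_vectors[i]))
    seen = set(headline)
    # Rank by descending weight, break ties alphabetically, skip words already present.
    ranked = sorted(doc_vectors[best].items(), key=lambda kv: (-kv[1], kv[0]))
    keywords = [t for t, _ in ranked if t not in seen][:k]
    return headline + keywords


def to_fasttext_line(label, tokens):
    """Format one training example in fastText's supervised input format."""
    return "__label__" + label + " " + " ".join(tokens)


news = [
    ["team", "wins", "league", "final", "goal", "striker"],
    ["stock", "market", "falls", "investors", "bank"],
]
expanded = expand_headline(["striker", "scores", "goal"], news, k=3)
print(to_fasttext_line("sports", expanded))
# prints: __label__sports striker scores goal final league team
```

The headline shares `goal` and `striker` only with the first document, so that document is selected and its highest-weighted unseen words are appended; ties are broken alphabetically so the output is deterministic. Lines in this format could then be used for fastText supervised training, where news-domain vectors can optionally be supplied through its `-pretrainedVectors` option (their dimension must match `-dim`).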
Acknowledgements
We would like to thank Jintao Tang and Ting Wang for their valuable suggestions on the initial version of this paper, which helped greatly to improve it. We also wish to express our gratitude to the anonymous reviewers for their hard work and kind comments, which will help us further improve our work in the future. This work was supported by the National Natural Science Foundation of China (No. 61602490).
Copyright information
© 2018 Springer International Publishing AG
About this paper
Cite this paper
Yin, Z., Tang, J., Ru, C., Luo, W., Luo, Z., Ma, X. (2018). A Semantic Representation Enhancement Method for Chinese News Headline Classification. In: Huang, X., Jiang, J., Zhao, D., Feng, Y., Hong, Y. (eds) Natural Language Processing and Chinese Computing. NLPCC 2017. Lecture Notes in Computer Science, vol 10619. Springer, Cham. https://doi.org/10.1007/978-3-319-73618-1_27
DOI: https://doi.org/10.1007/978-3-319-73618-1_27
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-73617-4
Online ISBN: 978-3-319-73618-1
eBook Packages: Computer Science, Computer Science (R0)