Abstract
The performance of short text classification is limited by the intrinsic shortness of the texts, which makes their vector space representation sparse. Traditional classifiers such as SVM are extremely sensitive to the feature space, so their performance in short-text applications is often unsatisfactory. Using external information to better represent the input data may therefore yield better results. In this paper, we target the problem of news title classification, a typical and essential member of the short text family, and propose an approach that employs external information from long texts to address the sparseness problem. Restricted Boltzmann Machines are then utilised to select features, and classification is finally performed with a Support Vector Machine. An experimental study on Reuters-21578 and the Sogou Chinese news corpus demonstrates the effectiveness of the proposed method.
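The abstract does not say how the long texts are attached to a title, so the following is only a minimal sketch of the general idea, under stated assumptions: a title's sparse bag-of-words vector is enriched with frequent terms from its most similar auxiliary long text. The function names (`expand_title`, `density`), the whitespace tokenizer, the cosine ranking, and the toy data are all hypothetical and are not taken from the paper. The Restricted Boltzmann Machine and Support Vector Machine stages are omitted.

```python
from collections import Counter
import math


def tokenize(text):
    """Lowercase whitespace tokenizer (a simplifying assumption)."""
    return text.lower().split()


def cosine(a, b):
    """Cosine similarity between two term-count Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def expand_title(title, long_texts, top_docs=1, top_terms=3):
    """Add the most frequent terms of the most similar long texts to a title.

    Hypothetical helper: the expansion rule is an illustration, not the
    method described in the paper.
    """
    title_counts = Counter(tokenize(title))
    doc_counts = [Counter(tokenize(doc)) for doc in long_texts]
    # Rank auxiliary long texts by similarity to the title.
    ranked = sorted(doc_counts, key=lambda d: cosine(title_counts, d), reverse=True)
    expanded = Counter(title_counts)
    for doc in ranked[:top_docs]:
        if cosine(title_counts, doc) == 0.0:
            continue  # skip long texts that share no term with the title
        new_terms = [t for t, _ in doc.most_common() if t not in title_counts]
        for term in new_terms[:top_terms]:
            expanded[term] += 1
    return expanded


def density(counts, vocabulary):
    """Fraction of vocabulary dimensions that are non-zero."""
    return sum(1 for t in vocabulary if counts[t] > 0) / len(vocabulary)


# Toy data, invented for illustration only.
long_texts = [
    "the central bank raised interest rates as inflation and interest costs climbed",
    "the football team won the cup final after extra time",
]
title = "bank raises rates"
vocabulary = sorted(
    {t for doc in long_texts for t in tokenize(doc)} | set(tokenize(title))
)
before = Counter(tokenize(title))
after = expand_title(title, long_texts)
print(density(before, vocabulary), density(after, vocabulary))
```

The expanded vector has more non-zero dimensions than the original title vector, which is the sense in which auxiliary long text can reduce sparseness. A realistic version would at least remove stop words and weight terms before any feature selection or classification step.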
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Ouyang, Y., Huangfu, Y., Sheng, H., Xiong, Z. (2014). News Title Classification with Support from Auxiliary Long Texts. In: Loo, C.K., Yap, K.S., Wong, K.W., Teoh, A., Huang, K. (eds) Neural Information Processing. ICONIP 2014. Lecture Notes in Computer Science, vol 8835. Springer, Cham. https://doi.org/10.1007/978-3-319-12640-1_70
DOI: https://doi.org/10.1007/978-3-319-12640-1_70
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-12639-5
Online ISBN: 978-3-319-12640-1
eBook Packages: Computer Science (R0)