News Title Classification with Support from Auxiliary Long Texts

Ouyang, Yuanxin; Huangfu, Yao; Sheng, Hao; Xiong, Zhang

doi:10.1007/978-3-319-12640-1_70

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8835)

Included in the following conference series:

International Conference on Neural Information Processing

Abstract

The performance of short text classification is limited by the intrinsic shortness of the texts, which makes their vector space representation sparse. Traditional classifiers such as SVM are highly sensitive to the feature space, so classification performance is often unsatisfactory in short-text applications. Using external information to better represent the input data is a promising way to improve results. In this paper, we target news title classification, a typical and important short-text problem, and propose an approach that employs external information from long texts to address the sparseness problem. A Restricted Boltzmann Machine is then used to select features, and classification is finally performed with a Support Vector Machine. An experimental study on Reuters-21578 and the Sogou Chinese news corpus demonstrates the effectiveness of the proposed method.
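The sparseness problem and the idea of borrowing terms from long texts can be illustrated with a toy sketch. The abstract does not specify how the long-text information is incorporated, so everything below is an assumption for illustration only: the `tokenize`, `bag_of_words`, and `enrich` helpers, the overlap-based scoring, the `top_k` cutoff, and the sample texts are all hypothetical and are not the authors' method. The Restricted Boltzmann Machine feature selection and the SVM classifier are not shown.

```python
from collections import Counter


def tokenize(text):
    # Lowercase and split on whitespace; keep purely alphabetic tokens.
    return [w for w in text.lower().split() if w.isalpha()]


def bag_of_words(text):
    # Term-frequency vector stored as a sparse mapping: term -> count.
    return Counter(tokenize(text))


def enrich(title, long_texts, top_k=5):
    """Hypothetical enrichment step (an assumption, not the paper's method).

    Every long text that shares at least one term with the title
    contributes its other terms, weighted by how many title terms it
    shares. The top_k highest-scoring new terms are added to the title's
    vector.
    """
    title_vec = bag_of_words(title)
    extra = Counter()
    for doc in long_texts:
        doc_vec = bag_of_words(doc)
        overlap = sum(1 for w in title_vec if w in doc_vec)
        if overlap == 0:
            continue  # unrelated long text, contributes nothing
        for term, count in doc_vec.items():
            if term not in title_vec:
                extra[term] += overlap * count
    enriched = Counter(title_vec)
    for term, score in extra.most_common(top_k):
        enriched[term] = score
    return enriched


# Made-up sample data, for illustration only.
title = "oil prices rise"
long_texts = [
    "crude oil prices rise as opec cuts output and energy markets react",
    "the football club won the league final",
]

original = bag_of_words(title)
enriched = enrich(title, long_texts, top_k=5)

print("non-zero features before:", len(original))
print("non-zero features after: ", len(enriched))
```

A real pipeline would also need stop-word filtering and a proper weighting scheme such as TF-IDF; this sketch only shows that a three-word title yields very few non-zero features and that related long texts can supply additional ones.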

Author information

Authors and Affiliations

School of Computer Science and Engineering, Beihang University, Beijing, China
Yuanxin Ouyang, Yao Huangfu, Hao Sheng & Zhang Xiong
Research Institute of Beihang University in Shenzhen, Shenzhen, China
Yuanxin Ouyang, Hao Sheng & Zhang Xiong

Editor information

Editors and Affiliations

Department of Artificial Intelligence, Faculty of Computer Science and Information Technology Building, University of Malaya, 50603, Kuala Lumpur, Malaysia
Chu Kiong Loo
Department of Electronics and Communication Engineering, College of Engineering, Universiti Tenaga Nasional, Jalan IKRAM-UNITEN, 43009, Kajang, Selangor, Malaysia
Keem Siah Yap
School of Engineering and Information Technology, Murdoch University, South St., 6150, Murdoch, Western Australia, Australia
Kok Wai Wong
Department of Electrical and Electronics Engineering, Yonsei University, 50 Yonsei-ro, Seodaemun-gu, 120-749, Seoul, South Korea
Andrew Teoh
Department of Electrical and Electronic Engineering, Xi’an Jiaotong-Liverpool University, Ren’ai Road 111, SIP 215123, Suzhou, Jiangsu Province, China
Kaizhu Huang

About this paper

Cite this paper

Ouyang, Y., Huangfu, Y., Sheng, H., Xiong, Z. (2014). News Title Classification with Support from Auxiliary Long Texts. In: Loo, C.K., Yap, K.S., Wong, K.W., Teoh, A., Huang, K. (eds) Neural Information Processing. ICONIP 2014. Lecture Notes in Computer Science, vol 8835. Springer, Cham. https://doi.org/10.1007/978-3-319-12640-1_70

DOI: https://doi.org/10.1007/978-3-319-12640-1_70
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-12639-5
Online ISBN: 978-3-319-12640-1
eBook Packages: Computer Science, Computer Science (R0)
