
News Title Classification with Support from Auxiliary Long Texts

  • Conference paper
Neural Information Processing (ICONIP 2014)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8835)


Abstract

The performance of short text classification is limited by the intrinsic shortness of the texts, which makes their vector space model representation sparse. Traditional classifiers such as SVM are highly sensitive to the feature space, so their classification performance is unsatisfactory in short-text applications. Using external information to better represent the input data is believed to yield better results. In this paper, we target the problem of news title classification, an essential and typical member of the short text family, and propose an approach that employs external information from long texts to address the sparseness problem. Restricted Boltzmann Machines are then utilised to select features, and classification is finally performed using a Support Vector Machine. An experimental study on the Reuters-21578 and Sogou Chinese news corpora demonstrates the effectiveness of the proposed method.
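The sparseness problem described in the abstract can be illustrated with a minimal sketch. It assumes that "support from long text" means appending tokens from an associated long text to the title before building a bag-of-words vector; the abstract does not specify the paper's actual mechanism, so this is an illustration only. The toy corpus and the helper names (`tokenize`, `bow_vector`, `density`) are hypothetical, and the Restricted Boltzmann Machine and SVM stages are not shown.

```python
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (a simplification for illustration)."""
    return text.lower().split()


def bow_vector(tokens, vocabulary):
    """Term-count vector over a fixed, ordered vocabulary."""
    counts = Counter(tokens)
    return [counts[term] for term in vocabulary]


def density(vector):
    """Fraction of non-zero entries; a low value means a sparse vector."""
    return sum(1 for value in vector if value != 0) / len(vector)


# Hypothetical toy data: (news title, auxiliary long text) pairs.
corpus = [
    ("oil prices rise",
     "crude oil prices rise as opec cuts output and demand grows"),
    ("bank cuts rates",
     "the central bank cuts interest rates to support growth and lending"),
]

# Shared vocabulary built from titles and long texts together.
vocabulary = sorted(
    {tok for title, body in corpus for tok in tokenize(title + " " + body)}
)

# Title-only vectors versus vectors augmented with the long text (assumed scheme).
title_only = [bow_vector(tokenize(title), vocabulary) for title, _ in corpus]
augmented = [
    bow_vector(tokenize(title + " " + body), vocabulary) for title, body in corpus
]

for t_vec, a_vec in zip(title_only, augmented):
    print(f"title-only density: {density(t_vec):.2f}, "
          f"augmented density: {density(a_vec):.2f}")
```

In this toy setting each augmented vector has more non-zero entries than its title-only counterpart, which is the effect a feature selector and classifier downstream would benefit from.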




Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Ouyang, Y., Huangfu, Y., Sheng, H., Xiong, Z. (2014). News Title Classification with Support from Auxiliary Long Texts. In: Loo, C.K., Yap, K.S., Wong, K.W., Teoh, A., Huang, K. (eds) Neural Information Processing. ICONIP 2014. Lecture Notes in Computer Science, vol 8835. Springer, Cham. https://doi.org/10.1007/978-3-319-12640-1_70


  • DOI: https://doi.org/10.1007/978-3-319-12640-1_70

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-12639-5

  • Online ISBN: 978-3-319-12640-1

  • eBook Packages: Computer Science, Computer Science (R0)
