Text Classification and Transfer Learning Based on Character-Level Deep Convolutional Neural Networks

  • Conference paper
Agents and Artificial Intelligence (ICAART 2017)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10839)

Abstract

Temporal (one-dimensional) convolutional neural networks (temporal CNNs, ConvNets) are an emerging technology for text understanding. The input to a ConvNet can be either a sequence of words or a sequence of characters. In the latter case, there is no need for natural language processing. Past studies showed that character-level ConvNets work well for text classification on English and romanized Chinese corpora. In this article, we apply character-level ConvNets to a Japanese corpus. We confirm that the ConvNets extract meaningful representations from both English and Japanese corpora. We attempt to reuse the meaningful representations that the ConvNets learn from a large-scale dataset in the form of transfer learning. When applied to news categorization and sentiment analysis tasks on Japanese corpora, the ConvNets outperformed N-gram-based classifiers. In addition, our ConvNet transfer learning frameworks worked well for a task similar to the one used for pre-training.
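
To make the idea of character-level input concrete, the following is a minimal sketch of one-hot character encoding followed by a single temporal convolution filter and max-over-time pooling. It is not the architecture from the paper: the alphabet, the sequence length, the single hand-set filter, and the function names (`one_hot_encode`, `temporal_conv`, `max_over_time`) are all assumptions chosen for illustration.

```python
# Illustrative sketch only: the alphabet, sizes, and weights below are
# assumptions, not the configuration used in the paper.
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789 "
CHAR_TO_INDEX = {ch: i for i, ch in enumerate(ALPHABET)}


def one_hot_encode(text, length):
    """Encode text as `length` rows with one channel per alphabet character.

    Characters outside the alphabet and padding positions become all-zero rows.
    """
    rows = []
    for position in range(length):
        row = [0.0] * len(ALPHABET)
        if position < len(text):
            index = CHAR_TO_INDEX.get(text[position].lower())
            if index is not None:
                row[index] = 1.0
        rows.append(row)
    return rows


def temporal_conv(frames, kernel, bias=0.0):
    """Apply one filter along the time axis (no padding, stride 1), then ReLU.

    frames: list of T rows, each with C channels.
    kernel: list of K rows, each with C weights.
    Returns a list of T - K + 1 activations.
    """
    width = len(kernel)
    outputs = []
    for start in range(len(frames) - width + 1):
        total = bias
        for offset in range(width):
            total += sum(x * w for x, w in zip(frames[start + offset], kernel[offset]))
        outputs.append(max(0.0, total))
    return outputs


def max_over_time(activations):
    """Keep only the strongest response of the filter across all positions."""
    return max(activations)


# A hand-set filter of width 3 that responds most strongly to the string "cat".
frames = one_hot_encode("the cat sat", 16)
kernel = [one_hot_encode(ch, 1)[0] for ch in "cat"]
activations = temporal_conv(frames, kernel)
print(max_over_time(activations))
```

In a trained network the filter weights are learned rather than set by hand, and many filters and layers are stacked. The sketch only shows that no word segmentation or other linguistic preprocessing is needed before the first convolution.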

Notes

  1. http://kakasi.namazu.org/.

  2. http://www.afpbb.com/.

  3. Rakuten, Inc. is one of the largest Japanese electronic commerce and Internet companies based in Tokyo, Japan.

  4. http://www.nii.ac.jp/dsc/idr/en/rakuten/rakuten.html.

  5. https://www.di.unipi.it/~gulli/AG_corpus_of_news_articles.html.

  6. https://snap.stanford.edu/data/web-Amazon.html.

  7. http://www.imdb.com/.

  8. http://www.nii.ac.jp/dsc/idr/en/rakuten/rakuten.html.

Acknowledgements

This work was supported by JSPS KAKENHI Grant Numbers 26330081, 26870201, 16K12411, and 17H04705. We used the Rakuten dataset, which is provided by the National Institute of Informatics (NII) under the contract between NII and Rakuten, Inc. We would like to thank NII and Rakuten, Inc.

Author information

Corresponding authors

Correspondence to Minato Sato, Ryohei Orihara, Yuichi Sei, Yasuyuki Tahara or Akihiko Ohsuga.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Sato, M., Orihara, R., Sei, Y., Tahara, Y., Ohsuga, A. (2018). Text Classification and Transfer Learning Based on Character-Level Deep Convolutional Neural Networks. In: van den Herik, J., Rocha, A., Filipe, J. (eds) Agents and Artificial Intelligence. ICAART 2017. Lecture Notes in Computer Science, vol 10839. Springer, Cham. https://doi.org/10.1007/978-3-319-93581-2_4

  • DOI: https://doi.org/10.1007/978-3-319-93581-2_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-93580-5

  • Online ISBN: 978-3-319-93581-2

  • eBook Packages: Computer Science (R0)
