Ld-CNNs: A Deep Learning System for Structured Text Categorization Based on LDA in Content Security

  • Conference paper
Network and System Security (NSS 2016)

Part of the book series: Lecture Notes in Computer Science (LNSC, volume 9955)


Abstract

Text categorization is a foundational task in many NLP applications. Traditional text classifiers often rely on hand-engineered features, and recently Convolutional Neural Networks (CNNs) operating on word vectors have achieved markedly better performance than traditional methods [15, 20]. In this paper, we combine prior knowledge with a deep learning method for structured text categorization. In our model, we apply word embeddings to capture both semantic and syntactic information of words, and apply separate convolutional neural networks to extract high-level features from the different parts of a structured text. Since different text parts contribute differently to the categorization result, a linear-kernel SVM is then applied to decide the final category. Moreover, to enhance the discriminativeness of words, we employ latent topic models to assign a topic to each word in the text corpus and learn topical word embeddings based on both words and their topics. We conduct experiments on several datasets. The results show that our model outperforms typical text categorization models, especially when the texts in a dataset share a similar structure.
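
To make the architecture described above concrete, the following is a minimal sketch, not the authors' implementation: one small convolutional network per structured text part (for example, title and body), whose max-pooled features are concatenated and passed to a linear classifier that stands in for the linear-kernel SVM. The use of PyTorch, the class names, filter widths, and dimensions are illustrative assumptions, and the randomly initialised embedding layer is only a placeholder for the topical word embeddings the paper learns with LDA.

```python
# Hypothetical sketch of the Ld-CNNs idea from the abstract (not the authors'
# code): one CNN per structured text part, features concatenated, then a
# linear classifier approximating the linear-kernel SVM fusion step.
import torch
import torch.nn as nn
import torch.nn.functional as F


class PartCNN(nn.Module):
    """CNN feature extractor for one part of a structured text."""
    def __init__(self, vocab_size, emb_dim=100, n_filters=64, widths=(3, 4, 5)):
        super().__init__()
        # Placeholder embeddings; the paper would use topical word embeddings.
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.convs = nn.ModuleList(
            nn.Conv1d(emb_dim, n_filters, w) for w in widths
        )

    def forward(self, token_ids):                    # (batch, seq_len)
        x = self.emb(token_ids).transpose(1, 2)      # (batch, emb_dim, seq_len)
        # Max-over-time pooling of each convolution, then concatenate.
        pooled = [F.relu(c(x)).max(dim=2).values for c in self.convs]
        return torch.cat(pooled, dim=1)              # (batch, n_filters * len(widths))


class LdCNNsSketch(nn.Module):
    """Separate CNN per text part; a linear layer stands in for the SVM."""
    def __init__(self, vocab_size, n_parts=2, n_classes=4):
        super().__init__()
        self.part_cnns = nn.ModuleList(PartCNN(vocab_size) for _ in range(n_parts))
        self.classifier = nn.Linear(n_parts * 64 * 3, n_classes)  # 64 filters x 3 widths per part

    def forward(self, parts):                         # list of (batch, seq_len) tensors
        feats = [cnn(p) for cnn, p in zip(self.part_cnns, parts)]
        return self.classifier(torch.cat(feats, dim=1))


# Toy usage: two text parts (e.g. title and body) for a batch of 8 documents.
model = LdCNNsSketch(vocab_size=10000)
title = torch.randint(0, 10000, (8, 12))
body = torch.randint(0, 10000, (8, 200))
logits = model([title, body])                         # (8, 4) class scores
```

To follow the abstract more closely, the final linear layer could be replaced by a linear SVM trained separately on the concatenated per-part CNN features (for instance, scikit-learn's LinearSVC fitted on the extracted feature vectors).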

Notes

  1. http://ictclas.nlpir.org/.

  2. nlp.stanford.edu/software/tokenizer.shtml.

References

  1. Bengio, Y., Schwenk, H., Senécal, J.S., et al.: A neural probabilistic language model. J. Mach. Learn. Res. 3(6), 1137–1155 (2003)

  2. Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent Dirichlet allocation. JMLR 3, 993–1022 (2003)

  3. Cai, L., Hofmann, T.: Text categorization by boosting automatically extracted concepts. In: SIGIR, pp. 182–189 (2003)

  4. Rumelhart, D.E., Hinton, G.E., Williams, R.J.: Learning representations by back-propagating errors. In: Anderson, J.A., Rosenfeld, E. (eds.) Neurocomputing: Foundations of Research, pp. 696–699. MIT Press, Cambridge (1988)

  5. Dumais, S., Platt, J., Heckerman, D., Sahami, M.: Inductive learning algorithms and representations for text categorization. In: Proceedings of the Seventh International Conference on Information and Knowledge Management, pp. 148–155. ACM Press (1998)

  6. Elman, J.L.: Finding structure in time. Cogn. Sci. 14, 179–211 (1990)

  7. Fragos, K., Maistros, I., Skourlas, C.: A χ²-weighted maximum entropy model for text classification. In: Proceedings of 2nd International Conference on Natural Language Understanding and Cognitive Science, Miami, Florida, pp. 22–23 (2005)

  8. Goyal, R.D.: Knowledge based neural network for text classification. In: IEEE International Conference on Granular Computing, p. 542 (2007)

  9. Griffiths, T.L., Steyvers, M.: Finding scientific topics. PNAS 101, 5228–5235 (2004)

  10. Grossman, D., Domingos, P.: Learning Bayesian network classifiers by maximizing conditional likelihood. In: Proceedings of the Twenty-First International Conference on Machine Learning, pp. 361–368. ACM Press (2004)

  11. Hamad, A.: Weighted naive Bayesian classifier. In: IEEE/ACS International Conference on Computer Systems and Applications (AICCSA 2007), vol. 1(1), pp. 437–441 (2007)

  12. Hingmire, S., Chougule, S., Palshikar, G.K., Chakraborti, S.: Document classification by topic labeling. In: SIGIR, pp. 877–880 (2013)

  13. Joachims, T.: Text categorization with support vector machines: learning with many relevant features. In: Nédellec, C., Rouveirol, C. (eds.) ECML 1998. LNCS, vol. 1398. Springer, Heidelberg (1998)

  14. Kalchbrenner, N., Grefenstette, E., Blunsom, P.: A convolutional neural network for modelling sentences. arXiv preprint arXiv:1404.2188 (2014)

  15. Kim, Y.: Convolutional neural networks for sentence classification. arXiv preprint arXiv:1408.5882 (2014)

  16. Le, Q.V., Mikolov, T.: Distributed representations of sentences and documents. In: ICML, pp. 1188–1196 (2014)

  17. Liu, P., Qiu, X., Huang, X.: Learning context-sensitive word embeddings with neural tensor skip-gram model. In: International Conference on Artificial Intelligence. AAAI Press (2015)

  18. Liu, Y., Liu, Z., Chua, T.S., et al.: Topical word embeddings. In: Twenty-Ninth AAAI Conference on Artificial Intelligence. AAAI Press (2015)

  19. Mikolov, T., Yih, W.-T., Zweig, G.: Linguistic regularities in continuous space word representations. In: NAACL-HLT, pp. 746–751 (2013)

  20. Johnson, R., Zhang, T.: Effective use of word order for text categorization with convolutional neural networks. arXiv preprint arXiv:1412.1058 (2014)

  21. Lai, S., Xu, L., Liu, K., Zhao, J.: Recurrent convolutional neural networks for text classification. In: Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, pp. 2267–2273 (2015)

  22. Socher, R., Pennington, J., Huang, E.H., Ng, A.Y., Manning, C.D.: Semi-supervised recursive autoencoders for predicting sentiment distributions. In: EMNLP, pp. 151–161 (2011)

  23. Socher, R., Perelygin, A., Wu, J.Y., Chuang, J., Manning, C.D., Ng, A.Y., Potts, C.: Recursive deep models for semantic compositionality over a sentiment treebank. In: EMNLP, pp. 1631–1642 (2013)

  24. Wang, S., Manning, C.D.: Baselines and bigrams: simple, good sentiment and topic classification. In: Meeting of the Association for Computational Linguistics: Short Papers, pp. 90–94 (2012)

  25. Boureau, Y.-L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: International Conference on Machine Learning, p. 459 (2010)

  26. Kim, Y.: Convolutional neural networks for sentence classification. arXiv preprint arXiv:1408.5882 (2014)

  27. Zeng, L., Li, Z.: Text classification based on paragraph distributed representation and extreme learning machine. In: Tan, Y., Shi, Y., Buarque, F., Gelbukh, A., Das, S., Engelbrecht, A. (eds.) ICSI-CCI 2015. LNCS, vol. 9141, pp. 81–88. Springer, Heidelberg (2015)

  28. Zhang, X., Zhao, J., LeCun, Y.: Character-level convolutional networks for text classification. In: Advances in Neural Information Processing Systems (2015)

Acknowledgments

The authors would like to thank the anonymous reviewers for their constructive comments. This work was sponsored by the National Natural Science Foundation of China (Nos. 61303214 and 61303025; project approval number U1536204).

Author information

Corresponding author

Correspondence to Yabo Xu.

Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Liu, J., Xu, Y., Deng, J., Wang, L., Zhang, L. (2016). Ld-CNNs: A Deep Learning System for Structured Text Categorization Based on LDA in Content Security. In: Chen, J., Piuri, V., Su, C., Yung, M. (eds) Network and System Security. NSS 2016. Lecture Notes in Computer Science, vol 9955. Springer, Cham. https://doi.org/10.1007/978-3-319-46298-1_8

  • DOI: https://doi.org/10.1007/978-3-319-46298-1_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-46297-4

  • Online ISBN: 978-3-319-46298-1
