Abstract
This paper proposes a fast training method for graph classification based on a boosting algorithm, and applies it to sentiment analysis with input texts represented as graphs. The graph format is well suited to representing texts structured with Natural Language Processing techniques such as morphological analysis, Named Entity Recognition, and parsing. A number of classification methods that represent texts as graphs have been proposed. However, many of them limit the candidate features in advance because the feature space is extremely large. Instead of limiting the search space in advance, we propose two approximation methods for learning graph-based rules in a boosting framework. Experimental results on a sentiment analysis dataset show that our method improves training speed. In addition, the graph-based classification method exploits rich structural information in texts that cannot be captured with simpler input formats, and achieves higher accuracy.
Notes
- 1.
In (Kudo et al. 2004), a weak classifier is defined to return \(-\alpha \) if \(g\not \subseteq x\). Considering the results of preliminary experiments, we decided to use the above definition instead.
- 2.
We may omit the iteration index j when no confusion can arise.
- 3.
With a slight modification, we can start searches from single-node graphs so that the result may contain single-node feature graphs.
- 4.
To convert the output of SENNA into tree format, we used Penn2Malt 0.2 (http://stp.lingfil.uu.se/~nivre/research/Penn2Malt.html) with the following options: head rules in (http://stp.lingfil.uu.se/~nivre/research/headrules.txt), deprel 1, and punctuation 1.
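Footnote 1 above refers to the weak-classifier definition of Kudo et al. (2004), in which a feature graph \(g\) votes \(+\alpha\) when it is contained in the input graph \(x\) and \(-\alpha\) otherwise (this paper adopts a different variant not reproduced here). A minimal sketch of that original definition, with graphs simplified to frozensets of labeled edges purely for illustration (real subgraph matching, e.g. gSpan-style, is far more involved, and all names and data below are hypothetical):

```python
def weak_classifier(g, alpha):
    """Kudo et al. (2004)-style stump: h(x) = +alpha if g is a subgraph
    of x, else -alpha.

    Graphs are simplified to frozensets of (head, dependent) edges so the
    subgraph test reduces to a plain subset check; illustration only.
    """
    def h(x):
        return alpha if g <= x else -alpha
    return h

# Toy dependency "graphs" as edge sets (hypothetical example data)
rule = frozenset({("good", "not")})                      # feature graph g
doc1 = frozenset({("good", "not"), ("good", "movie")})   # contains g
doc2 = frozenset({("good", "very")})                     # does not

h = weak_classifier(rule, alpha=0.5)
print(h(doc1))  # 0.5
print(h(doc2))  # -0.5
```

A boosted ensemble would sum the outputs of many such stumps, each with its own feature graph and weight, and classify by the sign of the sum.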
References
Arora, S., Mayfield, E., Rosé, C.P., Nyberg, E.: Sentiment classification using automatically extracted subgraph features. In: Proceedings of the NAACL HLT 2010 Workshop on Computational Approaches to Analysis and Generation of Emotion in Text, pp. 131–139 (2010)
Baccianella, S., Esuli, A., Sebastiani, F.: SentiWordNet 3.0: an enhanced lexical resource for sentiment analysis and opinion mining. In: Proceedings of Seventh International Conference on Language Resources and Evaluation, pp. 2200–2204 (2010)
Blitzer, J., Dredze, M., Pereira, F.: Biographies, bollywood, boom-boxes and blenders: domain adaptation for sentiment classification. In: Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics, pp. 440–447 (2007)
Boser, B.E., Guyon, I.M., Vapnik, V.N.: A training algorithm for optimal margin classifiers. In: Proceedings of the Fifth Annual ACM Conference on Computational Learning Theory, pp. 144–152 (1992)
Collobert, R.: Deep learning for efficient discriminative parsing. In: International Conference on Artificial Intelligence and Statistics (2011)
Collobert, R., Weston, J., Bottou, L., Karlen, M., Kavukcuoglu, K., Kuksa, P.: Natural language processing (almost) from scratch. J. Mach. Learn. Res. 12, 2493–2537 (2011)
Fei, H., Huan, J.: Structured sparse boosting for graph classification. ACM Trans. Knowl. Discov. Data 9, 1–22 (2014)
Rosenblatt, F.: The perceptron: a probabilistic model for information storage and organization in the brain. Psychol. Rev. 65, 386–408 (1958)
Freund, Y.: The alternating decision tree algorithm. In: Proceedings of the Sixteenth International Conference on Machine Learning, pp. 124–133 (1999)
Gee, K.R., Cook, D.J.: Text classification using graph-encoded linguistic elements. In: Proceedings of the Eighteenth International Florida Artificial Intelligence Research Society Conference, pp. 487–492 (2005)
Iwakura, T.: A boosting-based algorithm for classification of semi-structured text using frequency of substructures. In: Proceedings of 9th International Conference on Recent Advances in Natural Language Processing, pp. 319–326 (2013)
Iwakura, T., Okamoto, S.: A fast boosting-based learner for feature-rich tagging and chunking. In: Proceedings of Twelfth Conference on Computational Natural Language Learning, pp. 17–24 (2008)
Jiang, C., Coenen, F., Sanderson, R., Zito, M.: Text classification using graph mining-based feature extraction. Knowl.-Based Syst. 23, 302–308 (2010)
Kashima, H., Tsuda, K., Inokuchi, A.: Marginalized kernels between labeled graphs. In: Proceedings of the Twentieth International Conference on Machine Learning, pp. 321–328 (2003)
Kudo, T., Maeda, E., Matsumoto, Y.: An application of boosting to graph classification. Adv. Neural Inf. Process. Syst. 17, 729–736 (2004)
Kudo, T., Matsumoto, Y.: A boosting algorithm for classification of semi-structured text. In: Proceedings of 9th Conference on Empirical Methods in Natural Language Processing, pp. 301–308 (2004)
Matsumoto, S., Takamura, H., Okumura, M.: Sentiment classification using word sub-sequences and dependency sub-trees. In: Ho, T.B., Cheung, D., Liu, H. (eds.) PAKDD 2005. LNCS, vol. 3518, pp. 301–311. Springer, Heidelberg (2005)
Okazaki, N.: Classias: a collection of machine-learning algorithms for classification (2009). http://www.chokkan.org/software/classias/
Pan, S., Wu, J., Zhu, X.: CogBoost: boosting for fast cost-sensitive graph classification. IEEE Trans. Knowl. Data Eng. 27, 2933–2946 (2015)
Pan, S., Wu, J., Zhu, X., Long, G., Zhang, C.: Boosting for graph classification with universum. Knowl. Inf. Syst. 47, 1–25 (2016)
Pan, S., Wu, J., Zhu, X., Zhang, C.: Graph ensemble boosting for imbalanced noisy graph stream classification. IEEE Trans. Cybern. 45, 940–954 (2015)
Saigo, H., Nowozin, S., Kadowaki, T., Kudo, T., Tsuda, K.: gBoost: a mathematical programming approach to graph classification and regression. Mach. Learn. 75, 69–89 (2009)
Schapire, R.E., Singer, Y.: Improved boosting algorithms using confidence-rated predictions. Mach. Learn. 37, 297–336 (1999)
Wu, J., Pan, S., Zhu, X., Cai, Z.: Boosting for multi-graph classification. IEEE Trans. Cybern. 45, 430–443 (2015)
Yan, X., Han, J.: gSpan: graph-based substructure pattern mining. In: Proceedings of 2002 IEEE International Conference on Data Mining, pp. 721–724 (2002)
Copyright information
© 2016 Springer International Publishing Switzerland
Cite this paper
Yoshikawa, H., Iwakura, T. (2016). Fast Training of a Graph Boosting for Large-Scale Text Classification. In: Booth, R., Zhang, ML. (eds) PRICAI 2016: Trends in Artificial Intelligence. PRICAI 2016. Lecture Notes in Computer Science(), vol 9810. Springer, Cham. https://doi.org/10.1007/978-3-319-42911-3_53
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-42910-6
Online ISBN: 978-3-319-42911-3
eBook Packages: Computer Science (R0)