Abstract
We describe a variant of the Child-Sum Tree-LSTM deep neural network [16], fine-tuned for working with dependency trees and morphologically rich languages, using Polish as an example. The fine-tuning included applying a custom regularization technique (zoneout, described by Krueger et al. [9] and further adapted for Tree-LSTMs) as well as using pre-trained word embeddings enhanced with sub-word information [2]. The system was implemented in PyTorch and evaluated on a phrase-level sentiment labeling task as part of the PolEval competition.
Tomasz Korbak was funded by the Ministry of Science and Higher Education (Poland) research grant DI2015010945 as part of the Diamentowy Grant program.
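The abstract names three ingredients: a Child-Sum Tree-LSTM, a zoneout-style regularizer adapted to trees, and sub-word embeddings. The full architecture is not reproduced on this page, so the following is only a minimal PyTorch sketch of a Child-Sum Tree-LSTM cell [16] with zoneout [9]; the specific way zoneout is carried over to trees here (randomly preserving units of the summed child cell state), as well as all class and parameter names, are assumptions of the sketch, not the authors' code.

```python
import torch
import torch.nn as nn

class ChildSumTreeLSTMCell(nn.Module):
    """Minimal Child-Sum Tree-LSTM cell (Tai et al. [16]) with a
    zoneout-style regularizer (Krueger et al. [9]). ASSUMPTION: a
    zoned-out unit keeps the value of the summed child cell state
    instead of the newly computed one (the paper's exact adaptation
    is not given on this page)."""

    def __init__(self, input_size, hidden_size, zoneout_p=0.1):
        super().__init__()
        self.ioux = nn.Linear(input_size, 3 * hidden_size)   # W for i, o, u gates
        self.iouh = nn.Linear(hidden_size, 3 * hidden_size)  # U for i, o, u gates
        self.fx = nn.Linear(input_size, hidden_size)         # W for forget gates
        self.fh = nn.Linear(hidden_size, hidden_size)        # U for forget gates
        self.zoneout_p = zoneout_p

    def forward(self, x, child_h, child_c):
        # x: (input_size,); child_h, child_c: (num_children, hidden_size).
        # For a leaf, pass empty (0, hidden_size) tensors.
        h_sum = child_h.sum(dim=0)  # children are unordered: just sum them
        i, o, u = (self.ioux(x) + self.iouh(h_sum)).chunk(3)
        i, o, u = torch.sigmoid(i), torch.sigmoid(o), torch.tanh(u)
        # one forget gate per child, conditioned on that child's own state
        f = torch.sigmoid(self.fx(x).unsqueeze(0) + self.fh(child_h))
        c_new = i * u + (f * child_c).sum(dim=0)
        c_prev = child_c.sum(dim=0)
        if self.training and self.zoneout_p > 0:
            # zoneout: randomly preserve units of the previous (child-sum) cell state
            mask = torch.bernoulli(torch.full_like(c_new, self.zoneout_p))
            c = mask * c_prev + (1 - mask) * c_new
        else:
            # at evaluation time, use the expected value of the zoneout mask
            c = self.zoneout_p * c_prev + (1 - self.zoneout_p) * c_new
        h = o * torch.tanh(c)
        return h, c
```

Running such a cell bottom-up over a dependency tree (leaves first, root last) yields one hidden state per node, which a linear classifier can then map to a phrase-level sentiment label.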
Notes
- 1.
- 2.
- 3. Although recursive neural networks are used primarily in natural language processing, they have also been applied in other domains, for instance scene parsing [13].
- 4. The other variant described in [16], the N-ary Tree-LSTM, assumes that each node has at most N children and that the children are linearly ordered, which makes it natural for (binarized) constituency trees. The choice between these two variants really boils down to the syntactic theory we assume for representing sentences. As the PolEval dataset assumes a dependency grammar, we decided to go along with the Child-Sum Tree-LSTM (see the sketch after these notes).
- 5.
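To make the contrast in note 4 concrete, here is a minimal sketch (illustrative names, not the paper's code) of the one place where the two variants differ: how the children's hidden states enter the gate pre-activations.

```python
import torch
import torch.nn as nn

hidden_size, N = 100, 2  # the N-ary variant fixes the branching factor N

# Child-Sum: children are unordered; one shared U is applied to their sum,
# so permuting the children leaves the gate pre-activations unchanged.
U_shared = nn.Linear(hidden_size, hidden_size, bias=False)

def childsum_gate_input(child_h):  # child_h: (num_children, hidden_size)
    return U_shared(child_h.sum(dim=0))

# N-ary: children are ordered; each position k gets its own U_k, so e.g.
# the left and right child of a binarized constituency tree are distinguished.
U_per_child = nn.ModuleList([nn.Linear(hidden_size, hidden_size, bias=False)
                             for _ in range(N)])

def nary_gate_input(child_h):      # child_h: (N, hidden_size)
    return sum(U_k(h_k) for U_k, h_k in zip(U_per_child, child_h))
```

Because a dependency node may have any number of unordered dependents, while the N-ary variant requires a fixed, ordered branching factor, the Child-Sum formulation is the natural fit for the PolEval dependency trees.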
References
Bengio, Y., Simard, P., Frasconi, P.: Learning long-term dependencies with gradient descent is difficult. IEEE Trans. Neural Netw. 5(2), 157–166 (1994)
Bojanowski, P., Grave, E., Joulin, A., Mikolov, T.: Enriching word vectors with subword information. arXiv preprint arXiv:1607.04606 (2016)
Duchi, J., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011). http://dl.acm.org/citation.cfm?id=1953048.2021068
Elman, J.L.: Finding structure in time. Cogn. Sci. 14(2), 179–211 (1990). https://doi.org/10.1207/s15516709cog1402_1
Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, Cambridge (2016)
Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)
Jurafsky, D., Martin, J.H.: Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, 1st edn. Prentice Hall PTR, Upper Saddle River, NJ, USA (2000)
Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)
Krueger, D., et al.: Zoneout: regularizing RNNs by randomly preserving hidden activations. arXiv preprint arXiv:1606.01305 (2016)
Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781 (2013)
Przepiórkowski, A., Górski, R.L., Lewandowska-Tomaszczyk, B., Łaziński, M.: Towards the National Corpus of Polish. In: Proceedings of the Sixth International Conference on Language Resources and Evaluation, LREC 2008. ELRA, Marrakech (2008)
Siegelmann, H.T., Sontag, E.D.: Neural nets are universal computing devices. Report SYCON-91-08, Rutgers University (1991)
Socher, R., Lin, C.C.Y., Ng, A.Y., Manning, C.D.: Parsing natural scenes and natural language with recursive neural networks. In: Proceedings of the 28th International Conference on International Conference on Machine Learning, ICML’11, Omnipress, USA, pp. 129–136 (2011). http://dl.acm.org/citation.cfm?id=3104482.3104499
Socher, R., Perelygin, A., Wu, J.Y., Chuang, J., Manning, C.D., Ng, A.Y., Potts, C.: Recursive deep models for semantic compositionality over a sentiment treebank. In: Proceedings of EMNLP (2013)
Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.: Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15(56), 1929–1958 (2014). http://jmlr.org/papers/v15/srivastava14a.html
Tai, K.S., Socher, R., Manning, C.D.: Improved semantic representations from tree-structured long short-term memory networks. arXiv preprint arXiv:1503.00075 (2015)
Thet, T.T., Na, J.C., Khoo, C.S.: Aspect-based sentiment analysis of movie reviews on discussion boards. J. Inf. Sci. 36(6), 823–848 (2010). https://doi.org/10.1177/0165551510388123
Waszczuk, J.: Harnessing the CRF complexity with domain-specific constraints: the case of morphosyntactic tagging of a highly inflected language. In: Proceedings of COLING 2012, pp. 2789–2804. The COLING 2012 Organizing Committee (2012). http://aclanthology.coli.uni-saarland.de/pdf/C/C12/C12-1170.pdf
Cite this paper
Korbak, T., Żak, P. (2020). Fine-Tuning Tree-LSTM for Phrase-Level Sentiment Classification on a Polish Dependency Treebank. In: Vetulani, Z., Paroubek, P., Kubis, M. (eds) Human Language Technology. Challenges for Computer Science and Linguistics. LTC 2017. Lecture Notes in Computer Science(), vol 12598. Springer, Cham. https://doi.org/10.1007/978-3-030-66527-2_3