Abstract
Parameter sharing (or weight sharing) is widely used in neural networks, such as Recursive Neural Networks (RvNNs) and their variants, to control model complexity and encode prior knowledge. Parameter sharing in RvNNs for language modeling assumes that non-leaf nodes in treebanks are generated by similar semantic compositionality, so the hidden units of all non-leaf nodes share the same model parameters. However, treebanks contain several semantic levels with significantly different semantic compositionality, and classification performance suffers when nodes at high semantic levels share parameters with those at low levels. In this paper, a hierarchical parameter sharing strategy over Long Short-Term Memory (LSTM) cells in Recursive Neural Networks, denoted shLSTM-RvNN, is proposed, in which the weight connections of hidden units are clustered according to the hierarchical semantic levels defined in the Penn Treebank tagsets. Parameters at the same semantic level are shared, while those at different semantic levels use different sets of connection weights. The proposed shLSTM-RvNN model is evaluated on benchmark data sets involving semantic compositionality. Empirical results show that shLSTM-RvNN increases classification accuracy while significantly reducing time complexity.
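The following is a minimal PyTorch sketch of the level-conditioned sharing idea described in the abstract, not the authors' implementation: the cell structure, the TAG_TO_LEVEL mapping, and the two-level split into "phrase" and "clause" are illustrative assumptions, whereas the paper derives its levels from the Penn Treebank tagsets.

```python
import torch
import torch.nn as nn

# Hypothetical mapping from Penn Treebank constituent tags to coarse
# semantic levels; the paper's actual level definitions are not reproduced here.
TAG_TO_LEVEL = {"NP": "phrase", "VP": "phrase", "S": "clause", "SBAR": "clause"}


class SharedLevelTreeLSTMCell(nn.Module):
    """Binary Tree-LSTM cell with one parameter set per semantic level.

    Nodes whose tags map to the same level share weights; nodes at
    different levels use separate weights (a sketch of the shLSTM-RvNN idea).
    """

    def __init__(self, hidden_size, levels=("phrase", "clause")):
        super().__init__()
        self.hidden_size = hidden_size
        # One affine map per level: the two children's hidden states are
        # projected to input, left/right forget, output, and candidate gates.
        self.level_params = nn.ModuleDict({
            lvl: nn.Linear(2 * hidden_size, 5 * hidden_size) for lvl in levels
        })

    def forward(self, tag, left, right):
        (h_l, c_l), (h_r, c_r) = left, right
        level = TAG_TO_LEVEL.get(tag, "phrase")  # fall back to the lowest level
        gates = self.level_params[level](torch.cat([h_l, h_r], dim=-1))
        i, f_l, f_r, o, g = gates.chunk(5, dim=-1)
        c = (torch.sigmoid(i) * torch.tanh(g)
             + torch.sigmoid(f_l) * c_l
             + torch.sigmoid(f_r) * c_r)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, c


# Usage: composing bottom-up over a parse tree, each node picks the
# parameter set that matches its semantic level.
cell = SharedLevelTreeLSTMCell(hidden_size=64)
h0, c0 = torch.zeros(64), torch.zeros(64)
np_node = cell("NP", (h0, c0), (h0, c0))  # uses the "phrase" parameters
s_node = cell("S", np_node, np_node)      # uses the "clause" parameters
```

Compared with a standard Tree-LSTM, the only change is that the single weight matrix is replaced by a small dictionary of per-level matrices, so within a level the usual sharing (and its regularization effect) is preserved.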
Acknowledgments
This work was supported in part by the National Natural Science Foundation of China under Contract 71331005 and in part by the State Key Research and Development Program of China under Contract 2016YFE0100300.
Copyright information
© 2017 Springer International Publishing AG
Cite this paper
Li, F., Chi, M., Wu, D., Niu, J. (2017). Hierarchical Parameter Sharing in Recursive Neural Networks with Long Short-Term Memory. In: Liu, D., Xie, S., Li, Y., Zhao, D., El-Alfy, E.S. (eds) Neural Information Processing. ICONIP 2017. Lecture Notes in Computer Science, vol 10635. Springer, Cham. https://doi.org/10.1007/978-3-319-70096-0_60
DOI: https://doi.org/10.1007/978-3-319-70096-0_60
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-70095-3
Online ISBN: 978-3-319-70096-0