Abstract
Parameter sharing (or weight sharing) is widely used in neural networks, such as Recursive Neural Networks (RvNNs) and their variants, to control model complexity and encode prior knowledge. Parameter sharing in RvNNs for language modeling assumes that non-leaf nodes in treebanks are generated by similar semantic compositionality, so the hidden units of all non-leaf nodes share the same model parameters. However, treebanks contain several semantic levels with significantly different semantic compositionality, and classification performance suffers when nodes at high semantic levels share parameters with those at low levels. In this paper, a hierarchical parameter sharing strategy over Long Short-Term Memory (LSTM) cells in Recursive Neural Networks, denoted shLSTM-RvNN, is proposed, in which the weight connections of hidden units are clustered according to the hierarchical semantic levels defined in the Penn Treebank tagsets. Parameters at the same semantic level are shared, while those at different semantic levels use different sets of connection weights. The proposed shLSTM-RvNN model is evaluated on benchmark data sets involving semantic compositionality. Empirical results show that shLSTM-RvNN increases classification accuracy while significantly reducing time complexity.
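The following is a minimal PyTorch sketch of the level-conditioned sharing idea described in the abstract, not the authors' implementation: the cell structure, the TAG_TO_LEVEL mapping, and the two-level split into "phrase" and "clause" are illustrative assumptions, whereas the paper derives its levels from the Penn Treebank tagsets.

```python
import torch
import torch.nn as nn

# Hypothetical mapping from Penn Treebank constituent tags to coarse
# semantic levels; the paper's actual level definitions are not reproduced here.
TAG_TO_LEVEL = {"NP": "phrase", "VP": "phrase", "S": "clause", "SBAR": "clause"}


class SharedLevelTreeLSTMCell(nn.Module):
    """Binary Tree-LSTM cell with one parameter set per semantic level.

    Nodes whose tags map to the same level share weights; nodes at
    different levels use separate weights (a sketch of the shLSTM-RvNN idea).
    """

    def __init__(self, hidden_size, levels=("phrase", "clause")):
        super().__init__()
        self.hidden_size = hidden_size
        # One affine map per level: the two children's hidden states are
        # projected to input, left/right forget, output, and candidate gates.
        self.level_params = nn.ModuleDict({
            lvl: nn.Linear(2 * hidden_size, 5 * hidden_size) for lvl in levels
        })

    def forward(self, tag, left, right):
        (h_l, c_l), (h_r, c_r) = left, right
        level = TAG_TO_LEVEL.get(tag, "phrase")  # fall back to the lowest level
        gates = self.level_params[level](torch.cat([h_l, h_r], dim=-1))
        i, f_l, f_r, o, g = gates.chunk(5, dim=-1)
        c = (torch.sigmoid(i) * torch.tanh(g)
             + torch.sigmoid(f_l) * c_l
             + torch.sigmoid(f_r) * c_r)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, c


# Usage: composing bottom-up over a parse tree, each node picks the
# parameter set that matches its semantic level.
cell = SharedLevelTreeLSTMCell(hidden_size=64)
h0, c0 = torch.zeros(64), torch.zeros(64)
np_node = cell("NP", (h0, c0), (h0, c0))  # uses the "phrase" parameters
s_node = cell("S", np_node, np_node)      # uses the "clause" parameters
```

Compared with a standard Tree-LSTM, the only change is that the single weight matrix is replaced by a small dictionary of per-level matrices, so within a level the usual sharing (and its regularization effect) is preserved.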
Acknowledgments
This work was supported in part by the National Natural Science Foundation of China under Contract 71331005 and in part by the State Key Research and Development Program of China under Contract 2016YFE0100300.
Copyright information
© 2017 Springer International Publishing AG
Cite this paper
Li, F., Chi, M., Wu, D., Niu, J. (2017). Hierarchical Parameter Sharing in Recursive Neural Networks with Long Short-Term Memory. In: Liu, D., Xie, S., Li, Y., Zhao, D., El-Alfy, E.S. (eds) Neural Information Processing. ICONIP 2017. Lecture Notes in Computer Science, vol 10635. Springer, Cham. https://doi.org/10.1007/978-3-319-70096-0_60
DOI: https://doi.org/10.1007/978-3-319-70096-0_60
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-70095-3
Online ISBN: 978-3-319-70096-0