
Hierarchical Parameter Sharing in Recursive Neural Networks with Long Short-Term Memory

  • Conference paper
  • First Online:
Neural Information Processing (ICONIP 2017)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10635)


Abstract

Parameter sharing (or weight sharing) is widely used in neural networks, such as Recursive Neural Networks (RvNNs) and their variants, to control model complexity and encode prior knowledge. Parameter sharing in RvNNs for language modeling assumes that all non-leaf nodes in treebanks arise from similar semantic compositionality, so the hidden units of all non-leaf nodes in an RvNN share the same model parameters. However, treebanks contain several semantic levels with significantly different semantic compositionality, and classification performance suffers when nodes at high semantic levels share parameters with nodes at low levels. In this paper, a novel hierarchical parameter sharing strategy is proposed over Long Short-Term Memory (LSTM) cells in Recursive Neural Networks, denoted shLSTM-RvNN, in which the weight connections of hidden units are clustered according to the hierarchical semantic levels defined by the Penn Treebank tagsets. Parameters within the same semantic level are shared, while nodes at different semantic levels use different sets of connection weights. The proposed shLSTM-RvNN model is evaluated on benchmark datasets involving semantic compositionality. Empirical results show that shLSTM-RvNN increases classification accuracy and significantly reduces time complexity.
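To make the idea in the abstract concrete, the sketch below (not taken from the paper) shows one way level-dependent parameter sharing over a binary tree-structured LSTM could look in NumPy. The binary-tree cell equations, the two-level grouping of Treebank tags, and all names (LEVEL_TAGS, LevelSharedTreeLSTM, compose) are illustrative assumptions rather than the authors' implementation; the paper defines its own semantic levels from the Penn Treebank tagsets.

```python
import numpy as np

# Illustrative grouping of Penn Treebank constituent tags into semantic levels.
# The actual level definitions in shLSTM-RvNN come from the Penn Treebank
# tagsets; the two groups below are assumptions made only for this sketch.
LEVEL_TAGS = {
    "clause": {"S", "SBAR", "SINV", "SQ", "SBARQ"},
    "phrase": {"NP", "VP", "PP", "ADJP", "ADVP"},
}

def level_of(tag):
    """Map a non-leaf Treebank tag to a semantic level (default: 'phrase')."""
    for level, tags in LEVEL_TAGS.items():
        if tag in tags:
            return level
    return "phrase"

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class LevelSharedTreeLSTM:
    """Binary tree-structured LSTM whose gate weights are shared only among
    nodes of the same semantic level, instead of across all non-leaf nodes
    as in a conventional LSTM-RvNN."""

    def __init__(self, dim, levels=("phrase", "clause"), seed=0):
        rng = np.random.default_rng(seed)
        self.dim = dim
        # One parameter set per level: input gate i, left/right forget gates
        # fl/fr, output gate o, and candidate u, each acting on [h_left; h_right].
        self.params = {
            lvl: {g: (0.1 * rng.standard_normal((dim, 2 * dim)), np.zeros(dim))
                  for g in ("i", "fl", "fr", "o", "u")}
            for lvl in levels
        }

    def compose(self, tag, left, right):
        """Compose the (h, c) states of two children at a node labelled `tag`,
        using the weight set of that node's semantic level."""
        W = self.params[level_of(tag)]
        (hl, cl), (hr, cr) = left, right
        x = np.concatenate([hl, hr])
        i = sigmoid(W["i"][0] @ x + W["i"][1])
        fl = sigmoid(W["fl"][0] @ x + W["fl"][1])
        fr = sigmoid(W["fr"][0] @ x + W["fr"][1])
        o = sigmoid(W["o"][0] @ x + W["o"][1])
        u = np.tanh(W["u"][0] @ x + W["u"][1])
        c = i * u + fl * cl + fr * cr
        return o * np.tanh(c), c

# Usage: compose word vectors bottom-up; the "NP"/"VP" nodes and the "S" node
# draw on different parameter sets because they sit at different semantic levels.
model = LevelSharedTreeLSTM(dim=4)
leaf = lambda: (np.random.randn(4), np.zeros(4))
np_state = model.compose("NP", leaf(), leaf())
vp_state = model.compose("VP", leaf(), leaf())
s_state = model.compose("S", np_state, vp_state)
```

Under this kind of scheme, only the weight lookup depends on the node's level; the cell equations themselves stay the same, which is why sharing within a level can reduce the number of effective parameters relative to giving every node its own weights while still distinguishing phrase-level from clause-level composition.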



Acknowledgments

This work was supported in part by the Natural Science Foundation of China under Contract 71331005 and in part by the State Key Research and Development Program of China under Contract 2016YFE0100300.

Author information


Corresponding author

Correspondence to Fengyu Li.



Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Li, F., Chi, M., Wu, D., Niu, J. (2017). Hierarchical Parameter Sharing in Recursive Neural Networks with Long Short-Term Memory. In: Liu, D., Xie, S., Li, Y., Zhao, D., El-Alfy, E.S. (eds) Neural Information Processing. ICONIP 2017. Lecture Notes in Computer Science, vol 10635. Springer, Cham. https://doi.org/10.1007/978-3-319-70096-0_60


  • DOI: https://doi.org/10.1007/978-3-319-70096-0_60

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-70095-3

  • Online ISBN: 978-3-319-70096-0

  • eBook Packages: Computer Science, Computer Science (R0)
