
Building neural network language model with POS-based negative sampling and stochastic conjugate gradient descent

Soft Computing

Abstract

A traditional statistical language model is a probability distribution over sequences of words. It suffers from the curse of dimensionality caused by the exponentially growing number of possible word sequences in the training text. To address this issue, neural network language models were proposed, which represent words in a distributed way. However, because of the computational cost of updating the gradients of a large number of word vectors, a neural network language model needs a long training time to converge. To alleviate this problem, we propose in this paper a gradient descent algorithm based on stochastic conjugate gradients that accelerates the convergence of the network's parameters. To further improve the performance of the neural language model, we also propose a negative sampling algorithm based on POS (part-of-speech) tagging, which optimizes the negative sampling process and improves the quality of the final language model. A novel evaluation model is used alongside perplexity to demonstrate the performance of the improved language model. Experimental results confirm the effectiveness of our methods.
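Since the full text is not included in this preview, the following Python sketch only illustrates the general idea of POS-constrained negative sampling mentioned in the abstract; grouping the vocabulary by POS tag, the word2vec-style count smoothing, and all names (build_pos_tables, sample_negatives) are assumptions for illustration, not the authors' implementation.

```python
import random
from collections import defaultdict

def build_pos_tables(vocab_counts, pos_of):
    """Group the vocabulary by POS tag and precompute smoothed
    unigram weights within each group (word2vec-style count ** 0.75,
    an assumed smoothing choice)."""
    words_by_pos = defaultdict(list)
    weights_by_pos = defaultdict(list)
    for word, count in vocab_counts.items():
        tag = pos_of[word]
        words_by_pos[tag].append(word)
        weights_by_pos[tag].append(count ** 0.75)
    return words_by_pos, weights_by_pos

def sample_negatives(target, k, pos_of, tables):
    """Draw k negative words that share the target word's POS tag,
    excluding the target itself (assumed behaviour, see lead-in)."""
    words_by_pos, weights_by_pos = tables
    tag = pos_of[target]
    candidates, weights = words_by_pos[tag], weights_by_pos[tag]
    negatives = []
    while len(negatives) < k:
        w = random.choices(candidates, weights=weights, k=1)[0]
        if w != target:
            negatives.append(w)
    return negatives

# Toy usage: nouns compete only with nouns, verbs only with verbs.
vocab_counts = {"cat": 50, "dog": 40, "tree": 10, "runs": 30, "sleeps": 20}
pos_of = {"cat": "NN", "dog": "NN", "tree": "NN", "runs": "VB", "sleeps": "VB"}
tables = build_pos_tables(vocab_counts, pos_of)
print(sample_negatives("cat", 3, pos_of, tables))
```

Similarly, a stochastic conjugate-gradient-style update can be sketched as a minibatch gradient step that reuses the previous search direction; the Polak-Ribière coefficient, the restart rule, and the fixed learning rate below are illustrative choices, since the paper's exact variant is not shown in this preview.

```python
import numpy as np

def scg_step(theta, grad, prev_grad, prev_dir, lr=0.05):
    """One stochastic conjugate-gradient-style parameter update."""
    if prev_dir is None:
        direction = -grad  # first step: plain steepest descent
    else:
        beta = grad @ (grad - prev_grad) / (prev_grad @ prev_grad + 1e-12)
        direction = -grad + max(0.0, beta) * prev_dir
    return theta + lr * direction, grad.copy(), direction
```

In training, grad would be the minibatch gradient of the negative-sampling loss with respect to the model parameters, and the returned gradient and direction are fed back in on the next step.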




Acknowledgements

This work was supported by the Shanghai Maritime University research fund project (20130469), the State Oceanic Administration of China research fund project (201305026), and the open research fund of the Key Lab of Broadband Wireless Communication and Sensor Network Technology (Nanjing University of Posts and Telecommunications), Ministry of Education. Prof. Jeong-Uk Kim is the corresponding author.

Author information


Corresponding author

Correspondence to Jeong-Uk Kim.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants performed by any of the authors.

Additional information

Communicated by G. Yi.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Liu, J., Lin, L., Ren, H. et al. Building neural network language model with POS-based negative sampling and stochastic conjugate gradient descent. Soft Comput 22, 6705–6717 (2018). https://doi.org/10.1007/s00500-018-3181-2

