ABSTRACT
This tutorial addresses advances in deep Bayesian mining and learning for natural language, with ubiquitous applications ranging from speech recognition to document summarization, text classification, text segmentation, information extraction, image caption generation, sentence generation, dialogue control, sentiment classification, recommendation systems, question answering, and machine translation, to name a few. Traditionally, "deep learning" is taken to be a learning process where the inference or optimization is based on a real-valued deterministic model. The "semantic structure" in words, sentences, entities, actions, and documents drawn from a large vocabulary may not be well expressed or correctly optimized in mathematical logic or computer programs. The "distribution function" in a discrete or continuous latent variable model for natural language may not be properly decomposed or estimated. This tutorial covers the fundamentals of statistical models and neural networks, and focuses on a series of advanced Bayesian and deep models, including the hierarchical Dirichlet process, Chinese restaurant process, hierarchical Pitman-Yor process, Indian buffet process, recurrent neural network, long short-term memory, sequence-to-sequence model, variational auto-encoder, generative adversarial network, attention mechanism, memory-augmented neural network, skip neural network, stochastic neural network, predictive state neural network, and policy neural network. We present how these models are connected and why they work for a variety of applications on symbolic and complex patterns in natural language. Variational inference and sampling methods are formulated to tackle the optimization of complicated models. Word and sentence embeddings, clustering, and co-clustering are merged with linguistic and semantic constraints. A series of case studies is presented to tackle different issues in deep Bayesian mining, learning, and understanding.
Finally, we point out a number of directions and outlooks for future study.
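Variational inference, which the tutorial formulates to tackle the optimization of complicated models such as the variational auto-encoder, maximizes an evidence lower bound whose regularizer is a KL divergence between the approximate posterior and the prior. As a minimal sketch (function names are ours, not from the tutorial), the closed-form Gaussian KL term and the reparameterization trick can be written in plain Python:

```python
import math
import random

def gaussian_kl(mu, log_var):
    """Closed-form KL(N(mu, diag(sigma^2)) || N(0, I)), summed over dimensions."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                     for m, lv in zip(mu, log_var))

def reparameterize(mu, log_var, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, I), so the sample is a
    deterministic, differentiable function of mu and log_var."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]

# The KL term vanishes when the approximate posterior equals the prior N(0, I)
kl_zero = gaussian_kl([0.0, 0.0], [0.0, 0.0])  # → 0.0
# ... and is positive otherwise, penalizing posteriors far from the prior
kl = gaussian_kl([0.0, 0.5], [0.0, -1.0])
z = reparameterize([0.0, 0.5], [0.0, -1.0], random.Random(0))
```

In a full variational auto-encoder this KL term would be combined with an expected reconstruction log-likelihood to form the ELBO; the sketch above only illustrates the two pieces that make stochastic gradient optimization possible.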
Index Terms
- Deep Bayesian Mining, Learning and Understanding