DOI: 10.1145/3292500.3332267

Deep Bayesian Mining, Learning and Understanding

Published: 25 July 2019

ABSTRACT

This tutorial addresses advances in deep Bayesian mining and learning for natural language, with ubiquitous applications ranging from speech recognition to document summarization, text classification, text segmentation, information extraction, image caption generation, sentence generation, dialogue control, sentiment classification, recommender systems, question answering and machine translation, to name a few. Traditionally, "deep learning" is taken to be a learning process whose inference or optimization is based on a real-valued deterministic model. The "semantic structure" in words, sentences, entities, actions and documents drawn from a large vocabulary may not be well expressed or correctly optimized in mathematical logic or computer programs. The "distribution function" in a discrete or continuous latent variable model for natural language may not be properly decomposed or estimated. This tutorial covers the fundamentals of statistical models and neural networks, and focuses on a series of advanced Bayesian and deep models, including the hierarchical Dirichlet process, Chinese restaurant process, hierarchical Pitman-Yor process, Indian buffet process, recurrent neural network, long short-term memory, sequence-to-sequence model, variational auto-encoder, generative adversarial network, attention mechanism, memory-augmented neural network, skip neural network, stochastic neural network, predictive state neural network, and policy neural network. We present how these models are connected and why they work for a variety of applications on symbolic and complex patterns in natural language. Variational inference and sampling methods are formulated to tackle the optimization of complicated models. Word and sentence embeddings, clustering and co-clustering are merged with linguistic and semantic constraints. A series of case studies are presented to tackle different issues in deep Bayesian mining, learning and understanding. Finally, we point out a number of directions and outlooks for future studies.
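As a concrete illustration of the variational inference machinery the abstract refers to (this is not code from the tutorial itself), here is a minimal sketch of a toy variational auto-encoder in PyTorch. The class name ToyVAE, the layer sizes, and the random stand-in batch are all illustrative assumptions.

```python
# Minimal variational auto-encoder sketch (illustrative, not from the tutorial).
# Shows the reparameterization trick that makes variational inference
# differentiable, so the evidence lower bound (ELBO) can be optimized by SGD.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyVAE(nn.Module):
    def __init__(self, x_dim=784, h_dim=200, z_dim=20):  # dimensions are arbitrary
        super().__init__()
        self.enc = nn.Linear(x_dim, h_dim)
        self.mu = nn.Linear(h_dim, z_dim)       # mean of q(z|x)
        self.logvar = nn.Linear(h_dim, z_dim)   # log-variance of q(z|x)
        self.dec = nn.Sequential(nn.Linear(z_dim, h_dim), nn.ReLU(),
                                 nn.Linear(h_dim, x_dim))

    def forward(self, x):
        h = F.relu(self.enc(x))
        mu, logvar = self.mu(h), self.logvar(h)
        # Reparameterization: z = mu + sigma * eps with eps ~ N(0, I), so
        # gradients flow through mu and logvar despite the sampling step.
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        return self.dec(z), mu, logvar

def neg_elbo(x, x_logits, mu, logvar):
    # Negative ELBO: reconstruction term plus KL(q(z|x) || N(0, I)).
    recon = F.binary_cross_entropy_with_logits(x_logits, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl

model = ToyVAE()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.rand(32, 784)  # stand-in batch; real data (e.g., bag-of-words) would go here
x_logits, mu, logvar = model(x)
loss = neg_elbo(x, x_logits, mu, logvar)
opt.zero_grad(); loss.backward(); opt.step()
```

The key design choice is that the stochastic node is rewritten as a deterministic function of its parameters plus independent noise; a sampling-based alternative (e.g., score-function estimators) would avoid this rewrite at the cost of higher gradient variance.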


Supplemental Material

p3197-chien_part1.mp4 (mp4, 5.1 GB)

p3197-chien_part2.mp4 (mp4, 6.8 GB)



Published in

KDD '19: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining
July 2019
3305 pages
ISBN: 9781450362016
DOI: 10.1145/3292500

          Copyright © 2019 Owner/Author

          Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

          Publisher

          Association for Computing Machinery

          New York, NY, United States


          Qualifiers

          • tutorial

          Acceptance Rates

KDD '19 paper acceptance rate: 110 of 1,200 submissions, 9%. Overall acceptance rate: 1,133 of 8,635 submissions, 13%.
