
A hybrid neural network hidden Markov model approach for automatic story segmentation

  • Original Research
  • Published in: Journal of Ambient Intelligence and Humanized Computing

Abstract

We propose a hybrid neural network hidden Markov model (NN-HMM) approach to automatic story segmentation. A story is treated as an instance of an underlying topic (a hidden state), and words are generated from the distribution of that topic; a transition from one topic to another indicates a story boundary. Unlike the traditional HMM approach, in which the emission probability of each state is computed from a topic-dependent language model, we use a deep neural network (DNN) to map the word distribution directly to topic posterior probabilities. DNNs can learn meaningful continuous features for words and therefore offer better discriminative and generalization capability than n-gram models. Specifically, we investigate three neural network structures: a feed-forward neural network, a recurrent neural network with long short-term memory cells (LSTM-RNN), and a modified LSTM-RNN with multi-task learning ability. Experimental results on the TDT2 corpus show that the proposed NN-HMM approach significantly outperforms the traditional HMM approach and achieves state-of-the-art performance in story segmentation.
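The decoding step of a hybrid NN-HMM can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes topic posteriors per sentence are already available from some classifier, converts them to scaled likelihoods by dividing by the topic priors (the standard hybrid trick), runs Viterbi over topic states with a high self-transition probability, and reports a story boundary wherever the decoded topic changes. All function and parameter names here are hypothetical.

```python
import math

def viterbi_story_boundaries(posteriors, priors, trans_self=0.9):
    """Decode a topic sequence from per-sentence topic posteriors and
    report story boundaries where the decoded topic changes.

    posteriors: list of rows [P(topic_k | sentence_t)] from a topic classifier
    priors:     list of topic priors P(topic_k) estimated on training data
    trans_self: probability of staying in the same topic between sentences
    """
    n_topics = len(priors)
    trans_other = (1.0 - trans_self) / (n_topics - 1)

    # Hybrid trick: scaled log-likelihood log p(x|k) ~ log P(k|x) - log P(k)
    def emit(t, k):
        return math.log(posteriors[t][k] + 1e-12) - math.log(priors[k])

    # Standard Viterbi recursion with backpointers.
    T = len(posteriors)
    score = [[0.0] * n_topics for _ in range(T)]
    back = [[0] * n_topics for _ in range(T)]
    for k in range(n_topics):
        score[0][k] = math.log(priors[k]) + emit(0, k)
    for t in range(1, T):
        for k in range(n_topics):
            best_prev, best_val = 0, float("-inf")
            for j in range(n_topics):
                step = math.log(trans_self if j == k else trans_other)
                val = score[t - 1][j] + step
                if val > best_val:
                    best_prev, best_val = j, val
            back[t][k] = best_prev
            score[t][k] = best_val + emit(t, k)

    # Backtrace the best topic path.
    path = [max(range(n_topics), key=lambda k: score[T - 1][k])]
    for t in range(T - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()

    # A story boundary is a sentence index where the topic changes.
    boundaries = [t for t in range(1, T) if path[t] != path[t - 1]]
    return path, boundaries


# Usage: four sentences whose posteriors shift from topic 0 to topic 1.
posteriors = [[0.90, 0.05, 0.05], [0.80, 0.10, 0.10],
              [0.10, 0.85, 0.05], [0.05, 0.90, 0.05]]
priors = [1 / 3, 1 / 3, 1 / 3]
path, boundaries = viterbi_story_boundaries(posteriors, priors)
# path -> [0, 0, 1, 1]; boundaries -> [2]
```

The self-transition probability plays the role of a duration prior: raising `trans_self` penalizes frequent topic switches and yields fewer, longer stories.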



Acknowledgements

This work was supported by the National Natural Science Foundation of China (61571363), the Aeronautical Science Foundation of China (20155553038 and 20155553040), and the Science and Technology on Avionics Integration Laboratory.

Author information


Corresponding author

Correspondence to Jia Yu.


About this article


Cite this article

Yu, J., Xie, L., Xiao, X. et al. A hybrid neural network hidden Markov model approach for automatic story segmentation. J Ambient Intell Human Comput 8, 925–936 (2017). https://doi.org/10.1007/s12652-017-0501-9

