A Bidirectional LSTM Approach with Word Embeddings for Sentence Boundary Detection

Xu, Chenglin; Xie, Lei; Xiao, Xiong

doi:10.1007/s11265-017-1289-8

Published: 30 September 2017

Volume 90, pages 1063–1075 (2018)

Journal of Signal Processing Systems

Chenglin Xu¹,²,
Lei Xie¹ &
Xiong Xiao²

853 Accesses
12 Citations

Abstract

Recovering sentence boundaries from speech and its transcripts is essential for readability and for downstream speech and language processing tasks. In this paper, we propose using a deep recurrent neural network to detect sentence boundaries in broadcast news by modeling rich prosodic and lexical features extracted at each inter-word position. We introduce into the sentence boundary detection task an unsupervised word embedding, learned with the Continuous Bag-of-Words (CBOW) model, that represents word identity and serves as an effective feature. This word embedding captures syntactic information that is essential for the detection task. In addition, we propose two further low-dimensional word embeddings, learned in a supervised manner by a neural network that incorporates class and context information: one is extracted from the projection layer and the other from the last hidden layer. Furthermore, we propose a deep bidirectional Long Short-Term Memory (LSTM) architecture with Viterbi decoding for sentence boundary detection. Under this framework, long-range dependencies of prosodic and lexical information in temporal sequences are modeled effectively. Compared with the previous state-of-the-art DNN-CRF method, the proposed LSTM approach reduces the NIST SU error by 24.8% and 9.8% relative on reference and recognition transcripts, respectively.
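The abstract pairs a bidirectional LSTM with Viterbi decoding but does not spell out the decoder. As a minimal sketch, assume the BLSTM emits a posterior over two labels (0 = no boundary, 1 = boundary) at each inter-word position, and that Viterbi decoding combines these log-posteriors with a hypothetical label-transition matrix to pick the best label sequence. The paper's actual decoding setup may differ, and the function name, parameters and toy numbers below are illustrative assumptions, not the authors'.

```python
import math
from typing import List, Sequence


def viterbi(log_post: Sequence[Sequence[float]],
            log_trans: Sequence[Sequence[float]],
            log_init: Sequence[float]) -> List[int]:
    """Return the highest-scoring label sequence (hypothetical helper).

    log_post[t][k]:  log-posterior of label k at inter-word position t
                     (assumed to come from a BLSTM softmax layer).
    log_trans[j][k]: assumed log-probability of label j followed by label k.
    log_init[k]:     assumed log-probability of label k at the first position.
    """
    n_steps, n_labels = len(log_post), len(log_init)
    if n_steps == 0:
        return []
    # score[k]: best log-score of any path ending in label k at the current step.
    score = [log_init[k] + log_post[0][k] for k in range(n_labels)]
    backptr: List[List[int]] = []
    for t in range(1, n_steps):
        new_score, ptr = [], []
        for k in range(n_labels):
            # Best predecessor label j for current label k.
            best_j = max(range(n_labels), key=lambda j: score[j] + log_trans[j][k])
            new_score.append(score[best_j] + log_trans[best_j][k] + log_post[t][k])
            ptr.append(best_j)
        score = new_score
        backptr.append(ptr)
    # Trace back from the best label at the final position.
    best = max(range(n_labels), key=lambda k: score[k])
    path = [best]
    for ptr in reversed(backptr):
        best = ptr[best]
        path.append(best)
    return path[::-1]


# Toy posteriors for four inter-word positions (hypothetical values).
posteriors = [[0.9, 0.1], [0.3, 0.7], [0.4, 0.6], [0.8, 0.2]]
log_post = [[math.log(p) for p in row] for row in posteriors]
# Assumed transitions: a boundary directly after a boundary is unlikely.
log_trans = [[math.log(p) for p in row] for row in [[0.5, 0.5], [0.9, 0.1]]]
log_init = [math.log(0.5), math.log(0.5)]
print(viterbi(log_post, log_trans, log_init))  # → [0, 1, 0, 0]
```

Taking the most likely label at each position independently would give [0, 1, 1, 0]; under the assumed transition penalty, Viterbi drops the second, adjacent boundary. Working in log space keeps long broadcast-news sequences from underflowing.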

Notes

https://catalog.ldc.upenn.edu/LDC2005T24.
http://www.darpa.mil/iao/EARS.htm.
http://www.nist.gov/speech/tests/rt/.
In the word2vec tool, the energy function is simply defined as E(A, C) = −(A ⋅ C), where A is the vector of a word and C is the sum of the context vectors of A. The probability is then \(p(A \mid C) = \frac{e^{-E(A,C)}}{\sum_{v=1}^{V} e^{-E(W_v,C)}}\), where V is the vocabulary size and W_v is the vector of the v-th vocabulary word.
http://mattmahoney.net/dc/text8.zip.
https://catalog.ldc.upenn.edu/LDC2004T12.
https://catalog.ldc.upenn.edu/LDC2005T24.
https://code.google.com/p/word2vec/.
The corresponding Wikipedia data set with sentence boundaries is used.
LDC2005S16, LDC2004S08 for speech data and LDC2005T24, LDC2004T12 for reference transcriptions.
http://www.itl.nist.gov/iad/mig/tests/rt/2003-fall/.
See http://www.itl.nist.gov/iad/894.01/tests/rt/2004-fall/.
Available at: http://www.cs.waikato.ac.nz/ml/weka/index.html.
Available at: https://code.google.com/p/crfpp/.
http://sourceforge.net/projects/currennt/.
The initial-dimensions parameter of the t-SNE tool is set equal to the size of each vector, and the perplexity parameter is 50.
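The word2vec energy function in the notes amounts to a softmax over dot products between each candidate word vector and the summed context vector. The sketch below evaluates p(A | C) exactly as that note writes it, assuming a single shared table of hypothetical toy vectors; word2vec itself keeps separate input and output vectors and, in practice, approximates the full sum over V with hierarchical softmax or negative sampling. All function names are illustrative.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def energy(a: Vector, c: Vector) -> float:
    """E(A, C) = -(A . C): negative dot product of word and context vectors."""
    return -sum(x * y for x, y in zip(a, c))


def context_sum(context: List[Vector]) -> List[float]:
    """C: element-wise sum of the context word vectors."""
    return [sum(dims) for dims in zip(*context)]


def p_word_given_context(word: str, context: List[Vector],
                         vocab: Dict[str, Vector]) -> float:
    """p(A|C) = exp(-E(A,C)) / sum_v exp(-E(W_v,C)) over the whole vocabulary."""
    c = context_sum(context)
    scores = {w: -energy(v, c) for w, v in vocab.items()}
    # Subtract the largest score before exponentiating: the ratio is unchanged,
    # but overflow is avoided for large dot products.
    top = max(scores.values())
    z = sum(math.exp(s - top) for s in scores.values())
    return math.exp(scores[word] - top) / z


# Hypothetical 2-dimensional toy vectors, not trained embeddings.
vocab = {"the": [1.0, 0.0], "news": [0.0, 1.0], "today": [1.0, 1.0]}
context = [[0.5, 0.0], [0.5, 1.0]]  # sums to C = [1.0, 1.0]
for w in vocab:
    print(w, round(p_word_given_context(w, context, vocab), 3))
# prints: the 0.212, news 0.212, today 0.576 (one per line)
```

Because "today" has the largest dot product with C, it has the lowest energy and therefore the highest probability; the three probabilities sum to 1.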

Author information

Authors and Affiliations

School of Computer Science, Northwestern Polytechnical University, Xi’an, China
Chenglin Xu & Lei Xie
Temasek Laboratories@NTU, Nanyang Technological University, Singapore, Singapore
Chenglin Xu & Xiong Xiao

Corresponding author

Correspondence to Chenglin Xu.

About this article

Cite this article

Xu, C., Xie, L. & Xiao, X. A Bidirectional LSTM Approach with Word Embeddings for Sentence Boundary Detection. J Sign Process Syst 90, 1063–1075 (2018). https://doi.org/10.1007/s11265-017-1289-8

Received: 25 April 2017
Revised: 29 August 2017
Accepted: 18 September 2017
Published: 30 September 2017
Issue Date: July 2018
DOI: https://doi.org/10.1007/s11265-017-1289-8
