Tipster: A Topic-Guided Language Model for Topic-Aware Text Segmentation

  • Conference paper
Database Systems for Advanced Applications (DASFAA 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13247)

Abstract

Accurately segmenting plain documents and labeling the topic of each segment not only matches people's reading habits but also facilitates various downstream tasks. Recent work consistently suggests that text segmentation and segment topic labeling can be treated as mutually beneficial tasks, and that exploiting word distributions can help model the latent topics of a document better. To this end, we present a novel model, Tipster, that solves text segmentation and segment topic labeling collaboratively. We first use a neural topic model to infer the latent topic distribution of each sentence from its word distribution. Our model then divides the document into topically coherent segments based on topic-guided contextual sentence representations from a pre-trained language model, and assigns relevant topic labels to each segment. Finally, extensive experiments demonstrate that Tipster achieves state-of-the-art performance on both text segmentation and segment topic labeling.
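To make the two coupled tasks concrete, the following is a minimal toy sketch, not the Tipster model itself. It assumes per-sentence topic distributions are already available (in the paper these come from a neural topic model), and it swaps the learned, language-model-based segmenter for a simple cosine-similarity threshold between adjacent sentences. The function names, the threshold value, and the toy data are all hypothetical.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def segment_and_label(topic_dists, topic_names, threshold=0.5):
    """Split where adjacent topic distributions diverge, then label segments.

    topic_dists: one topic distribution per sentence (assumed given).
    threshold:   hypothetical similarity cut-off; a stand-in for a learned
                 boundary classifier.
    Returns a list of (first_sentence, last_sentence, label) tuples.
    """
    if not topic_dists:
        return []

    # Step 1: segmentation. Start a new segment whenever the topic
    # distribution of a sentence differs sharply from the previous one.
    segments = [[0]]
    for i in range(1, len(topic_dists)):
        if cosine(topic_dists[i - 1], topic_dists[i]) < threshold:
            segments.append([i])
        else:
            segments[-1].append(i)

    # Step 2: labeling. Average the distributions inside each segment and
    # pick the topic with the highest mean probability.
    k = len(topic_names)
    labeled = []
    for seg in segments:
        mean = [sum(topic_dists[i][t] for i in seg) / len(seg) for t in range(k)]
        best = max(range(k), key=mean.__getitem__)
        labeled.append((seg[0], seg[-1], topic_names[best]))
    return labeled


# Toy input: four sentences, two topics (values invented for illustration).
toy_dists = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
result = segment_and_label(toy_dists, ["history", "geography"])
print(result)  # [(0, 1, 'history'), (2, 3, 'geography')]
```

The sketch only illustrates why the two tasks share information: the same topic distributions drive both the boundary decision and the segment label.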

Acknowledgement

This research was partially supported by grants from the National Natural Science Foundation of China (Grants No. 61922073 and U20A20229), and the Foundation of State Key Laboratory of Cognitive Intelligence, iFLYTEK, P. R. China (No. CI0S-2020SC05).

Author information

Corresponding author

Correspondence to Qi Liu.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

Cite this paper

Gong, Z. et al. (2022). Tipster: A Topic-Guided Language Model for Topic-Aware Text Segmentation. In: Bhattacharya, A., et al. Database Systems for Advanced Applications. DASFAA 2022. Lecture Notes in Computer Science, vol 13247. Springer, Cham. https://doi.org/10.1007/978-3-031-00129-1_14

  • DOI: https://doi.org/10.1007/978-3-031-00129-1_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-00128-4

  • Online ISBN: 978-3-031-00129-1

  • eBook Packages: Computer Science (R0)
