Tipster: A Topic-Guided Language Model for Topic-Aware Text Segmentation

Gong, Zheng; Tong, Shiwei; Wu, Han; Liu, Qi; Tao, Hanqing; Huang, Wei; Yu, Runlong

doi:10.1007/978-3-031-00129-1_14

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13247)

Included in the following conference series:

International Conference on Database Systems for Advanced Applications

Abstract

Accurately segmenting plain documents and identifying the topic structure of their segments not only suits people's reading habits but also facilitates various downstream tasks. Recent work has consistently suggested that text segmentation and segment topic labeling can be treated as a joint task, and that incorporating word distributions has the potential to better model the latent topics in a document. To this end, we present a novel model, Tipster, that solves text segmentation and segment topic labeling collaboratively. We first use a neural topic model to infer latent topic distributions of sentences from word distributions. Then, our model divides the document into topically coherent segments based on topic-guided contextual sentence representations from a pre-trained language model and assigns relevant topic labels to each segment. Finally, extensive experiments demonstrate that Tipster achieves state-of-the-art performance on both text segmentation and segment topic labeling.
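As a rough illustration of the two outputs the abstract describes (topically coherent segments, each with a topic label), the toy sketch below groups consecutive sentences by their dominant topic. It is not Tipster: the per-sentence topic distributions are hard-coded rather than inferred by a neural topic model, no pre-trained language model is involved, and the function name `segment_by_topic` is a hypothetical one chosen for this example.

```python
from typing import List, Tuple


def segment_by_topic(topic_dists: List[List[float]]) -> List[Tuple[int, int, int]]:
    """Group consecutive sentences that share the same dominant topic.

    topic_dists[i] is an (assumed, pre-computed) topic distribution for
    sentence i. Returns (start, end_exclusive, topic_id) triples.
    """
    segments: List[Tuple[int, int, int]] = []
    start = 0
    prev_topic = None
    for i, dist in enumerate(topic_dists):
        # Dominant topic = index of the largest probability (first one on ties).
        topic = max(range(len(dist)), key=dist.__getitem__)
        if prev_topic is not None and topic != prev_topic:
            # The dominant topic changed: close the current segment here.
            segments.append((start, i, prev_topic))
            start = i
        prev_topic = topic
    if prev_topic is not None:
        # Close the final segment.
        segments.append((start, len(topic_dists), prev_topic))
    return segments


# Made-up distributions over two topics for five sentences.
example = [[0.8, 0.2], [0.7, 0.3], [0.1, 0.9], [0.2, 0.8], [0.6, 0.4]]
print(segment_by_topic(example))
```

A real system would replace the hard-coded distributions with learned ones and would decide boundaries from contextual sentence representations rather than from a simple change in the argmax topic.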

Acknowledgement

This research was partially supported by grants from the National Natural Science Foundation of China (Grants No. 61922073 and U20A20229), and the Foundation of State Key Laboratory of Cognitive Intelligence, iFLYTEK, P. R. China (No. CI0S-2020SC05).

Author information

Authors and Affiliations

Anhui Province Key Laboratory of Big Data Analysis and Application, University of Science and Technology of China, Hefei, 230026, China
Zheng Gong, Shiwei Tong, Han Wu, Qi Liu, Hanqing Tao, Wei Huang & Runlong Yu

Corresponding author

Correspondence to Qi Liu.

Editor information

Editors and Affiliations

Indian Institute of Technology Kanpur, Kanpur, India
Arnab Bhattacharya
National University of Singapore, Singapore, Singapore
Janice Lee Mong Li
University of California, Santa Barbara, Santa Barbara, CA, USA
Divyakant Agrawal
IIIT Hyderabad, Hyderabad, India
P. Krishna Reddy
Indraprastha Institute of Information Technology Delhi, New Delhi, India
Mukesh Mohania
Ashoka University, Sonepat, Haryana, India
Anirban Mondal
Indraprastha Institute of Information Technology Delhi, New Delhi, India
Vikram Goyal
University of Aizu, Aizu, Japan
Rage Uday Kiran

About this paper

Cite this paper

Gong, Z. et al. (2022). Tipster: A Topic-Guided Language Model for Topic-Aware Text Segmentation. In: Bhattacharya, A., et al. Database Systems for Advanced Applications. DASFAA 2022. Lecture Notes in Computer Science, vol 13247. Springer, Cham. https://doi.org/10.1007/978-3-031-00129-1_14

DOI: https://doi.org/10.1007/978-3-031-00129-1_14
Published: 08 April 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-00128-4
Online ISBN: 978-3-031-00129-1
eBook Packages: Computer Science, Computer Science (R0)
