DOI: 10.1145/3477495.3531817
short-paper

Dialogue Topic Segmentation via Parallel Extraction Network with Neighbor Smoothing

Published: 07 July 2022 Publication History

Abstract

Dialogue topic segmentation is a challenging task in which dialogues are split into segments with pre-defined topics. Existing work on topic segmentation adopts a two-stage paradigm: text segmentation followed by segment labeling. However, such methods tend to focus on local context during segmentation and do not capture inter-segment dependency well. In addition, ambiguity and labeling noise at dialogue segment boundaries pose further challenges for existing models. In this work, we propose the Parallel Extraction Network with Neighbor Smoothing (PEN-NS) to address these issues. Specifically, the parallel extraction network performs segment extraction by optimizing the bipartite matching cost of segments, which captures inter-segment dependency. Furthermore, neighbor smoothing handles noise and ambiguity at segment boundaries. Experiments on a dialogue-based and a document-based topic segmentation dataset show that PEN-NS significantly outperforms state-of-the-art models.
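
The two ideas named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the cost function or the smoothing formula, so `segment_cost`, `best_matching`, `neighbor_smooth`, and the `eps` parameter are all hypothetical. The matching step uses a brute-force search over assignments as a stand-in for a proper assignment solver, which is only practical for a handful of segments.

```python
from itertools import permutations


def segment_cost(pred, gold):
    """Hypothetical cost between two (start, end, topic) segments.

    Assumption: boundary distance plus a 0/1 topic mismatch penalty.
    """
    boundary = abs(pred[0] - gold[0]) + abs(pred[1] - gold[1])
    topic = 0 if pred[2] == gold[2] else 1
    return boundary + topic


def best_matching(preds, golds):
    """Minimum-cost one-to-one assignment of gold segments to predictions.

    Brute force over all assignments; assumes len(preds) >= len(golds).
    Returns (total_cost, matching), where matching[g] is the index of the
    prediction assigned to gold segment g.
    """
    best_cost, best_perm = None, None
    for perm in permutations(range(len(preds)), len(golds)):
        cost = sum(segment_cost(preds[p], golds[g]) for g, p in enumerate(perm))
        if best_cost is None or cost < best_cost:
            best_cost, best_perm = cost, perm
    return best_cost, list(best_perm)


def neighbor_smooth(position, length, eps=0.2):
    """Soft target for a boundary label at `position` in a dialogue of `length` turns.

    Assumption: a share `eps` of the probability mass is moved from the
    labeled turn to its immediate neighbors, split evenly between them.
    """
    target = [0.0] * length
    neighbors = [i for i in (position - 1, position + 1) if 0 <= i < length]
    target[position] = 1.0 - eps if neighbors else 1.0
    for i in neighbors:
        target[i] = eps / len(neighbors)
    return target


# Predictions are produced as an unordered set, so they are matched to the
# gold segments before any per-segment loss would be computed.
preds = [(5, 9, "payment"), (0, 4, "greeting"), (10, 12, "closing")]
golds = [(0, 4, "greeting"), (5, 8, "payment")]
cost, matching = best_matching(preds, golds)
soft_target = neighbor_smooth(2, 5, eps=0.2)
```

Here the gold "greeting" segment is matched to the second prediction and the gold "payment" segment to the first, and the unmatched "closing" prediction is left over. The soft target keeps most of the mass on the labeled boundary turn while tolerating an off-by-one annotation.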

      Published In

      SIGIR '22: Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval
      July 2022
      3569 pages
      ISBN:9781450387323
      DOI:10.1145/3477495
      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. boundary ambiguity
      2. data noise
      3. dialogue topic segmentation
      4. neighbor smoothing
      5. parallel extraction

      Qualifiers

      • Short-paper

      Funding Sources

      • PKU-Baidu Fund
      • National Natural Science Foundation of China

      Conference

      SIGIR '22

      Acceptance Rates

      Overall Acceptance Rate 792 of 3,983 submissions, 20%
