Topic Segmentation: Application of Mathematical Morphology to Textual Data

  • Conference paper
Mathematical Morphology and Its Applications to Image and Signal Processing (ISMM 2011)

Abstract

Mathematical Morphology (MM) offers a generic theoretical framework for data processing and analysis. In practice, however, it is still used almost exclusively for image analysis and processing, and attempts to apply MM to other kinds of data remain rare. We believe MM can provide relevant solutions for data analysis and processing in a far broader range of application fields. As an illustration, we focus here on textual data and show how morphological operators (here, morphological segmentation using the watershed transform) can be applied to such data. We thus provide an original MM-based solution to the thematic segmentation problem, a typical problem in natural language processing and information retrieval (IR).

More precisely, we consider TV broadcasts through their transcriptions obtained by automatic speech recognition. To perform topic segmentation, we compute the similarity between successive segments using vectorization, a technique recently introduced in the IR field. We then apply a gradient operator to build a topographic surface, which is segmented using the watershed transform. This new topic segmentation technique is evaluated on two corpora of TV broadcasts, on which it outperforms other existing approaches. Although we rely only on very common morphological operators (i.e., the standard watershed transform), we thus show the potential of MM for non-image data.
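
The pipeline described above (similarity between successive segments, a gradient-like surface, then a watershed) can be illustrated with a minimal one-dimensional sketch. It rests on several assumptions that are not taken from the paper: plain bag-of-words cosine similarity stands in for the vectorization technique, the "gradient" is simply the dissimilarity between adjacent units, and the function names (`gradient_surface`, `watershed_1d`, `segment`) are hypothetical. In one dimension, flooding from two neighbouring minima meets at the highest point between them, so the watershed reduces to placing one boundary at that crest.

```python
from collections import Counter
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def gradient_surface(units):
    """Dissimilarity between each pair of adjacent text units.

    surface[i] is the height of the gap between units[i] and units[i + 1].
    ASSUMPTION: bag-of-words cosine replaces the paper's vectorization.
    """
    bags = [Counter(u.lower().split()) for u in units]
    return [1.0 - cosine(bags[i], bags[i + 1]) for i in range(len(bags) - 1)]


def regional_minima(surface):
    """Return one index per regional minimum (plateaus count once)."""
    minima, n, i = [], len(surface), 0
    while i < n:
        j = i
        while j + 1 < n and surface[j + 1] == surface[i]:
            j += 1  # extend over a plateau of equal values
        left_ok = i == 0 or surface[i - 1] > surface[i]
        right_ok = j == n - 1 or surface[j + 1] > surface[i]
        if left_ok and right_ok:
            minima.append(i)
        i = j + 1
    return minima


def watershed_1d(surface):
    """1D watershed: one cut at the crest between consecutive minima."""
    minima = regional_minima(surface)
    return [
        max(range(a, b + 1), key=lambda k: surface[k])
        for a, b in zip(minima, minima[1:])
    ]


def segment(units):
    """Split units into topical segments at the watershed cuts."""
    cuts = watershed_1d(gradient_surface(units))
    segments, start = [], 0
    for k in cuts:  # cut k separates units[k] from units[k + 1]
        segments.append(units[start:k + 1])
        start = k + 1
    segments.append(units[start:])
    return segments


if __name__ == "__main__":
    toy = [
        "football match goal",
        "goal match team",
        "budget vote parliament",
        "parliament budget tax",
    ]
    print(segment(toy))
    # [['football match goal', 'goal match team'],
    #  ['budget vote parliament', 'parliament budget tax']]
```

A raw watershed of this kind places a boundary at every crest, however shallow, so real transcripts would probably need some filtering of minor minima; that step is not shown here.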




Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Lefèvre, S., Claveau, V. (2011). Topic Segmentation: Application of Mathematical Morphology to Textual Data. In: Soille, P., Pesaresi, M., Ouzounis, G.K. (eds) Mathematical Morphology and Its Applications to Image and Signal Processing. ISMM 2011. Lecture Notes in Computer Science, vol 6671. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21569-8_41


  • DOI: https://doi.org/10.1007/978-3-642-21569-8_41

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-21568-1

  • Online ISBN: 978-3-642-21569-8

  • eBook Packages: Computer Science, Computer Science (R0)
