Topic Identification in Dynamical Text by Complexity Pursuit

Bingham, Ella; Kabán, Ata; Girolami, Mark

doi:10.1023/A:1022990829563

Published: February 2003

Volume 17, pages 69–83 (2003)
Neural Processing Letters

Abstract

The problem of analysing dynamically evolving textual data has arisen within the last few years. An example of such data is the discussion appearing in Internet chat lines. In this Letter, a recently introduced source separation method, termed complexity pursuit, is applied to the problem of finding topics in dynamical text and is compared against several blind separation algorithms on the same problem. Complexity pursuit is a generalisation of projection pursuit to time series, and it is able to use both higher-order statistical measures and temporal dependency information in separating the topics. Experimental results on chat line and newsgroup data demonstrate that the minimum-complexity time series do indeed correspond to meaningful topics inherent in the dynamical text data, and also suggest the applicability of the method to query-based retrieval from a temporally changing text stream.
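
The idea of combining temporal dependency with higher-order statistics can be illustrated with a rough sketch. It is not the authors' implementation: the first-order autoregressive predictor, the log cosh contrast, the function name complexity_score and the synthetic signals are all assumptions made for illustration. The score approximates how costly a series is to code once a simple predictor has explained what it can, so a lower score suggests a "simpler" series.

```python
import numpy as np


def complexity_score(y):
    """Rough complexity proxy (hypothetical helper): mean log cosh of AR(1) residuals."""
    y = np.asarray(y, dtype=float)
    y = (y - y.mean()) / y.std()            # unit variance so scores are comparable
    past, present = y[:-1], y[1:]
    alpha = past @ present / (past @ past)  # least-squares AR(1) coefficient
    residual = present - alpha * past       # what the predictor fails to explain
    # log cosh(r) computed stably as log(e^r + e^-r) - log 2 (assumed contrast).
    return float(np.mean(np.logaddexp(residual, -residual) - np.log(2.0)))


# Synthetic stand-ins (assumptions, not the chat line or newsgroup data).
rng = np.random.default_rng(0)
n = 5000
innovations = rng.laplace(size=n)           # heavy-tailed driving noise
topic = np.zeros(n)
for t in range(1, n):
    topic[t] = 0.9 * topic[t - 1] + innovations[t]  # temporally dependent series
topic = (topic - topic.mean()) / topic.std()
noise = rng.standard_normal(n)              # unstructured Gaussian series
mixture = (topic + noise) / np.sqrt(2.0)    # equal-weight mixture of the two

score_topic = complexity_score(topic)
score_mixture = complexity_score(mixture)
score_noise = complexity_score(noise)
print(score_topic, score_mixture, score_noise)
```

On this synthetic data the temporally structured series would be expected to score lowest and the unstructured noise highest, with the mixture in between. A pursuit procedure would then search over projection directions for the one that minimises such a score.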

Author information

Authors and Affiliations

Neural Networks Research Centre, Helsinki University of Technology, P.O. Box 5400, FIN-02015, HUT, Finland
Ella Bingham, Ata Kabán & Mark Girolami

About this article

Cite this article

Bingham, E., Kabán, A. & Girolami, M. Topic Identification in Dynamical Text by Complexity Pursuit. Neural Processing Letters 17, 69–83 (2003). https://doi.org/10.1023/A:1022990829563

Issue Date: February 2003
DOI: https://doi.org/10.1023/A:1022990829563
