
Discovery of hierarchical thematic structure in text collections with adaptive resonance theory

  • Original Article
Neural Computing and Applications

Abstract

This paper investigates the ability of adaptive resonance theory (ART) neural networks to mine hierarchical thematic structure in text collections. We present experimental results with binary ART1 on the benchmark Reuters-21578 corpus. Using both quantitative evaluation with the standard F1 measure and qualitative visualization of the hierarchy obtained with ART, we discuss how useful ART-built hierarchies would be to a user who wants to use them to find and access textual information. Our F1 results show that ART1 produces hierarchical clusterings of higher quality than both k-means and a hierarchical clustering algorithm. However, we identify several critical problem areas that would make it rather impractical to use such a hierarchy in a real-life environment. These predicaments point to the importance of semantic feature selection. Our main contribution is a detailed test of the applicability of ART to the important domain of hierarchical document clustering, an application of adaptive resonance that has received little attention until now.
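To illustrate how binary ART1 groups documents, here is a minimal Python sketch of the standard fast-learning ART1 loop. It assumes each document is a set of active term indices (a binary bag-of-words vector) and uses the simplified choice function T_j = |I ∧ w_j| / (β + |w_j|) followed by the vigilance test |I ∧ w_j| / |I| ≥ ρ. The function name `art1_cluster`, the defaults for `rho`, `beta` and `max_epochs`, and the epoch-until-stable loop are illustrative assumptions, not the paper's implementation.

```python
from typing import List, Sequence, Set


def art1_cluster(docs: Sequence[Set[int]], rho: float = 0.5,
                 beta: float = 1e-6, max_epochs: int = 10) -> List[int]:
    """Assign each binary document (a set of active term indices) to an ART1 category.

    Sketch of fast-learning ART1; names and defaults are illustrative assumptions.
    """
    if not 0.0 < rho <= 1.0:
        raise ValueError("vigilance rho must lie in (0, 1]")
    prototypes: List[Set[int]] = []  # top-down templates w_j, stored as term sets
    labels = [-1] * len(docs)
    for _ in range(max_epochs):
        changed = False
        for i, doc in enumerate(docs):
            if not doc:
                raise ValueError(f"document {i} has no active terms")
            # Rank committed categories by the choice function
            # T_j = |I AND w_j| / (beta + |w_j|), highest first (ties keep index order).
            order = sorted(
                range(len(prototypes)),
                key=lambda j: len(doc & prototypes[j]) / (beta + len(prototypes[j])),
                reverse=True,
            )
            winner = None
            for j in order:
                # Vigilance test: accept j only if |I AND w_j| / |I| >= rho.
                if len(doc & prototypes[j]) / len(doc) >= rho:
                    winner = j
                    break
            if winner is None:
                # Every category was reset: commit a new one with w = I.
                prototypes.append(set(doc))
                winner = len(prototypes) - 1
            else:
                # Resonance with fast learning: w_j <- I AND w_j.
                prototypes[winner] &= doc
            if labels[i] != winner:
                labels[i] = winner
                changed = True
        if not changed:  # stop once an epoch leaves every assignment unchanged
            break
    return labels


docs = [{0, 1, 2}, {0, 1, 3}, {5, 6, 7}, {5, 6}]
print(art1_cluster(docs, rho=0.5))  # -> [0, 0, 1, 1]
```

Vigilance controls granularity: on the same four documents, raising `rho` to 0.9 yields four separate categories instead of two. Re-running ART1 at increasing vigilance is one hypothetical way to obtain successively finer levels (the resulting partitions are not guaranteed to nest); the abstract does not describe how the paper itself builds its hierarchy.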


Figs. 1–6


Notes

  1. Here F1 is a well-known clustering quality measure; it is distinct from the F1 of Section 2, which is the name of the input layer of ART networks.
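The note does not spell out which F1 variant is used. One widely used clustering formulation, the class-weighted F-measure, scores each reference class by its best-matching cluster: with n_ij documents of class i in cluster j, P = n_ij / n_j, R = n_ij / n_i, F(i, j) = 2PR / (P + R), and overall F = Σ_i (n_i / n) max_j F(i, j). The sketch below assumes exactly one reference class per document (Reuters-21578 documents can carry several topics, so this is a simplification); the function name `clustering_f1` is hypothetical and the paper's exact computation may differ.

```python
from collections import Counter
from typing import Hashable, Sequence


def clustering_f1(classes: Sequence[Hashable], clusters: Sequence[Hashable]) -> float:
    """Class-weighted clustering F-measure: sum_i (n_i / n) * max_j F(i, j)."""
    if len(classes) != len(clusters) or not classes:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(classes)
    class_sizes = Counter(classes)           # n_i
    cluster_sizes = Counter(clusters)        # n_j
    joint = Counter(zip(classes, clusters))  # n_ij: documents of class i in cluster j
    total = 0.0
    for c, n_i in class_sizes.items():
        best = 0.0
        for k, n_j in cluster_sizes.items():
            n_ij = joint[(c, k)]
            if n_ij == 0:
                continue
            precision = n_ij / n_j
            recall = n_ij / n_i
            best = max(best, 2 * precision * recall / (precision + recall))
        total += (n_i / n) * best  # weight each class by its share of documents
    return total


print(clustering_f1(["a", "a", "b", "b"], [0, 0, 1, 1]))  # -> 1.0
```

A perfect match between clusters and classes scores 1.0; merging part of class "b" into the cluster holding class "a" (clusters `[0, 0, 0, 1]`) lowers the score to 11/15.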


Author information

Correspondence to Louis Massey.

Additional information

This research was supported in part by the National Defense Academic Research Program (ARP) under grant 743321.


About this article

Cite this article

Massey, L. Discovery of hierarchical thematic structure in text collections with adaptive resonance theory. Neural Comput & Applic 18, 261–273 (2009). https://doi.org/10.1007/s00521-008-0178-2


  • DOI: https://doi.org/10.1007/s00521-008-0178-2
