Analysis of Text Cluster Visualization in Emergent Self Organizing Maps Using Unigrams and Its Variations after Introducing Bigrams

  • Conference paper
Proceedings of the International Conference on Soft Computing for Problem Solving (SocProS 2011) December 20-22, 2011

Part of the book series: Advances in Intelligent and Soft Computing (AINSC, volume 131)

Abstract

Clustering aims to discover natural groupings and thus to present an overview of the classes (topics) in a collection of documents. In artificial intelligence, this is known as unsupervised machine learning. Extracting internal structure from document collections in the absence of pre-classified training data is a challenging task in text mining because of the high dimensionality of the input data, usually word-frequency vectors derived from the bag-of-words (BOW) model of document representation. Self Organizing Maps (SOM) represent high-dimensional data as topology-preserving two-dimensional projections, which can be exploited to create a natural visualization of the data and, at the same time, to reduce its dimensionality. Emergence, the generation of complex systems and patterns through the cooperation of many elementary interactions, provides a way of detecting higher-level structures, or clusters of clusters, within a document corpus. This study investigates the natural visualization of clusters (rather than classification or categorization) using Emergent Self Organizing Maps after introducing bigrams. Experiments were conducted on 925 documents with a limited vocabulary of approximately 2000 unigrams and 1000 bigrams to analyze the visualization of emergent higher-level structures and of document relatedness at the lower level, and to show the presence of micro-clusters.
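The word-frequency vectors mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the toy documents, the function names (`tokenize`, `ngrams`, `build_vocabulary`, `vectorize`), and the vocabulary sizes are assumptions made for illustration, and preprocessing such as stopword removal and stemming is omitted. Each document becomes one vector whose first entries count the selected unigrams and whose remaining entries count the selected bigrams.

```python
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (no stopword removal or stemming here)."""
    return text.lower().split()


def ngrams(tokens, n):
    """Return all contiguous n-token sequences, joined by a single space."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def build_vocabulary(docs, n, max_terms):
    """Keep the max_terms most frequent n-grams across the collection."""
    counts = Counter()
    for doc in docs:
        counts.update(ngrams(tokenize(doc), n))
    return [term for term, _ in counts.most_common(max_terms)]


def vectorize(doc, vocabulary_by_n):
    """Concatenate per-n frequency counts into one feature vector."""
    tokens = tokenize(doc)
    vector = []
    for n, vocabulary in vocabulary_by_n.items():
        counts = Counter(ngrams(tokens, n))
        vector.extend(counts[term] for term in vocabulary)
    return vector


# Hypothetical toy corpus; the study itself used 925 documents.
docs = [
    "oil prices rise as crude oil supply falls",
    "crude oil exports fall",
    "wheat prices rise on export demand",
]

unigram_vocab = build_vocabulary(docs, 1, max_terms=5)
bigram_vocab = build_vocabulary(docs, 2, max_terms=3)
vocab = {1: unigram_vocab, 2: bigram_vocab}

# One row per document: unigram counts followed by bigram counts.
vectors = [vectorize(doc, vocab) for doc in docs]
```

Vectors of this form would be the input to a SOM; with the vocabulary sizes reported in the abstract, each document vector would have roughly 3000 dimensions, which is the dimensionality problem the map projection is meant to address.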



Author information

Corresponding author

Correspondence to Pramod Kumar Singh.

Copyright information

© 2012 Springer India Pvt. Ltd.

About this paper

Cite this paper

Singh, P.K., Machavolu, M., Bharti, K., Suda, R. (2012). Analysis of Text Cluster Visualization in Emergent Self Organizing Maps Using Unigrams and Its Variations after Introducing Bigrams. In: Deep, K., Nagar, A., Pant, M., Bansal, J. (eds) Proceedings of the International Conference on Soft Computing for Problem Solving (SocProS 2011) December 20-22, 2011. Advances in Intelligent and Soft Computing, vol 131. Springer, New Delhi. https://doi.org/10.1007/978-81-322-0491-6_89

  • DOI: https://doi.org/10.1007/978-81-322-0491-6_89

  • Publisher Name: Springer, New Delhi

  • Print ISBN: 978-81-322-0490-9

  • Online ISBN: 978-81-322-0491-6

  • eBook Packages: Engineering, Engineering (R0)
