Table Based Single Pass Algorithm for Clustering News Articles in NewsPage.com

  • Conference paper
Computational Science and Its Applications – ICCSA 2008 (ICCSA 2008)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5073)


Abstract

This research proposes a modified version of the single pass algorithm specialized for text clustering. Encoding documents into numerical vectors, as the traditional single pass algorithm requires, causes two main problems: huge dimensionality and sparse distribution. To address both problems, this research modifies the single pass algorithm so that documents are encoded into a form other than numerical vectors. In the proposed version, documents are mapped into tables, and an operation on two tables is defined so that the single pass algorithm can be applied to them. The goal of this research is to improve the performance of the single pass algorithm for text clustering.
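The abstract does not say what a table contains or how the operation on two tables is defined, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes a table maps each word to its term frequency, and that the operation on two tables is the share of total weight carried by words present in both. The names `to_table`, `table_similarity`, `merge_tables`, and `single_pass`, as well as the `size` and `threshold` parameters, are hypothetical.

```python
from collections import Counter


def to_table(text, size=10):
    """Encode a document as a table of word -> weight entries.

    Assumption: the weight is the raw term frequency, and only the
    `size` most frequent words are kept.
    """
    words = [w for w in text.lower().split() if w.isalpha()]
    return dict(Counter(words).most_common(size))


def table_similarity(a, b):
    """Hypothetical operation on two tables, returning a value in [0, 1].

    It is the fraction of the combined weight that belongs to words
    appearing in both tables.
    """
    total = sum(a.values()) + sum(b.values())
    if total == 0:
        return 0.0
    shared = a.keys() & b.keys()
    return sum(a[w] + b[w] for w in shared) / total


def merge_tables(a, b):
    """Combine two tables by adding the weights of matching words."""
    merged = Counter(a)
    merged.update(b)  # Counter.update adds counts rather than replacing them
    return dict(merged)


def single_pass(tables, threshold=0.5):
    """Single pass clustering: each document is visited exactly once.

    A document joins the most similar existing cluster if the similarity
    reaches `threshold`; otherwise it starts a new cluster.
    Returns a list of clusters, each a list of document indices.
    """
    clusters = []  # each cluster: {"table": summary table, "members": indices}
    for index, table in enumerate(tables):
        best, best_sim = None, 0.0
        for cluster in clusters:
            sim = table_similarity(table, cluster["table"])
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is not None and best_sim >= threshold:
            best["members"].append(index)
            best["table"] = merge_tables(best["table"], table)
        else:
            clusters.append({"table": dict(table), "members": [index]})
    return [cluster["members"] for cluster in clusters]


docs = [
    "stock market prices fall",
    "stock market prices rise",
    "football team wins match",
]
print(single_pass([to_table(d) for d in docs]))  # [[0, 1], [2]]
```

Because each table stores only the words that actually occur in a document, this representation has no fixed high-dimensional vector with mostly zero entries, which are the two problems the abstract attributes to numerical vectors.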




Editor information

Osvaldo Gervasi, Beniamino Murgante, Antonio Laganà, David Taniar, Youngsong Mun, Marina L. Gavrilova


Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Yeom, G., Jo, T., Yeom, Y. (2008). Table Based Single Pass Algorithm for Clustering News Articles in NewsPage.com. In: Gervasi, O., Murgante, B., Laganà, A., Taniar, D., Mun, Y., Gavrilova, M.L. (eds) Computational Science and Its Applications – ICCSA 2008. ICCSA 2008. Lecture Notes in Computer Science, vol 5073. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-69848-7_94


  • DOI: https://doi.org/10.1007/978-3-540-69848-7_94

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-69840-1

  • Online ISBN: 978-3-540-69848-7

  • eBook Packages: Computer Science, Computer Science (R0)
