
Personalized News Categorization Through Scalable Text Classification

  • Conference paper
Frontiers of WWW Research and Development - APWeb 2006 (APWeb 2006)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3841)

Abstract

Existing news portals on the WWW aim to provide users with numerous articles that are categorized into specific topics. Such categorization improves the presentation of information to the end user. We further improve the usability of these systems by presenting the architecture of a personalized news classification system that exploits a user’s awareness of a topic in order to classify articles in a ‘per-user’ manner. The system’s classification procedure is based on a new text analysis and classification technique that represents documents using the vector space representation of their sentences. The traditional ‘term-to-documents’ matrix is replaced by a ‘term-to-sentences’ matrix that captures more of the topic concepts in each document.
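
To make the ‘term-to-sentences’ idea concrete, the following is a minimal sketch of how such a matrix could be built, with one column per sentence instead of one column per document. It is an illustration only, not the authors’ implementation: the function names, the naive sentence splitter, the tokenizer, and the use of raw term counts are all assumptions, since the abstract does not specify preprocessing, weighting, or any later dimensionality reduction.

```python
import re
from collections import Counter


def split_sentences(text):
    """Naive splitter (assumption): break after '.', '!' or '?' plus whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tokenize(sentence):
    """Lowercase alphabetic tokens; no stemming or stop-word removal (assumption)."""
    return re.findall(r"[a-z]+", sentence.lower())


def term_sentence_matrix(documents):
    """Return (terms, matrix, owners).

    matrix[i][j] is the raw count of terms[i] in sentence j.
    owners[j] is the index of the document that sentence j came from,
    so sentence columns can be mapped back to their documents.
    """
    sentence_counts = []
    owners = []
    for doc_id, text in enumerate(documents):
        for sentence in split_sentences(text):
            sentence_counts.append(Counter(tokenize(sentence)))
            owners.append(doc_id)
    # Vocabulary: every distinct token seen in any sentence, in sorted order.
    terms = sorted(set().union(*sentence_counts))
    matrix = [[counts[t] for counts in sentence_counts] for t in terms]
    return terms, matrix, owners


# Hypothetical two-document corpus, used only for illustration.
docs = [
    "The team won the match. Fans filled the stadium.",
    "Markets fell sharply. The bank raised rates.",
]
terms, matrix, owners = term_sentence_matrix(docs)
```

For this toy corpus the matrix has four columns (one per sentence) where a term-to-documents matrix would have two, and `owners` records which document each column belongs to. How the paper weights the entries or aggregates sentence columns per document is not stated in the abstract, so the sketch stops at the representation itself.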

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Antonellis, I., Bouras, C., Poulopoulos, V. (2006). Personalized News Categorization Through Scalable Text Classification. In: Zhou, X., Li, J., Shen, H.T., Kitsuregawa, M., Zhang, Y. (eds) Frontiers of WWW Research and Development - APWeb 2006. APWeb 2006. Lecture Notes in Computer Science, vol 3841. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11610113_35

  • DOI: https://doi.org/10.1007/11610113_35

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-31142-3

  • Online ISBN: 978-3-540-32437-9

  • eBook Packages: Computer Science, Computer Science (R0)
