Abstract
Existing news portals on the WWW aim to provide users with numerous articles that are categorized into specific topics. Such categorization improves the presentation of information to the end user. We further improve the usability of these systems by presenting the architecture of a personalized news classification system that exploits each user’s awareness of a topic in order to classify articles in a ‘per-user’ manner. The system’s classification procedure is based on a new text analysis and classification technique that represents documents using the vector space representation of their sentences. The traditional ‘term-to-documents’ matrix is replaced by a ‘term-to-sentences’ matrix that permits capturing more of the topic concepts of every document.
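To illustrate the difference between the two representations, the following is a minimal sketch of how a term-to-sentences matrix might be assembled. It is not the construction used in the paper: the regular-expression sentence splitter, the lowercase alphabetic tokenizer, the use of raw term counts as weights, and all function names (split_sentences, tokenize, build_term_sentence_matrix) are assumptions made for this example only.

```python
import re
from collections import Counter


def split_sentences(text):
    """Naive splitter (assumption): break after '.', '!' or '?' plus whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only (assumption)."""
    return re.findall(r"[a-z]+", text.lower())


def build_term_sentence_matrix(documents):
    """Return (vocabulary, matrix, owners).

    matrix[i][j] is the raw count of vocabulary[i] in sentence j, and
    owners[j] is the index of the document that sentence j came from.
    A term-to-documents matrix would have one column per document instead.
    """
    sentences = []
    owners = []
    for doc_index, doc in enumerate(documents):
        for sentence in split_sentences(doc):
            sentences.append(sentence)
            owners.append(doc_index)

    counts = [Counter(tokenize(s)) for s in sentences]
    vocabulary = sorted(set().union(*counts))
    # A Counter returns 0 for terms that do not occur in a sentence.
    matrix = [[c[term] for c in counts] for term in vocabulary]
    return vocabulary, matrix, owners


if __name__ == "__main__":
    docs = [
        "Stocks fell sharply. Investors blame oil prices.",
        "The team won the final. Fans celebrated downtown.",
    ]
    vocabulary, matrix, owners = build_term_sentence_matrix(docs)
    print(len(vocabulary), "terms x", len(owners), "sentences")
    print("sentence -> document:", owners)
```

In this sketch each document contributes one column per sentence rather than a single column, so a document that touches several topics is described by several vectors. The owners list keeps the mapping back to documents, which any later per-document classification step would need.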
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Antonellis, I., Bouras, C., Poulopoulos, V. (2006). Personalized News Categorization Through Scalable Text Classification. In: Zhou, X., Li, J., Shen, H.T., Kitsuregawa, M., Zhang, Y. (eds) Frontiers of WWW Research and Development - APWeb 2006. APWeb 2006. Lecture Notes in Computer Science, vol 3841. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11610113_35
DOI: https://doi.org/10.1007/11610113_35
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-31142-3
Online ISBN: 978-3-540-32437-9
eBook Packages: Computer Science, Computer Science (R0)