Abstract
This paper addresses the problem of finding a bidimensional representation of textual documents for text categorisation. Documents are projected in a sequence of steps. The main idea is to consider two aspects of the importance of a word: its local importance within a category, and its global importance in the remaining categories. This information is combined and summarised in two coordinates. A machine learning method can then be applied in this simple bidimensional space to classify the documents. The results obtained in this space compare satisfactorily with the best state-of-the-art performance.
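To make the idea concrete, the following is a minimal sketch of one way a document could be mapped to two coordinates. The abstract does not give the weighting formula, so everything here is an assumption made for illustration: the local weight of a word is taken to be the fraction of in-category documents that contain it, the global weight is the fraction of out-of-category documents that contain it, and the decision rule simply compares the two coordinates. The names `word_weights`, `project`, and `classify` are hypothetical, as is the toy data.

```python
from collections import Counter


def word_weights(docs, labels, category):
    """Return {word: (local, global_)} for one category.

    Assumed, illustrative weighting (not taken from the paper):
    local   = fraction of in-category documents containing the word
    global_ = fraction of out-of-category documents containing the word
    """
    in_docs = [set(d.split()) for d, l in zip(docs, labels) if l == category]
    out_docs = [set(d.split()) for d, l in zip(docs, labels) if l != category]
    in_df = Counter(w for d in in_docs for w in d)
    out_df = Counter(w for d in out_docs for w in d)
    vocab = set(in_df) | set(out_df)
    return {
        w: (in_df[w] / max(len(in_docs), 1), out_df[w] / max(len(out_docs), 1))
        for w in vocab
    }


def project(doc, weights):
    """Map a document to (x, y): summed local and summed global word weights."""
    x = y = 0.0
    for w in set(doc.split()):
        local, global_ = weights.get(w, (0.0, 0.0))
        x += local
        y += global_
    return x, y


def classify(doc, weights):
    """Accept the document for the category when x > y (assumed decision rule)."""
    x, y = project(doc, weights)
    return x > y


# Hypothetical toy corpus with two categories.
train_docs = [
    "wheat harvest grain",
    "grain export wheat",
    "bank interest rate",
    "rate cut bank",
]
train_labels = ["grain", "grain", "money", "money"]

weights = word_weights(train_docs, train_labels, "grain")
point = project("wheat grain price", weights)
```

In this sketch each category gets its own two-dimensional plane, and any classifier that works on points in a plane could replace the simple comparison in `classify`.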
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Di Nunzio, G.M. (2004). A Bidimensional View of Documents for Text Categorisation. In: McDonald, S., Tait, J. (eds) Advances in Information Retrieval. ECIR 2004. Lecture Notes in Computer Science, vol 2997. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24752-4_9
DOI: https://doi.org/10.1007/978-3-540-24752-4_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-21382-6
Online ISBN: 978-3-540-24752-4
eBook Packages: Springer Book Archive