A Bidimensional View of Documents for Text Categorisation

Di Nunzio, Giorgio Maria

doi:10.1007/978-3-540-24752-4_9

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2997)

Abstract

This paper addresses the problem of finding a bidimensional representation of textual documents for text categorisation. Documents are projected onto the plane in successive steps. The main idea is to consider two aspects of the importance of a word: its local importance within a category, and its global importance in the remaining categories. This information is combined and summarised in two coordinates. A machine learning method can then be applied in this simple bidimensional space to classify the documents. The results obtained in this space are satisfactory when compared with the best state-of-the-art performance.
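The projection idea can be illustrated with a minimal Python sketch. The abstract does not give the weighting formulas, so everything here is an assumption made for illustration: word importance is taken to be the fraction of documents containing the word, the two coordinates are plain sums of those fractions, and the decision rule is a simple comparison of x and y. The names `word_presence`, `coordinates`, `classify`, and the toy `train` data are hypothetical and do not come from the paper.

```python
from collections import Counter

# Hypothetical toy training set: category -> list of documents.
train = {
    "sport": ["football match goal", "tennis match point"],
    "finance": ["stock market price", "bank interest price"],
}

def word_presence(docs):
    """Fraction of documents in `docs` that contain each word (assumed weighting)."""
    counts = Counter(w for d in docs for w in set(d.split()))
    n = max(len(docs), 1)
    return {w: c / n for w, c in counts.items()}

def coordinates(doc, category):
    """Project `doc` onto two coordinates relative to `category`.

    x: summed local importance of the document's words inside the category.
    y: summed global importance of the same words in all other categories.
    """
    local = word_presence(train[category])
    rest = [d for c, ds in train.items() if c != category for d in ds]
    global_ = word_presence(rest)
    words = set(doc.split())
    x = sum(local.get(w, 0.0) for w in words)
    y = sum(global_.get(w, 0.0) for w in words)
    return x, y

def classify(doc):
    """Assumed decision rule: choose the category with the largest x - y."""
    def margin(category):
        x, y = coordinates(doc, category)
        return x - y
    return max(train, key=margin)

print(coordinates("goal in the match", "sport"))
print(classify("goal in the match"))
```

In this sketch a document that lies far below the diagonal y = x for some category uses words that are typical of that category and rare elsewhere. Any classifier that works in two dimensions could replace the x - y rule.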

Author information

Authors and Affiliations

Department of Information Engineering, University of Padua
Giorgio Maria Di Nunzio

Editor information

Editors and Affiliations

School of Computing and Technology, David Goldman Informatics Centre, University of Sunderland, St. Peter’s Campus, SR6 0DD, Sunderland, UK
Sharon McDonald
School of Computing and Technology, University of Sunderland, St. Peter’s Campus, St. Peter’s Way, SR6 0DD, Sunderland, United Kingdom
John Tait

Cite this paper

Di Nunzio, G.M. (2004). A Bidimensional View of Documents for Text Categorisation. In: McDonald, S., Tait, J. (eds) Advances in Information Retrieval. ECIR 2004. Lecture Notes in Computer Science, vol 2997. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24752-4_9

DOI: https://doi.org/10.1007/978-3-540-24752-4_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-21382-6
Online ISBN: 978-3-540-24752-4
eBook Packages: Springer Book Archive
