Abstract
The organization of large text collections is the main goal of automated text categorization. In particular, the aim is to classify documents into a fixed number of pre-defined categories as efficiently and as accurately as possible. On-line and run-time services, such as personalization and information filtering services, have increased the importance of effective and efficient document categorization techniques. In recent years, a wide range of supervised learning algorithms has been applied to this problem [1]. Recently, a new approach that exploits a two-dimensional summarization of the data for text classification was presented [2]. This method does not go through a word-selection phase; instead, it uses the whole dictionary to present data in an intuitive way on two-dimensional graphs. Although successful in terms of classification effectiveness and efficiency (as recently shown in [3]), this method leaves some key issues unsolved: the design of the training algorithm seems to be ad hoc for the Reuters-21578 collection; the evaluation has been done only on the 10 most frequent classes of the Reuters-21578 dataset; the evaluation lacks measures of significance in most parts; and the method adopted lacks a mathematical justification. We focus on the first three aspects, leaving the fourth as future work.
References
[1] Sebastiani, F.: Machine learning in automated text categorization. ACM Computing Surveys 34, 1–47 (2002)
[2] Di Nunzio, G.M.: A bidimensional view of documents for text categorisation. In: McDonald, S., Tait, J.I. (eds.) ECIR 2004. LNCS, vol. 2997, pp. 112–126. Springer, Heidelberg (2004)
[3] Di Nunzio, G.M., Micarelli, A.: Pushing “underfitting” to the limit: Learning in bidimensional text categorization. In: Proceedings of the 16th European Conference on Artificial Intelligence (ECAI 2004), Valencia, Spain (2004) (forthcoming)
[4] Ross, S.: Introduction to Probability and Statistics for Engineers and Scientists. Academic Press, London (2000)
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Di Nunzio, G.M. (2004). Cross-Comparison for Two-Dimensional Text Categorization. In: Apostolico, A., Melucci, M. (eds) String Processing and Information Retrieval. SPIRE 2004. Lecture Notes in Computer Science, vol 3246. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30213-1_16
DOI: https://doi.org/10.1007/978-3-540-30213-1_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23210-0
Online ISBN: 978-3-540-30213-1
eBook Packages: Springer Book Archive