Abstract
One purpose of text mining is to summarise a large collection of documents. This paper proposes a new method for viewing a summary of a large document set. It combines two techniques: one constructs classification trees using a split test called the standard-example (standard-document) split test, and the other displays the features of each class of documents classified by the trees. The standard-example split test divides examples by their distance (or similarity) from a standard example, which is selected by a criterion. This is the first method to apply this test to text mining. The display method presents representative words for each document class, chosen to emphasise the features of that class.
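To make the idea of a standard-example split concrete, the following is a minimal sketch, not the authors' implementation. It assumes bag-of-words vectors, cosine distance, and information gain as the selection criterion; the abstract does not specify any of these, and the function names (`cosine_distance`, `entropy`, `best_standard_example_split`) are hypothetical. Each document is tried in turn as the standard example, and the remaining documents are divided into those within a distance threshold and those beyond it.

```python
import math
from collections import Counter


def cosine_distance(a, b):
    """Return 1 - cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def best_standard_example_split(docs, labels):
    """Search every (standard document, threshold) pair.

    Returns (gain, standard_index, threshold) for the split with the
    highest information gain. Information gain is an assumed criterion,
    used here only for illustration.
    """
    vectors = [Counter(d.lower().split()) for d in docs]
    base = entropy(labels)
    best = (0.0, None, None)
    for s, standard in enumerate(vectors):
        dists = [cosine_distance(standard, v) for v in vectors]
        values = sorted(set(dists))
        # Candidate thresholds are midpoints between adjacent distances.
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            near = [l for d, l in zip(dists, labels) if d <= threshold]
            far = [l for d, l in zip(dists, labels) if d > threshold]
            remainder = (len(near) * entropy(near)
                         + len(far) * entropy(far)) / len(labels)
            gain = base - remainder
            if gain > best[0]:
                best = (gain, s, threshold)
    return best


docs = [
    "stock market shares rise",
    "market shares fall stock",
    "football match goal",
    "goal scored football team",
]
labels = ["finance", "finance", "sport", "sport"]
gain, standard_index, threshold = best_standard_example_split(docs, labels)
print(gain, standard_index, threshold)
```

On this toy data the first document serves as the standard example, and the two finance documents fall on the near side of the threshold while the two sport documents fall on the far side. A tree would be grown by applying the same search recursively to each side.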
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Fukuoka, K., Nakano, T., Inuzuka, N. (2005). Organising Documents Based on Standard-Example Split Test. In: Khosla, R., Howlett, R.J., Jain, L.C. (eds) Knowledge-Based Intelligent Information and Engineering Systems. KES 2005. Lecture Notes in Computer Science, vol 3681. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11552413_112
DOI: https://doi.org/10.1007/11552413_112
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-28894-7
Online ISBN: 978-3-540-31983-2
eBook Packages: Computer Science, Computer Science (R0)