Organising Documents Based on Standard-Example Split Test

  • Conference paper
Knowledge-Based Intelligent Information and Engineering Systems (KES 2005)

Abstract

One purpose of text mining is to summarise a large collection of documents. This paper proposes a new method for viewing a summary of a large document set. It combines two techniques: the first constructs classification trees using a split test called the standard-example (standard-document) split test, and the second displays the features of each class of documents classified in the trees. The standard-example split test divides examples by their distance (or similarity) from a standard example, which is selected by a criterion. This is the first method to apply this test to text mining. The display method presents representative words for each document class that emphasise the features of that class.
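The idea of a standard-example split can be illustrated with a minimal sketch. The abstract does not state which distance measure or selection criterion the paper uses, so the code below assumes cosine distance over bag-of-words term frequencies and information gain as the criterion. The function names (`tf_vector`, `cosine_distance`, `best_standard_example_split`) and the toy documents are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter

def tf_vector(text):
    """Bag-of-words term-frequency vector for one document."""
    return Counter(text.lower().split())

def cosine_distance(a, b):
    """1 - cosine similarity between two sparse term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return 1.0 - dot / norm if norm else 1.0

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def best_standard_example_split(vectors, labels):
    """Try every document as the standard example and every midpoint
    between consecutive distances as the threshold; return the
    (gain, standard_index, threshold) with the highest information gain.
    Information gain is an assumed criterion, not the paper's."""
    base = entropy(labels)
    best = (0.0, None, None)
    for s, standard in enumerate(vectors):
        dists = [cosine_distance(standard, v) for v in vectors]
        cuts = sorted(set(dists))
        for lo, hi in zip(cuts, cuts[1:]):
            threshold = (lo + hi) / 2
            near = [y for d, y in zip(dists, labels) if d <= threshold]
            far = [y for d, y in zip(dists, labels) if d > threshold]
            remainder = (len(near) * entropy(near)
                         + len(far) * entropy(far)) / len(labels)
            gain = base - remainder
            if gain > best[0]:
                best = (gain, s, threshold)
    return best

# Toy corpus (invented for illustration only).
docs = [
    "stock market shares rise",
    "market shares fall sharply",
    "football match final score",
    "football team wins match",
]
labels = ["finance", "finance", "sport", "sport"]
vectors = [tf_vector(d) for d in docs]

gain, standard_index, threshold = best_standard_example_split(vectors, labels)
print("standard document:", docs[standard_index])
print("threshold:", threshold, "gain:", gain)
```

Each internal node of a tree built this way would store one standard document and one threshold, and a new document would be routed by comparing its distance from the standard document against that threshold. Applying the split recursively to the near and far subsets would yield a classification tree, which is one plausible reading of the construction described in the abstract.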

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Fukuoka, K., Nakano, T., Inuzuka, N. (2005). Organising Documents Based on Standard-Example Split Test. In: Khosla, R., Howlett, R.J., Jain, L.C. (eds) Knowledge-Based Intelligent Information and Engineering Systems. KES 2005. Lecture Notes in Computer Science, vol 3681. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11552413_112

  • DOI: https://doi.org/10.1007/11552413_112

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-28894-7

  • Online ISBN: 978-3-540-31983-2

  • eBook Packages: Computer Science, Computer Science (R0)
