Organising Documents Based on Standard-Example Split Test

Fukuoka, Kenta; Nakano, Tomofumi; Inuzuka, Nobuhiro

doi:10.1007/11552413_112

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3681)

Included in the following conference series:

International Conference on Knowledge-Based and Intelligent Information and Engineering Systems

Abstract

One purpose of text mining is to summarise a large collection of documents. This paper proposes a new method for viewing a summary of a large document set. It combines two techniques: one constructs classification trees using a split test called the standard-example (standard-document) split test, and the other displays the features of each class of documents classified by the trees. The standard-example split test divides examples according to their distance (or similarity) from a standard example, which is selected by a criterion. This is the first method to apply this test to text mining. The display method presents representative words for each document class, chosen to emphasise the features of that class.
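To make the idea of a standard-example split concrete, the following is a minimal Python sketch, not the authors' implementation. The abstract says only that the standard example is "selected by a criterion", so the sketch makes several assumptions: documents are plain bag-of-words counts, the distance is cosine distance, the documents carry class labels, and the criterion is information gain. The function names (`cosine_distance`, `entropy`, `best_standard_example_split`) are hypothetical.

```python
import math
from collections import Counter


def cosine_distance(a: Counter, b: Counter) -> float:
    """Return 1 - cosine similarity between two bag-of-words vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # treat an empty document as maximally distant
    return 1.0 - dot / (norm_a * norm_b)


def entropy(labels) -> float:
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def best_standard_example_split(docs, labels):
    """Search for a (standard document, distance threshold) pair.

    Every document is tried as the standard example, and every midpoint
    between consecutive distinct distances is tried as the threshold.
    Assumption: the selection criterion is information gain.
    Returns (gain, index_of_standard_document, threshold).
    """
    vectors = [Counter(d.lower().split()) for d in docs]
    base = entropy(labels)
    best = (0.0, None, None)
    for s, standard in enumerate(vectors):
        dists = [cosine_distance(standard, v) for v in vectors]
        cuts = sorted(set(dists))
        for lo, hi in zip(cuts, cuts[1:]):
            t = (lo + hi) / 2
            near = [y for d, y in zip(dists, labels) if d <= t]
            far = [y for d, y in zip(dists, labels) if d > t]
            weighted = (len(near) * entropy(near) + len(far) * entropy(far)) / len(labels)
            gain = base - weighted
            if gain > best[0]:
                best = (gain, s, t)
    return best
```

Calling `best_standard_example_split(docs, labels)` on a list of document strings and a parallel list of class labels returns the chosen split. A tree could then be grown by applying the same search recursively to the "near" and "far" subsets, although the stopping rule and any pruning would be further assumptions beyond what the abstract states.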

Author information

Authors and Affiliations

Graduate School of Engineering, Nagoya Institute of Technology, Gokiso-cho Showa, Nagoya, 466-8555, Japan
Kenta Fukuoka, Tomofumi Nakano & Nobuhiro Inuzuka

Editor information

Editors and Affiliations

School of Business, La Trobe University, 3086, Melbourne, Victoria, Australia
Rajiv Khosla
Centre for SMART systems Engineering Research Centre, University of Brighton, Moulsecoomb, BN2 4GJ, Brighton, UK
Robert J. Howlett
School of Electrical and Information Engineering, Knowledge Based Intelligent Engineering Systems Centre, University of South Australia, 5095, Mawson Lakes, SA, Australia
Lakhmi C. Jain

About this paper

Cite this paper

Fukuoka, K., Nakano, T., Inuzuka, N. (2005). Organising Documents Based on Standard-Example Split Test. In: Khosla, R., Howlett, R.J., Jain, L.C. (eds) Knowledge-Based Intelligent Information and Engineering Systems. KES 2005. Lecture Notes in Computer Science, vol 3681. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11552413_112

DOI: https://doi.org/10.1007/11552413_112
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-28894-7
Online ISBN: 978-3-540-31983-2
eBook Packages: Computer Science, Computer Science (R0)