Abstract
In any text, words occur with different frequencies, and keywords are strongly related to the subject of the text. Word frequencies also change over time. An earlier method used a decision tree to estimate stability classes, which indicate how a word's popularity changes over time, from frequency changes in text data over given periods. The estimation precision of the decision tree decreases when the number of examples varies widely among classes. This paper suggests a new way to use a Random Sampling Method and proposes a new Data Copying Method to improve the estimation precision of the decision tree. With the new Data Copying Method, F-measures improved by 9% for the Increasing Class, 9% for the Relatively Constant Class, and 18% for the Decreasing Class.
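The abstract does not spell out how the two balancing methods work, so the following is only a minimal sketch of the general idea, under stated assumptions: random sampling is taken to mean randomly dropping examples until every class matches the smallest one, and data copying is taken to mean duplicating examples until every class matches the largest one. The function names (`random_undersample`, `copy_oversample`, `f_measure`) and the toy data are hypothetical and are not taken from the paper.

```python
import random
from collections import Counter, defaultdict


def group_by_class(samples):
    """Group (features, label) pairs by their class label."""
    groups = defaultdict(list)
    for features, label in samples:
        groups[label].append((features, label))
    return groups


def random_undersample(samples, seed=0):
    """Assumed reading of random sampling: keep a random subset of each
    class so that every class is as small as the smallest class."""
    rng = random.Random(seed)
    groups = group_by_class(samples)
    target = min(len(group) for group in groups.values())
    balanced = []
    for group in groups.values():
        balanced.extend(rng.sample(group, target))
    return balanced


def copy_oversample(samples):
    """Assumed reading of data copying: repeat the examples of each class
    in order until every class is as large as the largest class."""
    groups = group_by_class(samples)
    target = max(len(group) for group in groups.values())
    balanced = []
    for group in groups.values():
        balanced.extend(group[i % len(group)] for i in range(target))
    return balanced


def f_measure(y_true, y_pred, label):
    """Per-class F-measure: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy, made-up data: one frequency-change feature per word, unbalanced labels.
samples = (
    [([0.1 * i], "constant") for i in range(6)]
    + [([1.0 + i], "increasing") for i in range(3)]
    + [([-1.0], "decreasing")]
)
print(Counter(label for _, label in copy_oversample(samples)))
print(Counter(label for _, label in random_undersample(samples)))
```

Either balanced set could then be passed to any decision tree learner, and `f_measure` computed per class on held-out data to compare the two strategies.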
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Atlam, ES., Ghada, E., Fuketa, M., Morita, K., Aoe, Ji. (2005). An Improvement Approach for Word Tendency Using Decision Tree. In: Khosla, R., Howlett, R.J., Jain, L.C. (eds) Knowledge-Based Intelligent Information and Engineering Systems. KES 2005. Lecture Notes in Computer Science, vol 3684. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11554028_84
DOI: https://doi.org/10.1007/11554028_84
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-28897-8
Online ISBN: 978-3-540-31997-9
eBook Packages: Computer Science (R0)