ABSTRACT
In this paper, we address a text classification problem in which the test data were created in a different period of time from the training data, and we present a method for text classification based on temporal adaptation. We first applied lexical chains to the training data to collect semantically related terms and grouped them into sets (we call these Sem sets). Semantically related terms in the documents were then replaced with their representative term. From the resulting documents, we identified short terms, that is, terms that are salient for a specific period of time. Finally, we trained SVM classifiers by applying a temporal weighting function to each selected short term within the training data, and classified the test data. The temporal weighting function weights each short term in the training data according to the temporal distance between the training and test data. The results on MedLine data showed that the method is comparable to the current state-of-the-art biased-SVM method, and that it is especially effective when the test data are temporally far from the training data.
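The temporal weighting step can be illustrated with a minimal sketch. The abstract does not give the form of the weighting function, so the exponential decay used here, the `decay` parameter, and the names `temporal_weight` and `reweight_document` are all assumptions made for illustration and are not the authors' formulation. The sketch only shows the general idea: counts of short terms are scaled down as the gap between the training period and the test period grows, while other terms are left unchanged.

```python
import math

def temporal_weight(train_year, test_year, decay=0.5):
    """Hypothetical weight: exponential decay in the temporal distance (assumed form)."""
    distance = abs(test_year - train_year)
    return math.exp(-decay * distance)

def reweight_document(term_counts, short_terms, train_year, test_year, decay=0.5):
    """Scale only the short terms of one training document; keep other terms as-is."""
    w = temporal_weight(train_year, test_year, decay)
    return {
        term: count * w if term in short_terms else float(count)
        for term, count in term_counts.items()
    }

# Toy training document from 2009, to be used for classifying 2012 test data.
doc = {"influenza": 3, "h1n1": 2, "patient": 5}
short_terms = {"h1n1"}  # assumed to be salient only around 2009
weighted = reweight_document(doc, short_terms, train_year=2009, test_year=2012)
```

The reweighted term vectors could then be passed to any SVM implementation as training features. How the weights enter the classifier in the paper itself is not stated in the abstract.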
Index Terms
- Timeline adaptation for text classification