poster

Timeline adaptation for text classification

Authors:
Fumiyo Fukumoto, Univ. of Yamanashi, Kofu, Japan
Yoshimi Suzuki, Univ. of Yamanashi, Kofu, Japan
Atsuhiro Takasu, National Institute of Informatics, Tokyo, Japan
CIKM '13: Proceedings of the 22nd ACM international conference on Information & Knowledge Management, October 2013, Pages 1517–1520
https://doi.org/10.1145/2505515.2507833

Published: 27 October 2013

ABSTRACT

In this paper, we address the text classification problem in which the test data were created in a different period of time from the training data, and present a method for text classification based on temporal adaptation. We first applied lexical chains to the training data to collect semantically related terms, and grouped them into sets (we call these Sem sets). Semantically related terms in the documents are replaced with their representative term. From the results, we identified short terms, that is, terms that are salient for a specific period of time. Finally, we trained SVM classifiers by applying a temporal weighting function to each selected short term within the training data, and classified the test data. The temporal weighting function weights each short term in the training data according to the temporal distance between the training and test data. The results using MedLine data showed that the method was comparable to the current state-of-the-art biased-SVM method, and that it is especially effective when the test data are temporally far from the training data.
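
The pipeline described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the form of the temporal weighting function, so the exponential decay, the `decay` parameter, the use of years as the time unit, and all function names below are assumptions made for illustration only, not the authors' formulation. The Sem sets and the set of short terms are taken as given inputs here; in the paper they are derived from lexical chains and a salience analysis that this sketch does not attempt to reproduce.

```python
import math
from collections import Counter


def replace_with_representatives(tokens, sem_sets):
    """Map each token to the representative term of its Sem set, if it has one.

    sem_sets maps a representative term to the set of terms it stands for.
    """
    to_rep = {term: rep for rep, members in sem_sets.items() for term in members}
    return [to_rep.get(tok, tok) for tok in tokens]


def temporal_weight(train_year, test_year, decay=0.5):
    """Hypothetical weighting: shrink toward 0 as temporal distance grows."""
    return math.exp(-decay * abs(test_year - train_year))


def weighted_features(tokens, train_year, test_year, short_terms, sem_sets, decay=0.5):
    """Build a term-frequency vector in which only short terms are re-weighted."""
    counts = Counter(replace_with_representatives(tokens, sem_sets))
    w = temporal_weight(train_year, test_year, decay)
    return {term: n * (w if term in short_terms else 1.0) for term, n in counts.items()}


# Toy example: one Sem set and one short term (both invented for illustration).
sem_sets = {"tumor": {"tumour", "neoplasm"}}
short_terms = {"sars"}
doc = ["tumour", "neoplasm", "therapy", "sars"]
features = weighted_features(doc, train_year=2001, test_year=2003,
                             short_terms=short_terms, sem_sets=sem_sets)
```

The resulting dictionaries could be passed as feature vectors to any SVM implementation. Under this assumed weighting, terms that are not short terms keep their raw frequency, so only period-specific vocabulary loses influence as the gap between training and test data widens.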

Index Terms

Timeline adaptation for text classification
1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
      1. Language resources
2. Information systems
  1. Information retrieval
    1. Retrieval tasks and goals
      1. Clustering and classification
  2. Information systems applications
    1. Data mining
      1. Clustering

Published in
CIKM '13: Proceedings of the 22nd ACM international conference on Information & Knowledge Management
October 2013
2612 pages
ISBN:9781450322638
DOI:10.1145/2505515
General Chairs:
Qi He, LinkedIn, USA
Arun Iyengar, IBM T.J. Watson Research Center, USA
Program Chairs:
Wolfgang Nejdl, L3S Research Center, Germany
Jian Pei, Simon Fraser University, Canada
Rajeev Rastogi, Amazon, India
Copyright © 2013 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
temporal analysis
text classification
Qualifiers
- poster
Acceptance Rates
CIKM '13 Paper Acceptance Rate: 143 of 848 submissions, 17%
Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%