DOI: 10.1145/2505515.2505638
research-article

Short text classification by detecting information path

Published: 27 October 2013

Abstract

Short text is becoming ubiquitous in many modern information systems. Because short texts are brief and sparse, they contain fewer informative word co-occurrences, which naturally poses great difficulty for classification tasks on such data. To overcome this difficulty, this paper proposes a new way to classify short texts effectively. Our method is based on the key observation that there usually exist ordered subsets in short texts, which we term the "information path" in this work, and that classifying each subset based on the classification results of some previous subsets can yield higher overall accuracy than classifying the entire data set directly. We propose a method to detect the information path and employ it in short text classification. Unlike state-of-the-art methods, our method does not require any external knowledge or corpus, which usually needs careful fine-tuning; this makes our method easier to use and more robust across different data sets. Experiments on two real-world data sets show the effectiveness of the proposed method and its superiority over existing methods.
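
The abstract does not describe how the information path is detected or which classifier is used, so the following is only an illustrative sketch of the general idea of classifying ordered subsets one after another. It assumes the ordered subsets are already given, and it swaps in a toy word-overlap classifier; the names `train_counts`, `predict`, and `classify_along_path` are hypothetical and do not come from the paper.

```python
from collections import Counter, defaultdict


def train_counts(labeled):
    """Count word occurrences per label from (text, label) pairs."""
    counts = defaultdict(Counter)
    for text, label in labeled:
        counts[label].update(text.lower().split())
    return counts


def predict(counts, text):
    """Toy classifier (an assumption, not the paper's): pick the label whose
    training words overlap most with the text; ties go to the
    alphabetically first label."""
    words = text.lower().split()
    scores = {label: sum(c[w] for w in words) for label, c in counts.items()}
    return max(sorted(scores), key=scores.get)


def classify_along_path(seed, ordered_subsets):
    """Classify subsets in the given order. Predictions for each subset are
    added to the training pool before the next subset is classified."""
    pool = list(seed)
    results = []
    for subset in ordered_subsets:
        counts = train_counts(pool)  # retrain on everything labeled so far
        predicted = [(text, predict(counts, text)) for text in subset]
        results.append(predicted)
        pool.extend(predicted)
    return results


# Hypothetical toy data: two labeled seed texts and a two-step path.
seed = [("stock market falls", "finance"), ("team wins match", "sports")]
path = [
    ["market rally today", "match ends in draw"],
    ["rally lifts shares", "draw disappoints fans"],
]
for step in classify_along_path(seed, path):
    print(step)
```

In this toy data, the texts in the second subset share no words with the seed texts, so a classifier trained on the seed alone has nothing to go on. They do share words ("rally", "draw") with the first subset, which is why reusing the earlier predictions can help in this example.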

    Published In

    CIKM '13: Proceedings of the 22nd ACM international conference on Information & Knowledge Management
    October 2013
    2612 pages
    ISBN:9781450322638
    DOI:10.1145/2505515
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. anchor shackle term
    2. information path
    3. short text classification

    Conference

    CIKM'13
    CIKM'13: 22nd ACM International Conference on Information and Knowledge Management
    October 27 - November 1, 2013
    San Francisco, California, USA

    Acceptance Rates

    CIKM '13 Paper Acceptance Rate 143 of 848 submissions, 17%;
    Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

    Article Metrics

    • Downloads (Last 12 months): 8
    • Downloads (Last 6 weeks): 0
    Reflects downloads up to 16 Feb 2025

    Cited By

    • (2023) Detection and Classification of Cognitive Distortions: A Literature Review. 2023 International Conference on Smart-Green Technology in Electrical and Information Systems (ICSGTEIS), pp. 166-171. DOI: 10.1109/ICSGTEIS60500.2023.10424225. Online publication date: 2-Nov-2023
    • (2021) On entropy-based term weighting schemes for text categorization. Knowledge and Information Systems 63:9, pp. 2313-2346. DOI: 10.1007/s10115-021-01581-5. Online publication date: 1-Sep-2021
    • (2019) Identification and classification of best spreader in the domain of interest over the social networks. Cluster Computing 22:2, pp. 4035-4045. DOI: 10.1007/s10586-018-2616-y. Online publication date: 1-Mar-2019
    • (2017) Data Transfer and Extension for Mining Big Meteorological Data. Intelligent Computing Theories and Application, pp. 57-66. DOI: 10.1007/978-3-319-63309-1_6. Online publication date: 20-Jul-2017
    • (2017) Title Categorization Based on Category Granularity. Human Language Technology. Challenges for Computer Science and Linguistics, pp. 329-340. DOI: 10.1007/978-3-030-66527-2_25. Online publication date: 17-Nov-2017
    • (2015) Comparing Tweet Classifications by Authors' Hashtags, Machine Learning, and Human Annotators. Proceedings of the 2015 IEEE / WIC / ACM International Conference on Web Intelligence and Intelligent Agent Technology (WI-IAT) - Volume 01, pp. 67-74. DOI: 10.1109/WI-IAT.2015.69. Online publication date: 6-Dec-2015
    • (2014) EgoCentric. Proceedings of the 23rd ACM International Conference on Conference on Information and Knowledge Management, pp. 1079-1088. DOI: 10.1145/2661829.2661990. Online publication date: 3-Nov-2014
    • (2014) Hierarchical multi-label classification of social text streams. Proceedings of the 37th international ACM SIGIR conference on Research & development in information retrieval, pp. 213-222. DOI: 10.1145/2600428.2609595. Online publication date: 3-Jul-2014
