A Probabilistic Neighbourhood Translation Approach for Non-standard Text Categorisation

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5255)

Abstract

The need for non-standard text categorisation, i.e. classification based on some subtle criterion other than topic, may arise in various circumstances. In this study, we consider written responses to a standardised psychometric test used to determine personality traits of human subjects. A number of state-of-the-art text classifiers that have been very successful in standard topic-based classification problems turn out to perform poorly on this task. Here we propose a very simple probabilistic approach that achieves accurate predictions and demonstrates that this peculiar problem is still solvable by simple statistical text representations. We then extend this approach to include a latent variable, in order to obtain additional explanatory information beyond a black-box prediction.
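The abstract does not spell out the model, so what follows is only a hypothetical sketch of what a "very simple probabilistic" neighbourhood classifier can look like; it is not the method of the paper. It assumes bag-of-words responses, an additively smoothed unigram distribution P(w | n) for each training response n (the smoothing constant `alpha` is our choice), a neighbour posterior P(n | d) obtained by normalising the test response's multinomial likelihood under each training response, and one-hot labels P(c | n), so that P(c | d) = Σ_n P(c | n) P(n | d). The function names, labels and toy data are illustrative only.

```python
import math
from collections import Counter

# Hypothetical sketch of a probabilistic nearest-neighbour text classifier.
# It is NOT the model from the paper; names and smoothing are assumptions.


def word_distribution(tokens, vocab, alpha=0.1):
    """Additively smoothed unigram distribution P(w | doc) over a fixed vocabulary."""
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def neighbour_posterior(test_tokens, train_dists):
    """P(n | test doc): softmax over the test document's multinomial
    log-likelihood under each training document's word distribution."""
    counts = Counter(test_tokens)
    # Out-of-vocabulary words are skipped; all distributions share one
    # vocabulary, so this affects every neighbour equally.
    logliks = [
        sum(c * math.log(dist[w]) for w, c in counts.items() if w in dist)
        for dist in train_dists
    ]
    m = max(logliks)  # subtract the maximum for numerical stability
    weights = [math.exp(l - m) for l in logliks]
    z = sum(weights)
    return [wt / z for wt in weights]


def classify(test_tokens, train_docs, train_labels, alpha=0.1):
    """P(class | test doc) = sum_n P(class | n) P(n | test doc), one-hot P(class | n)."""
    vocab = {w for doc in train_docs for w in doc}
    dists = [word_distribution(doc, vocab, alpha) for doc in train_docs]
    posterior = neighbour_posterior(test_tokens, dists)
    scores = {}
    for p, label in zip(posterior, train_labels):
        scores[label] = scores.get(label, 0.0) + p
    return scores


if __name__ == "__main__":
    # Invented toy responses with hypothetical trait labels.
    docs = [
        ["i", "enjoy", "parties", "people"],
        ["i", "prefer", "quiet", "books"],
        ["meeting", "new", "people", "fun"],
        ["alone", "reading", "quiet"],
    ]
    labels = ["extravert", "introvert", "extravert", "introvert"]
    # The "extravert" label receives most of the probability mass here.
    print(classify(["people", "parties", "fun"], docs, labels))
```

Working in log space and subtracting the maximum log-likelihood before exponentiating keeps the normalisation stable even for long responses, where raw multinomial likelihoods would underflow.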

Copyright information

© 2008 Springer Berlin Heidelberg

About this paper

Cite this paper

Kabán, A. (2008). A Probabilistic Neighbourhood Translation Approach for Non-standard Text Categorisation. In: Boulicaut, JF., Berthold, M.R., Horváth, T. (eds) Discovery Science. DS 2008. Lecture Notes in Computer Science, vol 5255. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-88411-8_31

  • DOI: https://doi.org/10.1007/978-3-540-88411-8_31

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-88410-1

  • Online ISBN: 978-3-540-88411-8

  • eBook Packages: Computer Science (R0)
