
Self-taught learning via exponential family sparse coding for cost-effective patient thought record categorization

  • Original Article
  • Published in Personal and Ubiquitous Computing

Abstract

Automatic categorization of patient thought records (TRs) is important in cognitive behavior therapy, a useful augmentation of standard clinical treatment for major depressive disorder. Because both collecting and labeling TR data are expensive, acquiring the large amount of labeled TR data needed to train a highly accurate classification model is usually cost prohibitive. With only a very limited amount of labeled and unlabeled training TR data available in practice, traditional semi-supervised learning and transfer learning methods, the most common strategies for coping with scarce training data in statistical learning, do not work well for automatic TR categorization. To address this challenge, we propose to tackle the TR categorization problem from a new perspective via self-taught learning, an emerging technique in machine learning. Self-taught learning is a special type of transfer learning. Instead of requiring labeled data from an auxiliary domain relevant to the classification task of interest, as traditional transfer learning methods do, it learns the inherent structures of the auxiliary data and does not require their labels. As a result, a classifier achieves decent classification accuracy using the limited amount of labeled TR texts, with assistance from a large amount of text data obtained from inexpensive, or even no-cost, resources. That is, a cost-effective TR categorization system can be built that may be particularly useful for diagnosing patients and training new therapists. Further, to account for the discrete nature of the input text data, instead of the traditional Gaussian sparse coding used in self-taught learning, we use exponential family sparse coding to better model the distribution of the input data. We apply the proposed method to the task of classifying patient homework texts.
Experimental results show the effectiveness of the proposed automatic TR classification framework.
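The pipeline described in the abstract can be sketched in a few lines. The following is a minimal illustration only, not the authors' implementation: it substitutes scikit-learn's standard (Gaussian) sparse coding for the paper's exponential family variant, and the data, vocabulary size, number of bases, and binary labels are all made up for demonstration.

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Hypothetical stand-ins: term-count vectors for a large unlabeled
# auxiliary corpus and a small labeled TR set (vocabulary of 50 terms).
X_aux = rng.poisson(1.0, size=(200, 50)).astype(float)
X_lab = rng.poisson(1.0, size=(20, 50)).astype(float)
y_lab = np.array([0, 1] * 10)  # made-up binary TR category labels

# Step 1: learn basis vectors (a "dictionary") from the unlabeled
# auxiliary corpus only -- no labels are needed at this stage.
dico = DictionaryLearning(n_components=16, alpha=1.0,
                          transform_algorithm='lasso_lars',
                          max_iter=20, random_state=0)
dico.fit(X_aux)

# Step 2: re-express the labeled TR texts as sparse activations
# over the learned bases (the transferred representation).
Z_lab = dico.transform(X_lab)

# Step 3: train an ordinary classifier on the new representation.
clf = LogisticRegression(max_iter=1000).fit(Z_lab, y_lab)
print(Z_lab.shape)  # (20, 16)
```

In the paper's setting, step 1 would instead fit an exponential family (e.g. Poisson) sparse coding model, which matches the discrete word-count nature of text better than the squared-error objective used here.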


Notes

  1. http://www.textfixer.com/resources/common-english-words.txt.

  2. http://www.eecs.umich.edu/honglak/softwares/nips06-sparsecoding.html.

  3. Here we drop the superscripts “l” and “u” for brevity, as there is no difference between auxiliary data and target data when discussing sparse coding.
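For context on the sparse coding step discussed in note 3, a hedged sketch of the two objectives (standard formulations assumed here, with generic symbols: input vector x, basis matrix B, sparse activations a, sparsity weight β):

```latex
% Gaussian sparse coding: squared reconstruction error + L1 sparsity
\min_{B,\,a}\; \|x - Ba\|_2^2 \;+\; \beta \|a\|_1

% Exponential family sparse coding: the squared-error term is replaced
% by the negative log-likelihood of an exponential family distribution
% whose natural parameter is \eta = Ba
\min_{B,\,a}\; -\log p\!\left(x \mid \eta = Ba\right) \;+\; \beta \|a\|_1,
\qquad
\log p(x \mid \eta) = \log h(x) + \eta^{\top} T(x) - A(\eta)
```

For word-count text data, a Poisson model within this family is a natural choice, which is why the exponential family variant suits the discrete inputs described in the abstract.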


Corresponding author

Correspondence to Heng Huang.

About this article

Cite this article

Wang, H., Huang, H., Basco, M. et al. Self-taught learning via exponential family sparse coding for cost-effective patient thought record categorization. Pers Ubiquit Comput 18, 27–35 (2014). https://doi.org/10.1007/s00779-012-0614-2
