
Text classification based on the word subspace representation

  • Theoretical advances
  • Published in: Pattern Analysis and Applications

Abstract

In this paper, we propose a novel framework for text classification based on subspace-based methods. Recent studies have shown the advantages of modeling texts as linear subspaces in a high-dimensional word vector space, which we refer to as word subspaces. We therefore propose solving topic classification and sentiment analysis with the word subspace combined with different subspace-based methods, exploring the geometry of word embeddings to decide which subspace-based method is most suitable for each task. We empirically demonstrate that a word subspace generated from a set of texts is a unique representation of a semantic topic and can be spanned by basis vectors derived from different texts. Texts can therefore be classified by comparing their word subspace with the topic class subspaces. We realize this framework with the mutual subspace method, which effectively handles multiple subspaces for classification. For sentiment analysis, as word embeddings do not necessarily encode sentiment information (i.e., words of opposite sentiment may have similar word vectors), we introduce the orthogonal mutual subspace method to push words of opposite sentiment apart. Furthermore, as the sentiment class subspaces may overlap due to shared topics, we propose modeling a sentiment class by a set of multiple word subspaces, one generated from each text belonging to the class. We further model the sentiment classes on a Grassmann manifold by using the Grassmann subspace method and its discriminative extension, the Grassmann orthogonal subspace method. We show the validity of each framework through experiments on four widely used datasets.
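To make the core idea concrete, the sketch below illustrates how a word subspace can be built from pre-trained word vectors and compared against class subspaces via canonical angles, in the spirit of the mutual subspace method described above. This is a minimal illustration under stated assumptions, not the authors' implementation: the subspace dimension k, the helper names, and the use of a plain SVD over stacked word vectors are choices made here for exposition.

```python
# Minimal sketch of the word-subspace idea (illustrative, not the authors' code).
# Assumptions: word vectors come from a pre-trained embedding (e.g., word2vec),
# each text is given as a set of word vectors, and a single query subspace is
# compared against one subspace per class using canonical angles.
import numpy as np

def word_subspace(word_vectors, k):
    """Orthonormal basis (d x k) spanning a set of d-dimensional word vectors."""
    X = np.asarray(word_vectors, dtype=float).T      # d x n matrix of stacked vectors
    U, _, _ = np.linalg.svd(X, full_matrices=False)  # left singular vectors as basis
    return U[:, :k]

def subspace_similarity(U, V):
    """Mean of squared cosines of the canonical angles between two subspaces."""
    s = np.linalg.svd(U.T @ V, compute_uv=False)     # singular values = cos(angles)
    return float(np.mean(s ** 2))

def classify(query_vectors, class_subspaces, k=5):
    """Assign a text (set of word vectors) to the class with the most similar subspace."""
    Q = word_subspace(query_vectors, k)
    sims = {label: subspace_similarity(Q, S) for label, S in class_subspaces.items()}
    return max(sims, key=sims.get)
```

In this simplified view, each class subspace would be built with word_subspace from the word vectors of all training texts of that class; the orthogonalization and Grassmann-manifold variants discussed in the abstract refine how the class subspaces are constructed and compared.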


Availability of data and material

All datasets used in the experiments are publicly available; the source links are indicated in the manuscript.

Notes

  1. https://code.google.com/archive/p/word2vec/.

  2. http://www.cs.cornell.edu/people/pabo/movie-review-data/.

  3. https://nlp.stanford.edu/sentiment/.

  4. https://code.google.com/archive/p/word2vec/.

  5. http://nlp.stanford.edu/data/glove.42B.300d.zip.

  6. https://github.com/google-research/bert (BERT-base uncased model).


Author information


Contributions

All authors of this research paper have directly participated in the planning, execution, and analysis of this study. All authors of this paper have read and approved the final version submitted.

Corresponding author

Correspondence to Erica K. Shimomoto.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Shimomoto, E.K., Portet, F. & Fukui, K. Text classification based on the word subspace representation. Pattern Anal Applic 24, 1075–1093 (2021). https://doi.org/10.1007/s10044-021-00960-6
