Abstract
In this paper, we propose a novel framework for text classification based on subspace methods. Recent studies have shown the advantages of modeling texts as linear subspaces in a high-dimensional word vector space, which we refer to as word subspaces. We therefore propose solving topic classification and sentiment analysis using the word subspace together with different subspace-based methods, and we explore the geometry of word embeddings to decide which method is more suitable for each task. We empirically demonstrate that a word subspace generated from a set of texts is a unique representation of a semantic topic that can be spanned by basis vectors derived from different texts. Texts can therefore be classified by comparing their word subspaces with the topic class subspaces, which we achieve using the mutual subspace method, as it effectively handles multiple subspaces for classification. For sentiment analysis, since word embeddings do not necessarily encode sentiment information (i.e., words of opposite sentiment may have similar word vectors), we introduce the orthogonal mutual subspace method to push words of opposite sentiment apart. Furthermore, as the sentiment class subspaces may overlap due to overlapping topics, we propose modeling a sentiment class by a set of multiple word subspaces, one generated from each text belonging to the class. We further model the sentiment classes on a Grassmann manifold by using the Grassmann subspace method and its discriminative extension, the Grassmann orthogonal subspace method. We show the validity of each framework through experiments on four widely used datasets.
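To make the representation concrete, the sketch below is a minimal illustration, not the authors' implementation: it builds a word subspace as the span of the leading right singular vectors of a text's stacked word vectors, and scores similarity between two subspaces through the cosines of their canonical angles, the quantity underlying the mutual subspace method. The embedding dimension, subspace dimension, class labels, and random stand-in data are all illustrative assumptions.

```python
import numpy as np

def word_subspace(word_vectors, dim):
    """Return an orthonormal basis (embed_dim x dim) for the subspace
    spanned by a text's word vectors (n_words x embed_dim)."""
    # The leading right singular vectors of the stacked word vectors
    # give the principal directions of the text in embedding space.
    _, _, vt = np.linalg.svd(word_vectors, full_matrices=False)
    return vt[:dim].T

def subspace_similarity(basis_a, basis_b):
    """Mean squared cosine of the canonical angles between two subspaces."""
    # For orthonormal bases A and B, the singular values of A^T B are
    # the cosines of the canonical angles between the two subspaces.
    cosines = np.linalg.svd(basis_a.T @ basis_b, compute_uv=False)
    return float(np.mean(cosines ** 2))

# Illustrative usage with random stand-ins for word embeddings:
# build one subspace per class from its pooled word vectors, then
# assign a query text to the class whose subspace is most similar.
rng = np.random.default_rng(0)
class_bases = {label: word_subspace(rng.standard_normal((200, 300)), 5)
               for label in ("sports", "politics")}
query = word_subspace(rng.standard_normal((40, 300)), 5)
print(max(class_bases, key=lambda c: subspace_similarity(query, class_bases[c])))
```

In the full framework described above, each sentiment class would instead be represented by a set of such per-text subspaces, compared as points on a Grassmann manifold.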
Availability of data and material
All datasets used in the experiments are publicly available, with source links indicated in the manuscript.
Author information
Contributions
All authors directly participated in the planning, execution, and analysis of this study, and all have read and approved the final submitted version.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Shimomoto, E.K., Portet, F. & Fukui, K. Text classification based on the word subspace representation. Pattern Anal Applic 24, 1075–1093 (2021). https://doi.org/10.1007/s10044-021-00960-6