Abstract
The growth of the Internet has produced a high volume of natural-language textual data. Such data can be sparse and may contain uninformative features that inflate its dimensionality. This high dimensionality, in turn, decreases the efficiency of text-mining tasks such as clustering. Transforming high-dimensional data into a lower-dimensional representation is therefore an important pre-processing step before clustering. In this paper, a dimensionality reduction method based on a deep autoencoder neural network, named DRDAE, is proposed to provide optimized and robust features for text clustering. DRDAE selects a less correlated and more salient feature space from the original high-dimensional one. To evaluate the proposed algorithm, k-means is used to cluster the text documents. The method is tested on five benchmark text datasets. Simulation results demonstrate that the proposed algorithm clearly outperforms other conventional dimensionality reduction methods from the literature in terms of the Rand index (RI).
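The pipeline the abstract describes can be sketched in a few lines: vectorize the documents, compress the sparse high-dimensional features with an autoencoder, then run k-means on the learned codes. The sketch below is a minimal illustration of that idea, not the paper's DRDAE implementation; the corpus, hidden dimension, learning rate, and epoch count are all illustrative assumptions.

```python
# Minimal sketch: TF-IDF -> single-hidden-layer autoencoder -> k-means.
# Hyperparameters here are assumptions for illustration, not the paper's values.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

def train_autoencoder(X, hidden_dim=8, epochs=300, lr=0.5, seed=0):
    """Sigmoid encoder, linear decoder, full-batch gradient descent on MSE."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0.0, 0.1, (d, hidden_dim)); b1 = np.zeros(hidden_dim)
    W2 = rng.normal(0.0, 0.1, (hidden_dim, d)); b2 = np.zeros(d)
    for _ in range(epochs):
        H = 1.0 / (1.0 + np.exp(-(X @ W1 + b1)))  # encoder activations (the codes)
        Xhat = H @ W2 + b2                        # linear reconstruction
        err = (Xhat - X) / n                      # dLoss/dXhat for the MSE loss
        dH = (err @ W2.T) * H * (1.0 - H)         # back-prop through the sigmoid
        W2 -= lr * (H.T @ err); b2 -= lr * err.sum(axis=0)
        W1 -= lr * (X.T @ dH);  b1 -= lr * dH.sum(axis=0)
    return W1, b1

docs = [
    "stock markets fell as investors sold shares",
    "the bank raised interest rates on loans",
    "the team won the football match yesterday",
    "a late goal decided the championship game",
]
X = TfidfVectorizer().fit_transform(docs).toarray()
W1, b1 = train_autoencoder(X, hidden_dim=8)
Z = 1.0 / (1.0 + np.exp(-(X @ W1 + b1)))          # reduced feature space
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(Z)
```

As in the paper's evaluation, clustering is applied to the reduced codes `Z` rather than the raw TF-IDF matrix; with ground-truth topic labels available, the Rand index would then score the agreement between `labels` and the true grouping.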
Cite this article
Kushwaha, N., Pant, M. Textual data dimensionality reduction - a deep learning approach. Multimed Tools Appl 79, 11039–11050 (2020). https://doi.org/10.1007/s11042-018-6900-x