A Feature Clustering Approach for Dimensionality Reduction and Classification

VinayKumar, Kotte; Srinivasan, R.; Singh, Elijah Blessing

doi:10.1007/978-3-319-19824-8_21

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 378)

Included in the following conference series:

International Conference on Soft Computing - MENDEL

Abstract

Dimensionality reduction is one of the primary challenges in handling high-dimensional data. Feature clustering is a powerful approach for reducing the dimensionality of the global feature vector when performing classification. In this paper, we discuss current research issues in handling data streams and high-dimensional data, and we introduce an approach that performs dimensionality reduction by computing the standard deviation of each feature across every transaction or document in the dataset. We then rank and cluster the features of the global feature vector to obtain a feature-cluster matrix, which is used to perform the dimensionality reduction. We show how the reduced representation can be used for classification after noise has been eliminated: a new test document or transaction is classified only after its dimensionality has been reduced. In future work, we plan to cluster the features using a kernel measure and to perform clustering and classification of text streams dynamically.
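The abstract does not spell out the ranking or clustering rule, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes that features are ranked by their population standard deviation across documents, that the ranking is cut into k equal-sized groups, and that each document is reduced by summing its feature values within each group. The function names (`feature_std`, `cluster_features`, `reduce_docs`) and the toy data are hypothetical.

```python
from statistics import pstdev


def feature_std(docs):
    """Population standard deviation of each feature (column) across all documents (rows)."""
    return [pstdev(col) for col in zip(*docs)]


def cluster_features(stds, k):
    """Rank features by standard deviation and cut the ranking into k groups.

    Returns a feature-cluster matrix M where M[i][j] == 1 if feature i
    belongs to cluster j. Equal-sized rank buckets are an assumption made
    for illustration; the paper's actual clustering rule may differ.
    """
    n = len(stds)
    ranked = sorted(range(n), key=lambda i: stds[i], reverse=True)
    matrix = [[0] * k for _ in range(n)]
    for rank, feat in enumerate(ranked):
        matrix[feat][rank * k // n] = 1
    return matrix


def reduce_docs(docs, matrix):
    """Project documents onto clusters (docs x matrix): sum feature values per cluster."""
    k = len(matrix[0])
    return [
        [sum(x * row[j] for x, row in zip(doc, matrix)) for j in range(k)]
        for doc in docs
    ]


# Toy document-feature matrix: 3 documents, 4 features (made-up counts).
docs = [
    [2, 0, 1, 0],
    [0, 3, 1, 0],
    [4, 0, 1, 1],
]
stds = feature_std(docs)
matrix = cluster_features(stds, k=2)
reduced = reduce_docs(docs, matrix)
print(matrix)   # feature-cluster matrix, 4 features x 2 clusters
print(reduced)  # reduced documents, 3 documents x 2 clusters
```

In this toy example the four original features collapse into two cluster-level features, and any classifier would then be trained and applied on the reduced vectors instead of the full global feature vector.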

Author information

Authors and Affiliations

Kakatiya Institute of Technology and Science, Warangal, India
Kotte VinayKumar
Karunya University, Coimbatore, India
R. Srinivasan
School of CSE, Karunya University, Coimbatore, India
Elijah Blessing Singh

Corresponding author

Correspondence to Kotte VinayKumar.

Editor information

Editors and Affiliations

Faculty of Mechanical Engineering, Department of Applied Computer Science, Brno University of Technology, Brno, Czech Republic
Radek Matoušek

About this paper

Cite this paper

VinayKumar, K., Srinivasan, R., Singh, E.B. (2015). A Feature Clustering Approach for Dimensionality Reduction and Classification. In: Matoušek, R. (eds) Mendel 2015. ICSC-MENDEL 2016. Advances in Intelligent Systems and Computing, vol 378. Springer, Cham. https://doi.org/10.1007/978-3-319-19824-8_21

DOI: https://doi.org/10.1007/978-3-319-19824-8_21
Published: 07 June 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-19823-1
Online ISBN: 978-3-319-19824-8
eBook Packages: Engineering, Engineering (R0)
