Computing Vectors Based Document Clustering and Numerical Result Analysis

Sahu, Neeraj; Thakur, G. S.

doi:10.1007/978-81-322-1602-5_138

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 236)

Abstract

This paper presents a new vector-based approach to document clustering, together with an analysis of its numerical results. The proposed approach is based on cluster means. Six iterations, I₁ to I₆, are used to produce the document clustering results. The approach proceeds through the following steps: document collection, text pre-processing, feature selection, indexing, the clustering process, and results analysis. The Twenty Newsgroups data set is used in the experiments, and the experimental results are evaluated with the MATLAB 7.14 numerical computing software. The experimental results show the superior performance of the proposed approach.
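The pipeline described in the abstract can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not the authors' MATLAB implementation: it assumes that "based on cluster means" refers to k-means-style centroid updates over L2-normalised term-frequency vectors, uses a hypothetical stop-word list and six toy documents in place of the Twenty Newsgroups data, and fixes the initial centroids by hand. The names `STOP_WORDS`, `preprocess`, `build_vectors`, `sq_dist`, `kmeans`, `max_features`, and `initial_indices` are illustrative only.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list; the paper does not specify its pre-processing.
STOP_WORDS = {"the", "a", "an", "and", "of", "in", "to", "for", "on", "is"}


def preprocess(text):
    """Text pre-processing: lowercase, keep alphabetic tokens, drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def build_vectors(docs, max_features=50):
    """Feature selection and indexing (assumed scheme): keep the terms with the
    highest document frequency, then map each document to an L2-normalised
    term-frequency vector over that vocabulary."""
    token_lists = [preprocess(d) for d in docs]
    doc_freq = Counter(t for toks in token_lists for t in set(toks))
    vocab = sorted(t for t, _ in doc_freq.most_common(max_features))
    vectors = []
    for toks in token_lists:
        counts = Counter(toks)
        vec = [float(counts[t]) for t in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0  # guard empty documents
        vectors.append([x / norm for x in vec])
    return vocab, vectors


def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans(vectors, initial_indices, iterations=6):
    """Mean-based clustering for a fixed number of passes (six by default,
    mirroring I1..I6). Returns the cluster labels recorded after each pass."""
    centroids = [list(vectors[i]) for i in initial_indices]
    history = []
    for _ in range(iterations):
        # Assignment step: each document joins its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda c: sq_dist(v, centroids[c]))
            for v in vectors
        ]
        history.append(labels)
        # Update step: each centroid moves to the mean of its member vectors.
        for c in range(len(centroids)):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return history


# Toy documents (not the Twenty Newsgroups data): three about space, three about hockey.
docs = [
    "the rocket launch reached orbit",
    "orbit and rocket engines in space",
    "space station orbit launch",
    "the team won the hockey game",
    "hockey players scored in the game",
    "the game ended and the team celebrated",
]
vocab, vectors = build_vectors(docs)
history = kmeans(vectors, initial_indices=[0, 3])  # hand-picked seeds (assumption)
for i, labels in enumerate(history, start=1):
    print(f"I{i}: {labels}")
# Prints "I1: [0, 0, 0, 1, 1, 1]" and the same labels for every pass up to I6.
```

Recording the labels after each pass mirrors reporting results for iterations I₁ to I₆. On real data such as Twenty Newsgroups, a sketch like this would likely need a better initialisation (for example, several random restarts) and a richer weighting scheme such as TF-IDF; those choices are assumptions here, not details taken from the paper.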

Acknowledgments

This work is supported by a research grant from MPCST, Bhopal, M.P., India (Endt. No. 2427/CST/R&D/2011, dated 22/09/2011).

Author information

Authors and Affiliations

Singhania University, Rajasthan, India
Neeraj Sahu
MANIT, Bhopal, India
G. S. Thakur

Corresponding author

Correspondence to Neeraj Sahu.

Editor information

Editors and Affiliations

Institute of Engineering and Technology, JK Lakshmipat University, Jaipur, Rajasthan, India
B. V. Babu
Department of Computer Science, Liverpool Hope University, Liverpool, United Kingdom
Atulya Nagar
Department of Mathematics, Indian Institute of Technology Roorkee, Roorkee, Uttarakhand, India
Kusum Deep
Department of Paper Technology, Indian Institute of Technology Roorkee, Roorkee, India
Millie Pant
Department of Applied Mathematics, South Asian University, New Delhi, India
Jagdish Chand Bansal
Institute of Engineering and Technology, JK Lakshmipat University, Jaipur, Rajasthan, India
Kanad Ray
Institute of Engineering and Technology, JK Lakshmipat University, Jaipur, Rajasthan, India
Umesh Gupta

About this paper

Cite this paper

Sahu, N., Thakur, G.S. (2014). Computing Vectors Based Document Clustering and Numerical Result Analysis. In: Babu, B., et al. Proceedings of the Second International Conference on Soft Computing for Problem Solving (SocProS 2012), December 28-30, 2012. Advances in Intelligent Systems and Computing, vol 236. Springer, New Delhi. https://doi.org/10.1007/978-81-322-1602-5_138

DOI: https://doi.org/10.1007/978-81-322-1602-5_138
Published: 26 February 2014
Publisher Name: Springer, New Delhi
Print ISBN: 978-81-322-1601-8
Online ISBN: 978-81-322-1602-5
eBook Packages: Engineering, Engineering (R0)
