Attribute Clustering and Dimensionality Reduction Based on In/Out Degree of Attributes in Dependency Graph

  • Conference paper
Swarm, Evolutionary, and Memetic Computing (SEMCCO 2011)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7076)

Abstract

Mining useful information from huge datasets requires appropriate tools and techniques to organize and evaluate the data. However, the ultra-high dimensionality of such data poses serious challenges in data mining research. The method proposed in this paper introduces a new strategy for dimensionality reduction: attribute clustering based on the dependency graph of the attributes. Information gain, an established measure that quantifies the uncertainty and information contained in a system, is calculated for each attribute and expresses the dependency relationships between the attributes in the graph. The underlying principle selects an optimum set of attributes, called a reduct, that can classify the dataset as well as the full set of attributes does. The rate of dimension reduction on datasets from the UCI repository is measured and compared with existing methods, and the classification accuracy on the reduced datasets is computed with various classifiers to assess the effectiveness of the method.
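As a rough illustration of the information gain measure the abstract refers to, the sketch below computes the Shannon entropy of one discrete attribute and the reduction in that entropy once a second attribute is known. It is not the authors' algorithm: the function names `entropy` and `information_gain` and the toy attributes `outlook`, `windy`, and `play` are assumptions made only for this example, and the paper's construction of the dependency graph and its in/out degree criterion are not reproduced here.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def information_gain(target, attribute):
    """Reduction in the entropy of `target` once `attribute` is known."""
    n = len(target)
    # Group the target values by the value of the conditioning attribute.
    groups = {}
    for a, t in zip(attribute, target):
        groups.setdefault(a, []).append(t)
    # Weighted average entropy of the target within each group.
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(target) - conditional


# Hypothetical toy data, not taken from the paper or from the UCI repository.
outlook = ["sunny", "sunny", "rain", "rain"]
windy = ["yes", "no", "yes", "no"]
play = ["no", "no", "yes", "yes"]

print(information_gain(play, outlook))  # 1.0: outlook fully determines play
print(information_gain(play, windy))    # 0.0: windy says nothing about play
```

A high gain between two attributes suggests a strong dependency and a gain near zero suggests independence. How such values are turned into directed edges of a dependency graph is specific to the paper and is left out of this sketch.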

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Das, A.K., Sil, J., Phadikar, S. (2011). Attribute Clustering and Dimensionality Reduction Based on In/Out Degree of Attributes in Dependency Graph. In: Panigrahi, B.K., Suganthan, P.N., Das, S., Satapathy, S.C. (eds) Swarm, Evolutionary, and Memetic Computing. SEMCCO 2011. Lecture Notes in Computer Science, vol 7076. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-27172-4_46

  • DOI: https://doi.org/10.1007/978-3-642-27172-4_46

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-27171-7

  • Online ISBN: 978-3-642-27172-4

  • eBook Packages: Computer Science, Computer Science (R0)
