Ensemble Clustering Based Dimensional Reduction

  • Conference paper
Database and Expert Systems Applications (DEXA 2018)

Abstract

A distance metric over a given data space should reflect the true similarity between objects. The Euclidean distance between data points represented by a large number of features often fails to capture the actual relationship between those points. Objects in the same cluster, however, often share common attributes even when their geometric distance is fairly large. In this study, we propose a new method that replaces the given data space with a categorical space based on ensemble clustering (EC). The EC space is defined by tracking the cluster membership of the points over multiple runs of clustering algorithms. To assess the suggested method, we integrated it within the framework of the Decision Tree, K Nearest Neighbors, and Random Forest classifiers. The results obtained by applying EC to 10 datasets confirmed our hypothesis that embedding the EC space as a distance metric would improve performance and reduce the feature space dramatically.
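
The idea of an EC space can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state which clustering algorithm, which numbers of clusters, or which distance on the categorical space were used, so the plain k-means routine, the list of k values, and the Hamming-style mismatch count below are all assumptions made for illustration. The function names (kmeans_labels, ec_space, ec_distance) are hypothetical.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_labels(points, k, rng, iters=20):
    """Plain Lloyd-style k-means (assumed clustering algorithm).

    Returns one cluster label in range(k) per point.
    """
    centers = rng.sample(points, k)  # k distinct input points as initial centers
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
                  for p in points]
        # Update step: move each non-empty center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(col) / len(members)
                                   for col in zip(*members))
    return labels


def ec_space(points, ks, seed=0):
    """Map each point to a categorical vector of cluster memberships.

    One clustering run is made per value in ks (an assumed way of
    generating multiple runs); entry j of a point's vector is the
    label that point received in run j.
    """
    rng = random.Random(seed)
    runs = [kmeans_labels(points, k, rng) for k in ks]
    return [tuple(run[i] for run in runs) for i in range(len(points))]


def ec_distance(u, v):
    """Number of runs in which two points fell into different clusters
    (a Hamming-style distance, assumed here for illustration)."""
    return sum(a != b for a, b in zip(u, v))


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
            (5.0, 5.0), (5.1, 5.2), (5.2, 5.1)]
    ec = ec_space(data, ks=[2, 3], seed=1)
    for point, vec in zip(data, ec):
        print(point, "->", vec)
    print("distance between first and last point:", ec_distance(ec[0], ec[-1]))
```

In this sketch each point is described by as many categorical features as there are clustering runs, regardless of how many original features it had. The resulting vectors could then be passed to a classifier such as k nearest neighbors with ec_distance as the metric.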

Acknowledgment

This research was supported by the Max Stern Yezreel Valley College for LA and by Zefat Academic College for MY.

Author information

Corresponding author

Correspondence to Loai Abddallah.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Abddallah, L., Yousef, M. (2018). Ensemble Clustering Based Dimensional Reduction. In: Elloumi, M., et al. Database and Expert Systems Applications. DEXA 2018. Communications in Computer and Information Science, vol 903. Springer, Cham. https://doi.org/10.1007/978-3-319-99133-7_9

  • DOI: https://doi.org/10.1007/978-3-319-99133-7_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-99132-0

  • Online ISBN: 978-3-319-99133-7

  • eBook Packages: Computer Science, Computer Science (R0)
