Dimensionality Reduction via Community Detection in Small Sample Datasets

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10939)

Abstract

Real-world networks constructed from raw data are often characterized by complex community structures. Existing dimensionality reduction techniques, however, do not take such characteristics into account. This is especially important for problems with a low number of samples, where the curse of dimensionality is particularly significant. Therefore, in this paper, we propose FeatureNet, a novel community-based dimensionality reduction framework targeting small sample problems. To this end, we propose a new method to directly construct a network from high-dimensional raw data while explicitly revealing its hidden community structure; these communities are then used to learn low-dimensional features using a representation learning framework. We show the effectiveness of our approach on eight datasets covering application areas as diverse as handwritten digits, biology, physical sciences, NLP, and computational sustainability. Extensive experiments on these datasets (with sizes mostly between 100 and 1500 samples) demonstrate that FeatureNet significantly outperforms ten well-known dimensionality reduction methods, including PCA, Kernel PCA, Isomap, SNE, and t-SNE, with up to 40% improvement in classification accuracy.
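The general shape of the pipeline described above (raw data to network, network to communities, communities to low-dimensional features) can be illustrated with a minimal sketch. This is not FeatureNet itself: the abstract does not specify the network construction or the representation learning step, so the sketch assumes a k-nearest-neighbour graph over samples, swaps in asynchronous label propagation for community detection, and uses the fraction of each node's neighbours per community as a stand-in embedding. All function names (`knn_graph`, `label_propagation`, `community_features`) and the synthetic data are hypothetical.

```python
import math
import random


def knn_graph(X, k):
    """Assumed construction: symmetric k-nearest-neighbour graph over rows of X."""
    n = len(X)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        dists = sorted((math.dist(X[i], X[j]), j) for j in range(n) if j != i)
        for _, j in dists[:k]:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def label_propagation(adj, seed=0, max_iter=100):
    """Stand-in community detection: asynchronous label propagation."""
    rng = random.Random(seed)
    labels = {v: v for v in adj}  # every node starts in its own community
    nodes = list(adj)
    for _ in range(max_iter):
        rng.shuffle(nodes)
        changed = False
        for v in nodes:
            counts = {}
            for u in adj[v]:
                counts[labels[u]] = counts.get(labels[u], 0) + 1
            if not counts:
                continue
            best = max(counts.values())
            candidates = sorted(l for l, c in counts.items() if c == best)
            # Keep the current label if it is already a majority label;
            # otherwise adopt the smallest majority label (deterministic tie-break).
            if labels[v] not in candidates:
                labels[v] = candidates[0]
                changed = True
        if not changed:
            break
    return labels


def community_features(adj, labels):
    """Stand-in embedding: fraction of each node's neighbours in each community."""
    comms = sorted(set(labels.values()))
    index = {c: i for i, c in enumerate(comms)}
    feats = []
    for v in sorted(adj):
        row = [0.0] * len(comms)
        for u in adj[v]:
            row[index[labels[u]]] += 1.0
        total = sum(row) or 1.0
        feats.append([x / total for x in row])
    return feats


# Synthetic small-sample data: 20 samples, 50 dimensions, two separated clusters.
rng = random.Random(42)
X = [[c + rng.uniform(-0.5, 0.5) for _ in range(50)]
     for c in (0.0, 10.0) for _ in range(10)]

adj = knn_graph(X, k=3)
labels = label_propagation(adj)
Z = community_features(adj, labels)  # one low-dimensional row per sample
```

Here `Z` has one column per detected community, so its width is bounded by the number of samples rather than by the original dimensionality. A real implementation would replace each stand-in with the corresponding step of the paper.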

Notes

  1. https://archive.ics.uci.edu/ml/datasets/Arcene.

  2. Supplementary material available at: https://goo.gl/LvkmjB.

  3. http://archive.ics.uci.edu/ml/index.php.

  4. For the ease of presentation, we will report only the top six performers.

  5. See details at http://www.cad.zju.edu.cn/home/dengcai/Data/TextData.html.

Author information

Corresponding author

Correspondence to Kartikeya Bhardwaj.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Bhardwaj, K., Marculescu, R. (2018). Dimensionality Reduction via Community Detection in Small Sample Datasets. In: Phung, D., Tseng, V., Webb, G., Ho, B., Ganji, M., Rashidi, L. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2018. Lecture Notes in Computer Science, vol 10939. Springer, Cham. https://doi.org/10.1007/978-3-319-93040-4_9

  • DOI: https://doi.org/10.1007/978-3-319-93040-4_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-93039-8

  • Online ISBN: 978-3-319-93040-4

  • eBook Packages: Computer Science, Computer Science (R0)
