Abstract
Real world networks constructed from raw data are often characterized by complex community structures. Existing dimensionality reduction techniques, however, do not take such characteristics into account. This is especially important for problems with low number of samples where the curse of dimensionality is particularly significant. Therefore, in this paper, we propose FeatureNet, a novel community-based dimensionality reduction framework targeting small sample problems. To this end, we propose a new method to directly construct a network from high-dimensional raw data while explicitly revealing its hidden community structure; these communities are then used to learn low-dimensional features using a representation learning framework. We show the effectiveness of our approach on eight datasets covering application areas as diverse as handwritten digits, biology, physical sciences, NLP, and computational sustainability. Extensive experiments on the above datasets (with sizes mostly between 100 and 1500 samples) demonstrate that FeatureNet significantly outperforms (i.e., up to 40% improvement in classification accuracy) ten well-known dimensionality reduction methods like PCA, Kernel PCA, Isomap, SNE, t-SNE, etc.
This is a preview of subscription content, log in via an institution.
Buying options
Tax calculation will be finalised at checkout
Purchases are for personal use only
Learn about institutional subscriptionsNotes
- 1.
- 2.
Supplementary material available at: https://goo.gl/LvkmjB.
- 3.
- 4.
For the ease of presentation, we will report only the top six performers.
- 5.
See details at http://www.cad.zju.edu.cn/home/dengcai/Data/TextData.html.
References
Agrafiotis, D.K.: Stochastic proximity embedding. J. Comput. Chem. 24(10), 1215–1221 (2003)
Andres, R.: Monthly Fossil-Fuel CO2 Emissions: Mass of Emissions Gridded by One Degree Latitude by One Degree Longitude. CDIAC, U.S.A. (2013)
Bengio, Y., et al.: Representation learning: a review and new perspectives. IEEE Trans. PAMI 35(8), 1798–1828 (2013)
Friedman, J., Hastie, T., Tibshirani, R.: The Elements of Statistical Learning. Springer Series in Statistics, vol. 1. Springer, New York (2001). https://doi.org/10.1007/978-0-387-21606-5
Fukunaga, K., Hayes, R.R.: Effects of sample size in classifier design. IEEE Trans. Pattern Anal. Mach. Intell. 11(8), 873–885 (1989)
Grover, A., Leskovec, J.: node2vec: scalable feature learning for networks. In: KDD, pp. 855–864. ACM (2016)
Hinton, G., Roweis, S.: Stochastic neighbor embedding. In: NIPS, vol. 15 (2002)
Hinton, G.E., Salakhutdinov, R.R.: Reducing the dimensionality of data with neural networks. Science 313(5786), 504–507 (2006)
Hughes, G.: On the mean accuracy of statistical pattern recognizers. IEEE Trans. Inf. Theory 14(1), 55–63 (1968)
Jain, A.K., Duin, R.P.W., Mao, J.: Statistical pattern recognition: a review. IEEE Trans. Pattern Anal. Mach. Intell. 22(1), 4–37 (2000)
Van der Maaten, L., Hinton, G.: Visualizing data using t-SNE. J. Mach. Learn. Res. 9(Nov), 2579–2605 (2008)
Maaten, V.D.: Dimensionality reduction: a comparative review. J. Mach. Learn. Res. 10, 66–71 (2009)
Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781 (2013)
Moss, R.H., et al.: The next generation of scenarios for climate change research and assessment. Nature 463(7282), 747 (2010)
Perozzi, B., Al-Rfou, R., Skiena, S.: DeepWalk: online learning of social representations. In: KDD, pp. 701–710. ACM (2014)
Raudys, S.J., et al.: Small sample size effects in statistical pattern recognition. IEEE Trans. PAMI 13(3), 252–264 (1991)
Steinhaeuser, K., et al.: Multivariate and multiscale dependence in the global climate system revealed through complex networks. Clim. Dyn. 39, 889–895 (2012)
Tang, J., et al.: LINE: large-scale information network embedding. In: Proceedings of the 24th International Conference on World Wide Web, pp. 1067–1077 (2015)
Tenenbaum, J.B., et al.: A global geometric framework for nonlinear dimensionality reduction. Science 290(5500), 2319–2323 (2000)
Wang, X., Cui, P., Wang, J., Pei, J., Zhu, W., Yang, S.: Community preserving network embedding. In: AAAI, pp. 203–209 (2017)
Weinberger, K.Q., Saul, L.K.: An introduction to nonlinear dimensionality reduction by maximum variance unfolding. In: AAAI 2006, pp. 1683–1686 (2006)
World Bank: GDP Growth Data (%) (2017). http://data.worldbank.org/
Yamada, M., et al.: High-dimensional feature selection by feature-wise kernelized Lasso. Neural Comput. 26(1), 185–207 (2014)
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Bhardwaj, K., Marculescu, R. (2018). Dimensionality Reduction via Community Detection in Small Sample Datasets. In: Phung, D., Tseng, V., Webb, G., Ho, B., Ganji, M., Rashidi, L. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2018. Lecture Notes in Computer Science(), vol 10939. Springer, Cham. https://doi.org/10.1007/978-3-319-93040-4_9
Download citation
DOI: https://doi.org/10.1007/978-3-319-93040-4_9
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-93039-8
Online ISBN: 978-3-319-93040-4
eBook Packages: Computer ScienceComputer Science (R0)