Dimensionality Reduction via Community Detection in Small Sample Datasets

Bhardwaj, Kartikeya; Marculescu, Radu

doi:10.1007/978-3-319-93040-4_9

Conference paper
First Online: 17 June 2018

3490 Accesses
2 Citations

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10939)

Abstract

Real-world networks constructed from raw data are often characterized by complex community structures. Existing dimensionality reduction techniques, however, do not take such characteristics into account. This is especially important for problems with a low number of samples, where the curse of dimensionality is particularly significant. Therefore, in this paper, we propose FeatureNet, a novel community-based dimensionality reduction framework targeting small sample problems. To this end, we propose a new method to directly construct a network from high-dimensional raw data while explicitly revealing its hidden community structure; these communities are then used to learn low-dimensional features using a representation learning framework. We show the effectiveness of our approach on eight datasets covering application areas as diverse as handwritten digits, biology, physical sciences, NLP, and computational sustainability. Extensive experiments on these datasets (with sizes mostly between 100 and 1500 samples) demonstrate that FeatureNet significantly outperforms ten well-known dimensionality reduction methods, including PCA, Kernel PCA, Isomap, SNE, and t-SNE, with up to 40% improvement in classification accuracy.
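
The abstract does not spell out how FeatureNet builds its network or learns features, so the following is only a minimal illustrative sketch of the general pipeline it names (raw data, then a network, then communities, then low-dimensional features), not the authors' method. Every concrete choice here is an assumption: samples are treated as nodes, the network is a k-nearest-neighbour graph under Euclidean distance, connected components serve as a crude stand-in for community detection, and each sample's low-dimensional feature vector is its distance to every community centroid. The function names knn_graph, connected_components and community_features are hypothetical.

```python
import math


def knn_graph(samples, k):
    """Undirected k-nearest-neighbour graph over samples (assumed construction)."""
    n = len(samples)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        by_dist = sorted(
            (math.dist(samples[i], samples[j]), j) for j in range(n) if j != i
        )
        for _, j in by_dist[:k]:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def connected_components(adj):
    """Connected components, used here as a crude stand-in for communities."""
    seen, comps = set(), []
    for start in sorted(adj):
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], []
        while stack:
            v = stack.pop()
            comp.append(v)
            for u in adj[v]:
                if u not in seen:
                    seen.add(u)
                    stack.append(u)
        comps.append(sorted(comp))
    return comps


def community_features(samples, communities):
    """Map each sample to its distances from the community centroids."""
    dim = len(samples[0])
    centroids = [
        [sum(samples[i][d] for i in comm) / len(comm) for d in range(dim)]
        for comm in communities
    ]
    return [[math.dist(s, c) for c in centroids] for s in samples]


# Toy data (made up for illustration): six 4-dimensional samples in two groups.
samples = [
    [0.0, 0.0, 0.0, 0.0], [0.1, 0.0, 0.0, 0.0], [0.0, 0.1, 0.0, 0.0],
    [5.0, 5.0, 5.0, 5.0], [5.1, 5.0, 5.0, 5.0], [5.0, 5.1, 5.0, 5.0],
]
communities = connected_components(knn_graph(samples, k=2))
features = community_features(samples, communities)  # 4-D reduced to 2-D
```

On this toy input the two groups fall into separate components, so each 4-dimensional sample is reduced to two features. A real pipeline would replace the component step with a proper community detection algorithm and the centroid distances with a learned representation.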

Notes

1. https://archive.ics.uci.edu/ml/datasets/Arcene.
2. Supplementary material available at: https://goo.gl/LvkmjB.
3. http://archive.ics.uci.edu/ml/index.php.
4. For the ease of presentation, we will report only the top six performers.
5. See details at http://www.cad.zju.edu.cn/home/dengcai/Data/TextData.html.

Author information

Authors and Affiliations

Carnegie Mellon University, Pittsburgh, PA, USA
Kartikeya Bhardwaj & Radu Marculescu

Corresponding author

Correspondence to Kartikeya Bhardwaj.

Editor information

Editors and Affiliations

Deakin University, Geelong, Victoria, Australia
Dinh Phung
National Chiao Tung University, Hsinchu City, Taiwan
Vincent S. Tseng
Monash University, Clayton, Victoria, Australia
Geoffrey I. Webb
Japan Advanced Institute of Science and Technology, Nomi, Ishikawa, Japan
Bao Ho
University of Melbourne, Melbourne, Victoria, Australia
Mohadeseh Ganji
University of Melbourne, Melbourne, Victoria, Australia
Lida Rashidi

About this paper

Cite this paper

Bhardwaj, K., Marculescu, R. (2018). Dimensionality Reduction via Community Detection in Small Sample Datasets. In: Phung, D., Tseng, V., Webb, G., Ho, B., Ganji, M., Rashidi, L. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2018. Lecture Notes in Computer Science, vol 10939. Springer, Cham. https://doi.org/10.1007/978-3-319-93040-4_9

DOI: https://doi.org/10.1007/978-3-319-93040-4_9
Published: 17 June 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-93039-8
Online ISBN: 978-3-319-93040-4
eBook Packages: Computer Science, Computer Science (R0)
