Abstract
Many real-world networks, such as the Internet, social networks, and biological networks, are massive in size, which impairs their processing and analysis. To cope with this, the network size can be reduced without losing relevant information. In this paper, we extend previous work that proposed a sampling method based on the following centrality measures: degree, k-core, clustering, eccentricity, and structural holes. In our experiments, we remove \(30\%\) and \(50\%\) of the vertices, together with their edges, from the original network. We then evaluate our proposal on six real-world networks in a relational classification task using eight different classifiers. Classification results achieved on the sampled graphs generated by our proposal are similar to those obtained on the entire graphs. The execution time of the classifier's learning step is shorter on the sampled graphs than on the entire graphs and on randomly sampled graphs. In most cases, the number of edges in the original graph was reduced by up to \(50\%\) without losing topological properties.
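The abstract names the centrality measures and removal fractions but not the exact ranking rule. The following is a minimal pure-Python sketch under stated assumptions, not the authors' implementation. It assumes an undirected graph stored as adjacency sets. One centrality score ranks the vertices, and a fixed fraction (e.g. 0.3 or 0.5) of them is deleted together with their incident edges. By default the lowest-scoring vertices are removed first, which is an assumption; `remove_highest` flips the rule. All function names are hypothetical. Structural holes is left out. A score such as Burt's constraint (available in NetworkX as `networkx.constraint`) could be adapted and passed as another `measure`. A uniform random baseline is included for comparison.

```python
import random
from collections import deque
from collections.abc import Hashable
from typing import Callable, Dict, Set

# Undirected graph stored as adjacency sets: {vertex: {neighbours}}.
Graph = Dict[Hashable, Set[Hashable]]


def from_edges(edges) -> Graph:
    """Build an undirected adjacency-set graph from (u, v) pairs."""
    g: Graph = {}
    for a, b in edges:
        g.setdefault(a, set()).add(b)
        g.setdefault(b, set()).add(a)
    return g


def edge_count(g: Graph) -> int:
    return sum(len(nbrs) for nbrs in g.values()) // 2


def degree(g: Graph) -> Dict[Hashable, int]:
    return {v: len(nbrs) for v, nbrs in g.items()}


def core_number(g: Graph) -> Dict[Hashable, int]:
    """k-core index via repeated removal of a minimum-degree vertex (O(n^2), for clarity)."""
    deg = degree(g)
    remaining, core, k = set(g), {}, 0
    while remaining:
        v = min(remaining, key=deg.__getitem__)
        k = max(k, deg[v])  # the core index never decreases while peeling
        core[v] = k
        remaining.remove(v)
        for u in g[v]:
            if u in remaining:
                deg[u] -= 1
    return core


def clustering(g: Graph) -> Dict[Hashable, float]:
    """Local clustering coefficient: fraction of neighbour pairs that are adjacent."""
    cc = {}
    for v, nbrs in g.items():
        k = len(nbrs)
        # Each edge among the neighbours is counted twice (once from each end).
        links = sum(1 for a in nbrs for b in g[a] if b in nbrs)
        cc[v] = links / (k * (k - 1)) if k >= 2 else 0.0
    return cc


def eccentricity(g: Graph) -> Dict[Hashable, int]:
    """BFS eccentricity, measured within each vertex's connected component."""
    ecc = {}
    for s in g:
        dist, queue = {s: 0}, deque([s])
        while queue:
            v = queue.popleft()
            for u in g[v]:
                if u not in dist:
                    dist[u] = dist[v] + 1
                    queue.append(u)
        ecc[s] = max(dist.values())
    return ecc


def _drop(g: Graph, removed: Set[Hashable]) -> Graph:
    """Delete the chosen vertices together with all their incident edges."""
    return {v: nbrs - removed for v, nbrs in g.items() if v not in removed}


def sample_by_centrality(g: Graph, measure: Callable[[Graph], dict],
                         fraction: float, remove_highest: bool = False) -> Graph:
    # ASSUMPTION: lowest-scoring vertices go first unless remove_highest=True;
    # the abstract does not state which end of the ranking is removed.
    scores = measure(g)
    n_remove = round(fraction * len(g))
    ranked = sorted(g, key=scores.__getitem__, reverse=remove_highest)
    return _drop(g, set(ranked[:n_remove]))


def random_sample(g: Graph, fraction: float, seed: int = 0) -> Graph:
    """Baseline: remove the same number of vertices uniformly at random."""
    rng = random.Random(seed)
    return _drop(g, set(rng.sample(list(g), round(fraction * len(g)))))


if __name__ == "__main__":
    g = from_edges([(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (4, 5)])
    kept = sample_by_centrality(g, core_number, 0.5)  # drop the 50% with lowest k-core index
    print(sorted(kept), edge_count(kept))  # prints "[0, 1, 2] 3"
```

In this toy graph, removing the half of the vertices with the lowest k-core index keeps the triangle and 3 of the 6 edges. For eccentricity, a low value marks a central vertex. There, `remove_highest=True` is probably the more natural choice, because it drops the most peripheral vertices first.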
References
Ahmed, N.K., Neville, J., Kompella, R.: Network sampling designs for relational classification. In: The 6th International AAAI Conference on Weblogs and Social Media (2012)
Ahmed, N.K., Neville, J., Kompella, R.: Network sampling: from static to streaming graphs. ACM Trans. Knowl. Discov. Data 8(2), 7:1–7:56 (2013)
Berton, L., Vega-Oliveros, D., Valverde-Rebaza, J., Silva, A.T., Lopes, A.: The impact of network sampling on relational classification. In: SIMBig 2016 - SNMAM track. CEUR-WS.org (2016)
Demšar, J.: Statistical comparisons of classifiers over multiple data sets. JMLR 7, 1–30 (2006)
Fortunato, S.: Community detection in graphs. CoRR abs/0906.0612v2 (2010)
Frank, O.: The Sage Handbook of Social Network Analysis. Sage publications, London (2011)
Gile, K.J., Handcock, M.S.: Respondent-driven sampling: an assessment of current methodology. Sociol. Methodol. 40(1), 285–327 (2010)
Lee, S., Kim, P., Jeong, H.: Statistical properties of sampled networks. Phys. Rev. E 73, 016102 (2006)
Leskovec, J., Faloutsos, C.: Sampling from large graphs. In: SIGKDD 2006, pp. 631–636 (2006)
Lopes, A.A., Bertini, J.R., Motta, R., Zhao, L.: Classification based on the optimal K-associated network. In: Zhou, J. (ed.) Complex 2009. LNICSSITE, vol. 4, pp. 1167–1177. Springer, Heidelberg (2009). doi:10.1007/978-3-642-02466-5_117
Lu, Q., Getoor, L.: Link-based classification. In: ICML 2003, pp. 496–503 (2003)
Macskassy, S.A., Provost, F.J.: A simple relational classifier. In: 2nd Workshop on Multi-Relational Data Mining (2003)
Macskassy, S.A., Provost, F.J.: Classification in networked data: a toolkit and a univariate case study. JMLR 8, 935–983 (2007)
Newman, M.E.J.: Networks: An Introduction. Oxford University Press, Oxford (2010)
Pastor-Satorras, R., Vespignani, A.: Epidemic spreading in scale-free networks. Phys. Rev. Lett. 86(14), 3200–3203 (2001)
Rezvanian, A., Meybodi, M.R.: Sampling social networks using shortest paths. Physica A Stat. Mech. Appl. 424(C), 254–268 (2015)
Rezvanian, A., Meybodi, M.R.: Sampling algorithms for weighted networks. Soc. Netw. Anal. Mining 6(1), 1–22 (2016)
Yoon, S., Lee, S., Yook, S.H., Kim, Y.: Statistical properties of sampled networks by random walks. Phys. Rev. E 75, 046114 (2007)
Smith, J.A., Moody, J., Morgan, J.H.: Network sampling coverage II: the effect of non-random missing data on network measurement. Soc. Netw. 48, 78–99 (2017)
Tong, C., Lian, Y., Niu, J., Xie, Z., Zhang, Y.: A novel green algorithm for sampling complex networks. J. Netw. Comput. Appl. 59, 55–62 (2016)
Valverde-Rebaza, J., Valejo, A., Berton, L., Faleiros, T., Lopes, A.: A naïve bayes model based on overlapping groups for link prediction in online social networks. In: ACM SAC 2015, pp. 1136–1141 (2015)
Vega-Oliveros, D., Berton, L., Lopes, A., Rodrigues, F.: Influence maximization based on the least influential spreaders. In: SocInf 2015, Co-located with IJCAI 2015, vol. 1398, pp. 3–8 (2015)
Yoon, S.-H., Kim, K.-N., Hong, J., Kim, S.-W., Park, S.: A community-based sampling method using DPL for online social networks. Inf. Sci. Int. J. 306(C), 53–69 (2015)
Acknowledgments
This work was partially supported by the São Paulo Research Foundation (FAPESP) grants: \(2013/12191-5\) and \(2015/14228-9\), National Council for Scientific and Technological Development (CNPq) grants: \(140688/2013-7\) and \(302645/2015-2\), and Coordination for the Improvement of Higher Education Personnel (CAPES), Brazil.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Berton, L., Vega-Oliveros, D.A., Valverde-Rebaza, J., da Silva, A.T., Lopes, A.d.A. (2017). Network Sampling Based on Centrality Measures for Relational Classification. In: Lossio-Ventura, J., Alatrista-Salas, H. (eds) Information Management and Big Data. SIMBig 2015, SIMBig 2016. Communications in Computer and Information Science, vol 656. Springer, Cham. https://doi.org/10.1007/978-3-319-55209-5_4
DOI: https://doi.org/10.1007/978-3-319-55209-5_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-55208-8
Online ISBN: 978-3-319-55209-5
eBook Packages: Computer Science, Computer Science (R0)