DOI: 10.1145/2576768.2598391
research-article

Genetic algorithm for sampling from scale-free data and networks

Published: 12 July 2014

Abstract

A variety of real-world data and networks can be described by a heavy-tailed probability distribution of their values, vertex degrees, or other significant properties that follows a power law. Such scale-free data and networks can be found both in natural phenomena, such as protein interaction networks and gene regulation networks, and in man-made structures like the Internet, language, and various social networks. Efficient analysis of large-scale data and networks is often impractical, so various heuristic and metaheuristic sampling techniques are deployed to select smaller subsets of the data for analysis and visualisation. A key goal of data and network sampling is to select a subset that accurately represents the original data with respect to selected attributes. In this work we propose a novel genetic algorithm for scale-free data and network sampling and evaluate the algorithm in a series of computational experiments.
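
The abstract does not specify the encoding, genetic operators, or fitness function, so the following is only a minimal illustrative sketch of the general idea and not the authors' algorithm. It assumes a fixed-size subset encoded as a list of distinct indices, a Kolmogorov-Smirnov distance between the empirical distributions of the sample and the full data as the quantity to minimise, union-based crossover, and single-index mutation. All function names, parameters, and default values are hypothetical.

```python
import bisect
import random


def power_law_sample(n, alpha=2.5, x_min=1.0, rng=random):
    """Draw n values from a continuous power law by inverse-transform sampling."""
    # 1 - rng.random() lies in (0, 1], so the negative power is always defined.
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]


def ks_distance(sample, population_sorted):
    """Largest gap between the empirical CDFs of the sample and the full data."""
    s = sorted(sample)
    n, m = len(s), len(population_sorted)
    return max(
        abs(bisect.bisect_right(population_sorted, x) / m - bisect.bisect_right(s, x) / n)
        for x in population_sorted
    )


def genetic_sample(data, k, pop_size=20, generations=30, mutation_rate=0.3, seed=0):
    """Evolve k distinct indices into data (requires k < len(data)).

    Assumed fitness: KS distance to the full data, lower is better.
    Returns the best index list and the best fitness seen per generation.
    """
    rng = random.Random(seed)
    ordered = sorted(data)
    n = len(data)

    def fitness(individual):
        return ks_distance([data[i] for i in individual], ordered)

    population = [rng.sample(range(n), k) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        ranked = sorted(population, key=fitness)
        history.append(fitness(ranked[0]))
        next_gen = [ranked[0]]  # elitism: the best individual always survives
        while len(next_gen) < pop_size:
            # Binary tournament on rank: the lower index is the fitter parent.
            p1 = ranked[min(rng.randrange(pop_size), rng.randrange(pop_size))]
            p2 = ranked[min(rng.randrange(pop_size), rng.randrange(pop_size))]
            # Crossover: the child draws k distinct indices from the parents' union.
            child = rng.sample(sorted(set(p1) | set(p2)), k)
            if rng.random() < mutation_rate:
                # Mutation: swap one chosen index for a currently unused one.
                chosen = set(child)
                unused = [i for i in range(n) if i not in chosen]
                child[rng.randrange(k)] = rng.choice(unused)
            next_gen.append(child)
        population = next_gen
    best = min(population, key=fitness)
    history.append(fitness(best))
    return best, history


if __name__ == "__main__":
    data = power_law_sample(500, rng=random.Random(42))
    best, history = genetic_sample(data, k=50)
    print(f"KS distance: {history[0]:.3f} -> {history[-1]:.3f}")
```

Because the best individual is copied unchanged into each new generation, the recorded best distance can never get worse from one generation to the next. For network sampling, the same skeleton could in principle score a vertex subset by the degree distribution of its induced subgraph, but that variant is likewise an assumption and is not described in the abstract.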

Published In

GECCO '14: Proceedings of the 2014 Annual Conference on Genetic and Evolutionary Computation
July 2014
1478 pages
ISBN:9781450326629
DOI:10.1145/2576768

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. data
  2. genetic algorithms
  3. networks
  4. power law
  5. sampling

Qualifiers

  • Research-article

Funding Sources

  • SGS

Conference

GECCO '14: Genetic and Evolutionary Computation Conference
July 12 - 16, 2014
Vancouver, BC, Canada

Acceptance Rates

GECCO '14 Paper Acceptance Rate 180 of 544 submissions, 33%;
Overall Acceptance Rate 1,669 of 4,410 submissions, 38%

Cited By

  • (2024) Particle Swarm Optimization and Differential Evolution for Derangement Problems. 2024 International Conference on Intelligent Computing and Next Generation Networks (ICNGN), pages 01-05. DOI: 10.1109/ICNGN63705.2024.10871873. Online publication date: 23-Nov-2024
  • (2020) Sampling from social networks’s graph based on topological properties and bee colony algorithm. Signal and Data Processing, 17:3, pages 55-70. DOI: 10.29252/jsdp.17.3.55. Online publication date: 1-Nov-2020
  • (2018) Guided Genetic Algorithm for Information Diffusion Problems. 2018 IEEE Congress on Evolutionary Computation (CEC), pages 1-8. DOI: 10.1109/CEC.2018.8477835. Online publication date: Jul-2018
  • (2017) Selected aspects and tradeoffs in transistor level implementation of genetic algorithms. 2017 IEEE 30th International Conference on Microelectronics (MIEL), pages 235-238. DOI: 10.1109/MIEL.2017.8190110. Online publication date: Oct-2017
  • (2017) Guided Genetic Algorithm for the Influence Maximization Problem. Computing and Combinatorics, pages 630-641. DOI: 10.1007/978-3-319-62389-4_52. Online publication date: 1-Jul-2017
  • (2016) Evolutionary Feature Subset Selection with Compression-based Entropy Estimation. Proceedings of the Genetic and Evolutionary Computation Conference 2016, pages 933-940. DOI: 10.1145/2908812.2908853. Online publication date: 20-Jul-2016
  • (2016) Genetic algorithm for entropy-based feature subset selection. 2016 IEEE Congress on Evolutionary Computation (CEC), pages 4486-4493. DOI: 10.1109/CEC.2016.7744360. Online publication date: Jul-2016
  • (2016) Optimal column subset selection for image classification by genetic algorithms. Annals of Operations Research, 265:2, pages 205-222. DOI: 10.1007/s10479-016-2331-0. Online publication date: 7-Oct-2016
  • (2016) A Comparison of Differential Evolution and Genetic Algorithms for the Column Subset Selection Problem. Proceedings of the 9th International Conference on Computer Recognition Systems CORES 2015, pages 223-232. DOI: 10.1007/978-3-319-26227-7_21. Online publication date: 5-Mar-2016
