Sampling dark networks to locate people of interest

  • Original Article

Social Network Analysis and Mining

Abstract

Dark networks, which describe networks with covert entities and connections such as those representing illegal activities, are of great interest to intelligence analysts. However, before studying such a network, one must first collect appropriate network data. Collecting accurate network data in such a setting is a challenging task, as data collectors will make inferences, which may be incorrect, based on available intelligence data, which may itself be misleading. In this paper, we consider the problem of how to effectively sample dark networks, in which sampling queries may return incorrect information, with the specific goal of locating people of interest. We present RedLearn and RedLearnRS, two algorithms for crawling dark networks with the goal of maximizing the identification of nodes of interest, given a limited sampling budget. RedLearn assumes that a query on a node can accurately return whether a node represents a person of interest, while RedLearnRS dispenses with that assumption. We consider realistic error scenarios, which describe how individuals in a dark network may attempt to conceal their connections. We evaluate and present results on several real-world networks, including dark networks, as well as various synthetic dark network structures proposed in the criminology literature. Our analysis shows that RedLearn and RedLearnRS meet or outperform other sampling strategies.
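The problem setting in the abstract (a crawler with a limited query budget, responses that may omit connections, and a goal of finding as many nodes of interest as possible) can be illustrated with a small sketch. This is not RedLearn or RedLearnRS, whose details are not given on this page; it is a simple greedy baseline written for illustration only. The function name `greedy_poi_crawl`, the parameter `hide_prob`, and the toy graph are all hypothetical, and the error model (each edge is independently hidden from a query response) is an assumption, not the paper's error scenario. Like RedLearn, the sketch assumes a query reports a node's label accurately.

```python
import random
from collections import defaultdict


def greedy_poi_crawl(adj, is_poi, seed, budget, hide_prob=0.0, rng=None):
    """Illustrative greedy crawler (hypothetical baseline, not RedLearn).

    adj:       dict mapping node -> set of neighbors (hidden ground truth)
    is_poi:    dict mapping node -> bool, True for a person of interest
    seed:      node where the crawl starts
    budget:    maximum number of nodes that may be queried
    hide_prob: assumed probability that an edge is concealed in a response
    """
    rng = rng or random.Random(0)
    queried = []               # nodes queried so far, in order
    seen = set()               # same nodes, for fast membership tests
    found = set()              # queried nodes that are people of interest
    votes = defaultdict(int)   # candidate -> number of known POI neighbors
    frontier = {seed}          # discovered but not yet queried nodes

    while frontier and len(queried) < budget:
        # Prefer the candidate with the most known POI neighbors;
        # ties go to the smallest node id so the run is deterministic.
        node = max(sorted(frontier), key=lambda v: votes[v])
        frontier.remove(node)
        seen.add(node)
        queried.append(node)
        if is_poi[node]:
            found.add(node)
        for nbr in sorted(adj[node]):
            if rng.random() < hide_prob:
                continue       # this connection was concealed
            if nbr not in seen:
                frontier.add(nbr)
                if is_poi[node]:
                    votes[nbr] += 1
    return queried, found


# Toy graph: people of interest (1, 3, 5) form a chain off the seed node 0.
adj = {0: {1, 2}, 1: {0, 3}, 2: {0, 4}, 3: {1, 5}, 4: {2}, 5: {3}}
is_poi = {0: False, 1: True, 2: False, 3: True, 4: False, 5: True}

queried, found = greedy_poi_crawl(adj, is_poi, seed=0, budget=4)
print(queried, found)
```

With no concealment, the crawler follows the chain of nodes of interest once it finds the first one. Setting `hide_prob=1.0` hides every connection, so the crawl stalls at the seed, which shows why concealed edges make this sampling problem hard.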

Notes

  1. The data were collected by Roberts and Everton (2011) and compiled into a network by Gera et al. (2017).

  2. This terminology was motivated by an application in which hardware devices known as monitors are placed on computers to observe incoming and outgoing traffic.

  3. We consider realistic types of errors to represent analysis errors; these are described in Sect. 6.5.

  4. The optimal stopping problem studies when to take an action in order to maximize some reward. It applies to many disciplines, including stock trading (Tsitsiklis and Van Roy 1999), oil drilling (Benkherouf and Bather 1988), and determining when to stop a random walk (Novikov and Shiryaev 2005). Solutions to the optimal stopping problem generally make assumptions about the prior probability distributions of success and failure. Since we have a limited budget of monitors, calculating these priors is not feasible.

  5. Obtained from https://sites.google.com/site/sfeverton18/research/appendix-1.

  6. Obtained from http://snap.stanford.edu/data/.

  7. Obtained from https://archive.org/download/oxford-2005-facebook-matrix.

  8. Available at https://networkx.github.io/documentation/networkx-1.10/reference/generated/networkx.generators.random_graphs.random_powerlaw_tree.html#networkx.generators.random_graphs.random_powerlaw_tree.

  9. Available at https://networkx.github.io/documentation/networkx-1.10/reference/generated/networkx.generators.random_graphs.powerlaw_cluster_graph.html#networkx.generators.random_graphs.powerlaw_cluster_graph.

  10. Available at https://networkx.github.io/documentation/networkx-1.10/reference/generated/networkx.generators.random_graphs.erdos_renyi_graph.html#networkx.generators.random_graphs.erdos_renyi_graph.

  11. Results were similar for different types of centrality, including eigenvector and betweenness centrality.

References

  • Adamic LA, Lukose RM, Puniyani AR, Huberman BA (2001) Search in power-law networks. Phys Rev E 64(4):046135

  • Aldous D, Fill J (2002) Reversible Markov chains and random walks on graphs. Berkeley

  • Asztalos A, Toroczkai Z (2010) Network discovery by generalized random walks. EPL (Europhys Lett) 92(5):50008

  • Avrachenkov K, Basu P, Neglia G, Ribeiro B, Towsley D (2014) Pay few, influence most: Online myopic network covering. In: IEEE NetSciCom workshop

  • Baker WE, Faulkner RR (1993) The social organization of conspiracy: Illegal networks in the heavy electrical equipment industry. Am Sociol Rev 58(6):837–860

  • Benkherouf L, Bather J (1988) Oil exploration: sequential decisions in the face of uncertainty. J Appl Probab 25(3):529–543

  • Bhagat S, Cormode G, Muthukrishnan S (2011) Node classification in social networks. In: Social network data analytics, Springer, pp 115–148

  • Biernacki P, Waldorf D (1981) Snowball sampling: problems and techniques of chain referral sampling. Soc Methods Res 10(2):141–163

  • Bliss CA, Danforth CM, Dodds PS (2014) Estimation of global network statistics from incomplete data. PLoS ONE 9(10):e108471

  • Bnaya Z, Puzis R, Stern R, Felner A (2013) Social network search as a volatile multi-armed bandit problem. HUMAN 2(2):84

  • Burfoot C, Bird S, Baldwin T (2011) Collective classification of congressional floor-debate transcripts. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies-vol 1, Association for Computational Linguistics, pp 1506–1515

  • Carvalho VR, Cohen WW (2005) On the collective classification of email speech acts. In: Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, ACM, pp 345–352

  • Chen H, Chung W, Xu JJ, Wang G, Qin Y, Chau M (2004) Crime data mining: a general framework and some examples. Computer 37(4):50–56

  • Davis B, Gera R, Lazzaro G, Lim BY, Rye EC (2016) The marginal benefit of monitor placement on networks. In: Complex networks VII, Springer, pp 93–104

  • Erdős P, Rényi A (1960) On the evolution of random graphs. Publ Math Inst Hung Acad Sci 5(1):17–60

  • Friedman N, Koller D (2003) Being Bayesian about network structure. A Bayesian approach to structure discovery in Bayesian networks. Mach Learn 50(1–2):95–125

  • Fronczak A, Fronczak P (2009) Biased random walks in complex networks: the role of local navigation rules. Phys Rev E 80(1):016107

  • Gallagher B, Tong H, Eliassi-Rad T, Faloutsos C (2008) Using ghost edges for classification in sparsely labeled networks. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, pp 256–264

  • Gera R, Miller R, MirandaLopez M, Warnke S, Saxena A (2017) Three is the answer: combining relationships to analyze multilayered terrorist networks. In: Advances in social networks analysis and mining (ASONAM), 2017 IEEE/ACM, IEEE

  • Hanneke S, Xing EP (2009) Network completing and survey sampling. In: AISTATS, pp 209–215

  • Holme P, Kim BJ (2002) Growing scale-free networks with tunable clustering. Phys Rev E 65(2):026107

  • Hughes BD (1995) Random walks and random environments. Oxford, vol 2, 1995–1996

  • Koschade S (2006) A social network analysis of Jemaah Islamiyah: the applications to counterterrorism and intelligence. Stud Confl Terror 29(6):559–575

  • Le V (2012) Organised crime typologies: structure, activities and conditions. Int J Criminol Sociol 1:121–131

  • Leskovec J, Faloutsos C (2006) Sampling from large graphs. In: SIGKDD, ACM, pp 631–636

  • Lin F, Cohen WW (2010) Semi-supervised classification of network data using very few labels. In: 2010 International Conference on Advances in Social Networks Analysis and Mining (ASONAM), IEEE, pp 192–199

  • Lu Q, Getoor L (2003) Link-based classification. In: Proceedings of the 20th International Conference on Machine Learning (ICML-03), pp 496–503

  • Lu Y, Luo X, Polgar M, Cao Y (2010) Social network analysis of a criminal hacker community. J Comput Inf Syst 51(2):31–41

  • Macskassy SA, Provost F (2005) Suspicion scoring based on guilt-by-association, collective inference, and focused data access. In: International Conference on Intelligence Analysis

  • Maiya AS, Berger-Wolf TY (2010) Online sampling of high centrality individuals in social networks. In: PAKDD, pp 91–98

  • Michalak TP, Rahwan T, Wooldridge M (2017) Strategic social network analysis. In: AAAI, pp 4841–4845

  • Neville J, Jensen D (2000) Iterative classification in relational data. In: Proceedings of AAAI-2000 workshop on learning statistical models from relational data, pp 13–20

  • Noh JD, Rieger H (2004) Random walks on complex networks. Phys Rev Lett 92(11):118701

  • Novikov AA, Shiryaev AN (2005) On an effective solution of the optimal stopping problem for random walks. Theory Probab Appl 49(2):344–354

  • Raab J, Milward HB (2003) Dark networks as problems. J Public Adm Res Theory 13(4):413–439

  • Roberts N, Everton S (2011) Terrorist data: Noordin Top terrorist network. https://sites.google.com/site/sfeverton18/research/appendix-1

  • Schwartz DM, Rouselle TD (2009) Using social network analysis to target criminal networks. Trends Organ Crime 12(2):188–207

  • Sparrow MK (1991) The application of network analysis to criminal intelligence: an assessment of the prospects. Soc Netw 13(3):251–274

  • Stern RT, Samama L, Puzis R, Beja T, Bnaya Z, Felner A (2013) Tonic: Target oriented network intelligence collection for the social web. In: AAAI

  • Tsitsiklis JN, Van Roy B (1999) Optimal stopping of Markov processes: Hilbert space theory, approximation algorithms, and an application to pricing high-dimensional financial derivatives. IEEE Trans Autom Control 44(10):1840–1851

  • Wijegunawardana P, Ojha V, Gera R, Soundarajan S (2017) Seeing red: locating people of interest in networks. In: Workshop on Complex Networks CompleNet, Springer, pp 141–150

  • Xiang R, Neville J, Rogati M (2010) Modeling relationship strength in online social networks. In: Proceedings of the 19th International Conference on World Wide Web, ACM, pp 981–990

  • Yan G (2013) Peri-watchdog: hunting for hidden botnets in the periphery of online social networks. Comput Netw 57(2):540–555

  • Zheleva E, Getoor L (2009) To join or not to join: the illusion of privacy in social networks with mixed public and private user profiles. In: Proceedings of the 18th International Conference on World Wide Web, ACM, pp 531–540

  • Zhu X, Ghahramani Z, Lafferty J et al (2003) Semi-supervised learning using Gaussian fields and harmonic functions. ICML 3:912–919

Acknowledgements

R. Gera thanks the DoD for partially sponsoring this work. This research was supported in part through computational resources provided by Syracuse University.

Author information

Corresponding author

Correspondence to Pivithuru Wijegunawardana.

About this article

Cite this article

Wijegunawardana, P., Ojha, V., Gera, R. et al. Sampling dark networks to locate people of interest. Soc. Netw. Anal. Min. 8, 15 (2018). https://doi.org/10.1007/s13278-018-0487-0

  • DOI: https://doi.org/10.1007/s13278-018-0487-0
