Sampling dark networks to locate people of interest

Wijegunawardana, Pivithuru; Ojha, Vatsal; Gera, Ralucca; Soundarajan, Sucheta

doi:10.1007/s13278-018-0487-0

Original Article
Published: 03 March 2018

Volume 8, article number 15, (2018)

Social Network Analysis and Mining

Pivithuru Wijegunawardana ORCID: orcid.org/0000-0002-2897-9447¹,
Vatsal Ojha²,
Ralucca Gera³ &
…
Sucheta Soundarajan¹

Abstract

Dark networks, which describe networks with covert entities and connections such as those representing illegal activities, are of great interest to intelligence analysts. However, before studying such a network, one must first collect appropriate network data. Collecting accurate network data in such a setting is a challenging task, as data collectors will make inferences, which may be incorrect, based on available intelligence data, which may itself be misleading. In this paper, we consider the problem of how to effectively sample dark networks, in which sampling queries may return incorrect information, with the specific goal of locating people of interest. We present RedLearn and RedLearnRS, two algorithms for crawling dark networks with the goal of maximizing the identification of nodes of interest, given a limited sampling budget. RedLearn assumes that a query on a node can accurately return whether a node represents a person of interest, while RedLearnRS dispenses with that assumption. We consider realistic error scenarios, which describe how individuals in a dark network may attempt to conceal their connections. We evaluate and present results on several real-world networks, including dark networks, as well as various synthetic dark network structures proposed in the criminology literature. Our analysis shows that RedLearn and RedLearnRS meet or outperform other sampling strategies.
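
The budgeted crawling setting described in the abstract can be illustrated with a minimal sketch. This is not RedLearn or RedLearnRS, whose details are not given on this page; it is a simple greedy baseline that queries the discovered node with the most neighbours already reported as people of interest ("red"). The function name `greedy_crawl`, the uniform label-flip noise model controlled by `error_rate`, and the toy graph are all assumptions made for illustration.

```python
import random


def greedy_crawl(adj, is_red, seed_node, budget, error_rate=0.0, rng=None):
    """Query up to `budget` nodes, preferring those with many neighbours reported red.

    adj        : dict mapping node -> list of neighbours (the hidden network)
    is_red     : dict mapping node -> True if the node is a person of interest
    error_rate : probability that a query reports the wrong colour
                 (an assumed noise model, not the paper's error scenarios)
    Returns (query_order, reported_red).
    """
    rng = rng or random.Random(0)
    query_order = []        # nodes in the order they were queried
    queried = set()
    reported_red = set()    # queried nodes whose query answered "red"
    frontier = {seed_node}  # discovered but not yet queried

    while frontier and len(query_order) < budget:
        # Score = number of already-queried neighbours that were reported red.
        # Only edges revealed by earlier queries contribute, because
        # reported_red contains queried nodes only.
        def score(v):
            return sum(1 for u in adj[v] if u in reported_red)

        # sorted() makes tie-breaking deterministic (smallest node id wins).
        node = max(sorted(frontier), key=score)
        frontier.remove(node)
        queried.add(node)
        query_order.append(node)

        # A query may lie: flip the true colour with probability error_rate.
        answer = is_red[node]
        if rng.random() < error_rate:
            answer = not answer
        if answer:
            reported_red.add(node)

        # Querying a node reveals its neighbours.
        frontier.update(u for u in adj[node] if u not in queried)

    return query_order, reported_red


# Hypothetical toy network: a red chain 1-3-5 hanging off a non-red seed.
toy_adj = {0: [1, 2], 1: [0, 3], 2: [0, 4], 3: [1, 5], 4: [2], 5: [3]}
toy_red = {0: False, 1: True, 2: False, 3: True, 4: False, 5: True}
order, found = greedy_crawl(toy_adj, toy_red, seed_node=0, budget=4)
```

Setting `error_rate` above zero makes some reported colours wrong, so the neighbour counts that guide the crawl are misleading. That is the kind of difficulty the abstract says RedLearnRS is designed to handle.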

Notes

The data were collected by Roberts and Everton (2011) and compiled into a network by Gera et al. (2017).
This terminology was motivated by an application in which hardware devices known as monitors are placed on computers to observe incoming and outgoing traffic.
We consider realistic types of errors to represent analysis errors; these are described in Sect. 6.5.
The optimal stopping problem studies when to take an action in order to maximize some reward. It applies to many disciplines, including stock trading (Tsitsiklis and Van Roy 1999), oil drilling (Benkherouf and Bather 1988), and determining when to stop a random walk (Novikov and Shiryaev 2005). Solutions to the optimal stopping problem generally make assumptions about the prior probability distributions of success and failure. Since we have a limited budget of monitors, calculating these priors is not feasible.
Obtained from https://sites.google.com/site/sfeverton18/research/appendix-1.
Obtained from http://snap.stanford.edu/data/.
Obtained from https://archive.org/download/oxford-2005-facebook-matrix.
Available at https://networkx.github.io/documentation/networkx-1.10/reference/generated/networkx.generators.random_graphs.random_powerlaw_tree.html#networkx.generators.random_graphs.random_powerlaw_tree.
Available at https://networkx.github.io/documentation/networkx-1.10/reference/generated/networkx.generators.random_graphs.powerlaw_cluster_graph.html#networkx.generators.random_graphs.powerlaw_cluster_graph.
Available at https://networkx.github.io/documentation/networkx-1.10/reference/generated/networkx.generators.random_graphs.erdos_renyi_graph.html#networkx.generators.random_graphs.erdos_renyi_graph.
Results were similar for different types of centrality, including eigenvector and betweenness centrality.
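
The three NetworkX generators linked in the notes above can be called roughly as follows. This is a sketch only: the graph size, `m`, `p`, `gamma`, and seed values are illustrative assumptions rather than the settings used in the paper, and the helper names `make_synthetic_graphs` and `top_by_betweenness` are hypothetical.

```python
import networkx as nx


def make_synthetic_graphs(n=50, seed=1):
    """Build one graph per generator named in the notes (parameters are illustrative)."""
    graphs = {}

    # Holme-Kim model: power-law degree distribution with tunable clustering.
    # m = edges added per new node, p = probability of closing a triangle.
    graphs["powerlaw_cluster"] = nx.powerlaw_cluster_graph(n, m=2, p=0.3, seed=seed)

    # Erdos-Renyi model: each possible edge is present independently with probability p.
    graphs["erdos_renyi"] = nx.erdos_renyi_graph(n, p=0.1, seed=seed)

    # Tree with a power-law degree sequence. The generator raises NetworkXError
    # when it cannot find a valid sequence within `tries`, so it is skipped then.
    try:
        graphs["powerlaw_tree"] = nx.random_powerlaw_tree(
            n, gamma=3, seed=seed, tries=10000
        )
    except nx.NetworkXError:
        pass

    return graphs


def top_by_betweenness(graph, k=5):
    """Return the k nodes with the highest betweenness centrality."""
    scores = nx.betweenness_centrality(graph)
    return sorted(scores, key=scores.get, reverse=True)[:k]


graphs = make_synthetic_graphs(n=30, seed=1)
central_nodes = top_by_betweenness(graphs["powerlaw_cluster"], k=3)
```

Betweenness is used here because it is one of the centrality measures the last note mentions. A different measure can be substituted by replacing the single call inside `top_by_betweenness`.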

References

Adamic LA, Lukose RM, Puniyani AR, Huberman BA (2001) Search in power-law networks. Phys Rev E 64(4):046135
Aldous D, Fill J (2002) Reversible Markov chains and random walks on graphs. Berkeley
Asztalos A, Toroczkai Z (2010) Network discovery by generalized random walks. EPL (Europhys Lett) 92(5):50008
Avrachenkov K, Basu P, Neglia G, Ribeiro B, Towsley D (2014) Pay few, influence most: Online myopic network covering. In: IEEE NetSciCom workshop
Baker WE, Faulkner RR (1993) The social organization of conspiracy: Illegal networks in the heavy electrical equipment industry. Am Sociol Rev 58(6):837–860
Benkherouf L, Bather J (1988) Oil exploration: sequential decisions in the face of uncertainty. J Appl Probab 25(3):529–543
Bhagat S, Cormode G, Muthukrishnan S (2011) Node classification in social networks. In: Social network data analytics, Springer, pp 115–148
Biernacki P, Waldorf D (1981) Snowball sampling: problems and techniques of chain referral sampling. Sociol Methods Res 10(2):141–163
Bliss CA, Danforth CM, Dodds PS (2014) Estimation of global network statistics from incomplete data. PLoS ONE 9(10):e108471
Bnaya Z, Puzis R, Stern R, Felner A (2013) Social network search as a volatile multi-armed bandit problem. HUMAN 2(2):84
Burfoot C, Bird S, Baldwin T (2011) Collective classification of congressional floor-debate transcripts. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies-vol 1, Association for Computational Linguistics, pp 1506–1515
Carvalho VR, Cohen WW (2005) On the collective classification of email speech acts. In: Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, ACM, pp 345–352
Chen H, Chung W, Xu JJ, Wang G, Qin Y, Chau M (2004) Crime data mining: a general framework and some examples. Computer 37(4):50–56
Davis B, Gera R, Lazzaro G, Lim BY, Rye EC (2016) The marginal benefit of monitor placement on networks. In: Complex networks VII, Springer, pp 93–104
Erdős P, Rényi A (1960) On the evolution of random graphs. Publ Math Inst Hung Acad Sci 5(1):17–60
Friedman N, Koller D (2003) Being Bayesian about network structure: a Bayesian approach to structure discovery in Bayesian networks. Mach Learn 50(1–2):95–125
Fronczak A, Fronczak P (2009) Biased random walks in complex networks: the role of local navigation rules. Phys Rev E 80(1):016107
Gallagher B, Tong H, Eliassi-Rad T, Faloutsos C (2008) Using ghost edges for classification in sparsely labeled networks. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, pp 256–264
Gera R, Miller R, MirandaLopez M, Warnke S, Saxena A (2017) Three is the answer: combining relationships to analyze multilayered terrorist networks. In: Advances in social networks analysis and mining (ASONAM), 2017 IEEE/ACM, IEEE
Hanneke S, Xing EP (2009) Network completion and survey sampling. In: AISTATS, pp 209–215
Holme P, Kim BJ (2002) Growing scale-free networks with tunable clustering. Phys Rev E 65(2):026107
Hughes BD (1995) Random walks and random environments. Oxford, vol 2, 1995–1996
Koschade S (2006) A social network analysis of Jemaah Islamiyah: the applications to counterterrorism and intelligence. Stud Confl Terror 29(6):559–575
Le V (2012) Organised crime typologies: structure, activities and conditions. Int J Criminol Sociol 1:121–131
Leskovec J, Faloutsos C (2006) Sampling from large graphs. In: SIGKDD, ACM, pp 631–636
Lin F, Cohen WW (2010) Semi-supervised classification of network data using very few labels. In: 2010 International Conference on Advances in Social Networks Analysis and Mining (ASONAM), IEEE, pp 192–199
Lu Q, Getoor L (2003) Link-based classification. In: Proceedings of the 20th International Conference on Machine Learning (ICML-03), pp 496–503
Lu Y, Luo X, Polgar M, Cao Y (2010) Social network analysis of a criminal hacker community. J Comput Inf Syst 51(2):31–41
Macskassy SA, Provost F (2005) Suspicion scoring based on guilt-by-association, collective inference, and focused data access. In: International Conference on Intelligence Analysis
Maiya AS, Berger-Wolf TY (2010) Online sampling of high centrality individuals in social networks. In: PAKDD, pp 91–98
Michalak TP, Rahwan T, Wooldridge M (2017) Strategic social network analysis. In: AAAI, pp 4841–4845
Neville J, Jensen D (2000) Iterative classification in relational data. In: Proceedings of AAAI-2000 workshop on learning statistical models from relational data, pp 13–20
Noh JD, Rieger H (2004) Random walks on complex networks. Phys Rev Lett 92(11):118701
Novikov AA, Shiryaev AN (2005) On an effective solution of the optimal stopping problem for random walks. Theory Probab Appl 49(2):344–354
Raab J, Milward HB (2003) Dark networks as problems. J Public Adm Res Theory 13(4):413–439
Roberts N, Everton S (2011) Terrorist data: Noordin top terrorist network. https://sites.google.com/site/sfeverton18/research/appendix-1
Schwartz DM, Rouselle TD (2009) Using social network analysis to target criminal networks. Trends Organ Crime 12(2):188–207
Sparrow MK (1991) The application of network analysis to criminal intelligence: an assessment of the prospects. Soc Netw 13(3):251–274
Stern RT, Samama L, Puzis R, Beja T, Bnaya Z, Felner A (2013) Tonic: Target oriented network intelligence collection for the social web. In: AAAI
Tsitsiklis JN, Van Roy B (1999) Optimal stopping of Markov processes: Hilbert space theory, approximation algorithms, and an application to pricing high-dimensional financial derivatives. IEEE Trans Autom Control 44(10):1840–1851
Wijegunawardana P, Ojha V, Gera R, Soundarajan S (2017) Seeing red: locating people of interest in networks. In: Workshop on Complex Networks CompleNet, Springer, pp 141–150
Xiang R, Neville J, Rogati M (2010) Modeling relationship strength in online social networks. In: Proceedings of the 19th International Conference on World Wide Web, ACM, pp 981–990
Yan G (2013) Peri-watchdog: hunting for hidden botnets in the periphery of online social networks. Comput Netw 57(2):540–555
Zheleva E, Getoor L (2009) To join or not to join: the illusion of privacy in social networks with mixed public and private user profiles. In: Proceedings of the 18th International Conference on World Wide Web, ACM, pp 531–540
Zhu X, Ghahramani Z, Lafferty J et al (2003) Semi-supervised learning using Gaussian fields and harmonic functions. ICML 3:912–919

Acknowledgements

R. Gera thanks the DoD for partially sponsoring this work. This research was supported in part through computational resources provided by Syracuse University.

Author information

Authors and Affiliations

Department of Electrical Engineering and Computer Science, Syracuse University, Syracuse, NY, USA
Pivithuru Wijegunawardana & Sucheta Soundarajan
Science and Humanities Scholars Program, Carnegie Mellon University, Pittsburgh, PA, USA
Vatsal Ojha
Department of Applied Mathematics, Naval Postgraduate School, Monterey, CA, USA
Ralucca Gera

Corresponding author

Correspondence to Pivithuru Wijegunawardana.

About this article

Cite this article

Wijegunawardana, P., Ojha, V., Gera, R. et al. Sampling dark networks to locate people of interest. Soc. Netw. Anal. Min. 8, 15 (2018). https://doi.org/10.1007/s13278-018-0487-0

Received: 02 October 2017
Revised: 16 December 2017
Accepted: 24 January 2018
Published: 03 March 2018
DOI: https://doi.org/10.1007/s13278-018-0487-0
