Target oriented network intelligence collection: effective exploration of social networks

World Wide Web

Abstract

Target Oriented Network Intelligence Collection (TONIC) is a crawling process whose goal is to find social network profiles that contain information about a given target. Such profiles are called leads, and the TONIC problem is how to minimize the crawling cost incurred while finding them. We model this problem as a search problem in an unknown graph and present a best-first search approach for solving it. Three key challenges are (1) which profiles to consider crawling, (2) how to prioritize the crawling order, and (3) when to stop because additional crawling is not worthwhile. For the first challenge, we propose two frameworks: the Restricted TONIC Framework (RTF), which restricts the search to immediate neighbors of previously found leads, and the Extended TONIC Framework (ETF), which extends the scope of the search to a wider neighborhood. We provide guidelines for choosing between the two frameworks. For the second challenge, we propose a set of effective topology-based heuristics that guide the search towards profiles that are more likely to be leads. For the third challenge, we propose using data collected in previously executed crawls to learn when additional crawling is expected to be useful.
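To make the search formulation concrete, the following is a minimal Python sketch of a best-first crawl in the spirit of RTF. It is an illustration under stated assumptions, not the authors' implementation: the names `crawl_rtf`, `acquire`, and `GRAPH` are hypothetical, a profile is treated as a lead when the target appears in its friend list, the priority (number of known leads that link to a candidate) is only a stand-in for the topology-based heuristics, and a fixed `budget` of acquisitions replaces the learned stopping rule.

```python
from typing import Callable, Dict, Hashable, Iterable, List, Set

Profile = Hashable


def crawl_rtf(
    target: Profile,
    initial_leads: Iterable[Profile],
    acquire: Callable[[Profile], Iterable[Profile]],
    budget: int,
) -> List[Profile]:
    """Best-first crawl that only considers neighbors of known leads.

    `acquire(p)` is assumed to download profile p and return its friend list.
    Each call costs one unit of `budget` (a stand-in for crawling cost).
    """
    leads: List[Profile] = []
    acquired: Set[Profile] = set()
    # OPEN list: candidate -> priority (count of known leads linking to it).
    open_list: Dict[Profile, float] = {p: float("inf") for p in initial_leads}

    while open_list and budget > 0:
        # Pick the most promising candidate; ties are broken by name so the
        # sketch is deterministic.
        best = max(open_list, key=lambda p: (open_list[p], str(p)))
        del open_list[best]

        friends = set(acquire(best))
        acquired.add(best)
        budget -= 1

        if target in friends:  # assumed lead test
            leads.append(best)
            # RTF-style restriction: only neighbors of leads become candidates.
            for friend in friends:
                if friend != target and friend not in acquired:
                    open_list[friend] = open_list.get(friend, 0) + 1
    return leads


# Hypothetical toy network: T is the target; A, B and E list T as a friend.
GRAPH = {
    "A": ["T", "B", "C"],
    "B": ["T", "A", "C", "E"],
    "C": ["A", "B", "D"],
    "D": ["C"],
    "E": ["T", "B"],
}

found = crawl_rtf("T", ["A"], lambda p: GRAPH[p], budget=10)
```

In this toy network the crawl never acquires D, because D is adjacent only to a non-lead. That is the trade-off the restricted framework makes: fewer wasted acquisitions, at the risk of missing leads that are reachable only through non-leads.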

Notes

  1. Note that the acquire action does not include sophisticated information extraction methods: it simply downloads all data and extracts the LOF. As mentioned above, further analysis of this data may be done by a human analyst.

  2. In some OSNs, profiles can block their LOF, so that it is not possible to perform the IsLead() query on them. For our purposes, they will be regarded as non-leads, since we cannot verify that they are leads.

  3. Sophisticated TONIC applications may assign rewards that decay with time or are dependent on the amount of information about the target that can be extracted from the lead. We focus on a simpler reward model in which the reward of finding a lead is constant.

  4. Initially, it is possible that L(m) + NL(m) = 0, making pf(m) undefined. To avoid this, we set pf(m) = 0.5 in this case.

  5. A more comprehensive discussion on the relation between link prediction and TONIC is given in Section 3.

  6. The exact setting of this experiment is provided below in the experimental section.
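The fallback described in Note 4 can be sketched as follows. This assumes that pf(m) is the ratio L(m) / (L(m) + NL(m)), which is consistent with the denominator named in the note but is not spelled out in this excerpt; the function name `promising_factor` and its parameters are hypothetical.

```python
def promising_factor(num_leads: int, num_non_leads: int) -> float:
    """Return pf(m) given the counts L(m) and NL(m).

    Assumes pf(m) = L(m) / (L(m) + NL(m)). When both counts are zero the
    ratio is undefined, so the neutral value 0.5 from Note 4 is returned.
    """
    total = num_leads + num_non_leads
    if total == 0:
        return 0.5
    return num_leads / total
```

Returning 0.5 when no evidence is available treats an unexplored profile as neither promising nor unpromising, so it is not ranked above or below profiles on the basis of missing data alone.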

Author information

Corresponding author

Correspondence to Roni Stern.

About this article

Cite this article

Puzis, R., Kachko, L., Hagbi, B. et al. Target oriented network intelligence collection: effective exploration of social networks. World Wide Web 22, 1447–1480 (2019). https://doi.org/10.1007/s11280-018-0648-0

  • DOI: https://doi.org/10.1007/s11280-018-0648-0
