HyperMan: detecting misbehavior in online forums based on hyperlink posting behavior

  • Original Article
Social Network Analysis and Mining

Abstract

How can we detect and analyze hyperlink-driven misbehavior in online forums? Online forums contain enormous amounts of user-generated content, with threads and comments frequently supplemented by hyperlinks. These hyperlinks are often posted with malicious intent, and we refer to this as ‘hyperlink-driven misbehavior.’ We present HyperMan, a systematic suite of capabilities to detect and analyze hyperlink-driven misbehavior in online forums. We take a unique perspective, focusing on the hyperlink sharing practices of users to spot misbehavior. HyperMan can categorize these hyperlinks as (a) phishing, (b) spamming, and (c) promoting malicious products. Our approach consists of three high-level phases: (a) extracting hyperlinks from the textual data, (b) identifying misbehaving hyperlinks, and (c) modeling the behavioral patterns of hyperlink sharing, where we identify key hyperlinks and analyze the collaboration dynamics of hyperlink sharing. In addition, we implement our approach as a powerful and easy-to-use open platform for practitioners. We apply HyperMan to spot misbehavior in three online security forums, where we expect the users to be more security-aware. We show that our approach compares favorably with previous solutions in retrieving and classifying hyperlinks. Furthermore, we find non-trivial and often systematic misbehavior: (a) we find a total of 2703 misbehaving hyperlinks, and (b) we identify 94 groups of users who collude in promoting hyperlinks. Our work is a significant step toward mining online forums and detecting misbehaving users comprehensively.
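
As a rough illustration of phases (a) and (c), the sketch below extracts hyperlinks from post text with a simple regular expression and then lists pairs of users who posted the same hyperlinks. It is not the HyperMan implementation: the pattern, the function names `extract_hyperlinks` and `co_sharing_pairs`, the `min_shared` threshold, and the sample posts are all assumptions made for this example, and the pairwise overlap is only a stand-in for the collaboration analysis described in the abstract.

```python
import re
from collections import defaultdict
from itertools import combinations

# Assumed pattern (not from the paper): http(s) URLs and bare "www." links.
URL_PATTERN = re.compile(r"(?:https?://|www\.)[^\s<>]+", re.IGNORECASE)


def extract_hyperlinks(text):
    """Return the hyperlinks found in text, with trailing punctuation stripped."""
    return [match.rstrip(".,;:!?") for match in URL_PATTERN.findall(text)]


def co_sharing_pairs(posts, min_shared=2):
    """Map each pair of users to the hyperlinks both of them posted.

    posts is an iterable of (user, text) tuples. A pair is reported only
    when the two users share at least min_shared distinct hyperlinks
    (a hypothetical threshold chosen for illustration).
    """
    links_by_user = defaultdict(set)
    for user, text in posts:
        links_by_user[user].update(extract_hyperlinks(text))

    pairs = {}
    for user_a, user_b in combinations(sorted(links_by_user), 2):
        shared = links_by_user[user_a] & links_by_user[user_b]
        if len(shared) >= min_shared:
            pairs[(user_a, user_b)] = shared
    return pairs


# Made-up sample posts, for illustration only.
sample_posts = [
    ("alice", "Try http://example.com/a and www.example.org/b."),
    ("bob", "See http://example.com/a, also www.example.org/b"),
    ("carol", "No links in this post"),
]
shared_links = co_sharing_pairs(sample_posts)
```

Here `shared_links` has a single entry for the pair `("alice", "bob")`, since both posted the same two hyperlinks. A realistic pipeline would also need to normalize URLs (for example, resolving shortened links) and to classify each hyperlink, which this sketch does not attempt.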

Acknowledgements

This work was supported by the UC Multicampus-National Lab Collaborative Research and Training (UCNLCRT) award #LFR18548554.

Author information

Corresponding author

Correspondence to Risul Islam.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Islam, R., Treves, B., Rokon, M. et al. HyperMan: detecting misbehavior in online forums based on hyperlink posting behavior. Soc. Netw. Anal. Min. 12, 111 (2022). https://doi.org/10.1007/s13278-022-00943-3

  • DOI: https://doi.org/10.1007/s13278-022-00943-3
