Abstract
Online social networks witness a surge in user activity whenever a major event makes news. Cybercriminals exploit this surge in user engagement to spread malicious content that compromises system reputation, causes financial losses, and degrades user experience. In this paper, we collect and characterize a dataset of 4.4 million public posts generated on Facebook during 17 news-making events (natural calamities, sports, terror attacks, etc.) over a 16-month period. From this dataset, we filter out two sets of malicious posts, one using URL blacklists and the other using human annotations. Our observations reveal characteristic differences between the malicious posts obtained from the two methodologies, indicating that a twofold filtering process is needed for a more complete and robust detection system. We empirically confirm the need for this twofold approach by cross-validating supervised learning models trained on the two sets of malicious posts, including Naive Bayes, Decision Tree, Random Forest, and Support Vector Machine classifiers. Based on this learning, we implement Facebook Inspector, a REST API-based browser plug-in that identifies malicious Facebook posts in real time using the class probabilities produced by two independent Random Forest models. These models are built on a feature set of 44 publicly available features and each achieves an accuracy of over 80%. During the first 9 months of its public deployment (August 2015–May 2016), Facebook Inspector processed 0.97 million posts at an average response time of 2.6 s per post and was downloaded over 2500 times. We also evaluate Facebook Inspector in terms of performance and usability to identify further scope for improvement.
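To make the twofold design described above concrete, the sketch below shows how class probabilities from two independently trained Random Forest models (one trained on blacklist-derived labels, one on human-annotated labels) could be combined to flag a post. This is a minimal illustration, not the authors' implementation: the feature values are synthetic, and the 0.5 threshold and the "flag if either model fires" rule are assumptions made for demonstration only.

```python
# Illustrative sketch (assumptions, not the published system): two independent
# Random Forest classifiers over 44-dimensional post feature vectors, each
# producing a probability that a post is malicious.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def train_model(X, y):
    """Train one Random Forest on a labeled set of 44-dimensional feature vectors."""
    model = RandomForestClassifier(n_estimators=100, random_state=0)
    model.fit(X, y)
    return model

def is_malicious(post_features, blacklist_model, annotation_model, threshold=0.5):
    """Flag a post if either independently trained model deems it malicious (assumed rule)."""
    x = np.asarray(post_features).reshape(1, -1)
    p_blacklist = blacklist_model.predict_proba(x)[0, 1]    # P(malicious) per blacklist-trained model
    p_annotation = annotation_model.predict_proba(x)[0, 1]  # P(malicious) per annotation-trained model
    return p_blacklist >= threshold or p_annotation >= threshold

# Usage with synthetic data standing in for the 44 publicly available features per post.
rng = np.random.default_rng(0)
X_bl, y_bl = rng.random((500, 44)), rng.integers(0, 2, 500)
X_an, y_an = rng.random((500, 44)), rng.integers(0, 2, 500)
model_bl, model_an = train_model(X_bl, y_bl), train_model(X_an, y_an)
print(is_malicious(rng.random(44), model_bl, model_an))
```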
Notes
http://multiosn.iiitd.edu.in/fbapi/endpoint/?version=2.0&fid=<post_id> (a query sketch follows these notes).
We refer to a post as malicious if it contains a malicious URL.
The top 25 applications were used to generate over 95% of content in all three categories we analyzed.
Overall, only 24.74% of all posts in Dataset III originated from pages.
We marked events as crises based on the following definition of crisis from the Oxford English Dictionary: “A time of intense difficulty or danger”.
As of May 2016. Source: http://www.w3schools.com/browsers/browsers_stats.asp.
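The first note above gives the REST endpoint that Facebook Inspector exposes for checking a post. The sketch below shows one way a client could query it; the post ID is hypothetical, and the response schema is not documented in this excerpt, so the sketch simply returns the raw response body.

```python
# Minimal sketch of querying the Facebook Inspector REST endpoint from the first
# note. The post ID is hypothetical; the response format is not specified here.
import requests

ENDPOINT = "http://multiosn.iiitd.edu.in/fbapi/endpoint/"

def check_post(post_id):
    """Ask the FbI endpoint for its verdict on a single public Facebook post."""
    params = {"version": "2.0", "fid": post_id}
    response = requests.get(ENDPOINT, params=params, timeout=10)
    response.raise_for_status()
    return response.text  # raw body; schema unknown in this excerpt

if __name__ == "__main__":
    print(check_post("1234567890_9876543210"))  # hypothetical <post_id>
```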
Acknowledgements
We would like to thank Manik Panwar for helping with the development of Facebook Inspector and Bhavna Nagpal for helping conduct the usability survey. We would also like to thank the members of the Precog Research Group at IIIT-Delhi for their constant support and feedback.
Cite this article
Dewan, P., Kumaraguru, P. Facebook Inspector (FbI): Towards automatic real-time detection of malicious content on Facebook. Soc. Netw. Anal. Min. 7, 15 (2017). https://doi.org/10.1007/s13278-017-0434-5