Abstract
Social Honeypots are tools deployed in Online Social Networks (OSN) to attract malevolent activities performed by spammers and bots. To this end, their content is designed to be of maximum interest to malicious users. However, by choosing an appropriate content topic, this attractive mechanism could be extended to any OSN users, rather than only luring malicious actors. As a result, honeypots can be used to attract individuals interested in a wide range of topics, from sports and hobbies to more sensitive subjects like political views and conspiracies. With all these individuals gathered in one place, honeypot owners can conduct many analyses, from social to marketing studies.
In this work, we introduce a novel concept of social honeypot for attracting OSN users interested in a generic target topic. We propose a framework based on fully-automated content generation strategies and engagement plans to mimic legitimate Instagram pages. To validate our framework, we created 21 self-managed social honeypots (i.e., pages) on Instagram, covering three topics, four content generation strategies, and three engagement plans. In nine weeks, our honeypots gathered a total of 753 followers, 5387 comments, and 15739 likes. These results demonstrate the validity of our approach, and through statistical analysis, we examine the characteristics of effective social honeypots.
Notes
- 1.
The Instagram API provides the page owner with aggregated follower statistics (gender, age, countries) once the page reaches 100 followers [18].
- 2.
Starting from the main topic hashtags (i.e., #cat, #food, #car), each day we build the set of hashtags contained in the top 25 posts, from which we draw the hashtag used to retrieve the starting post.
- 3.
Object detectors are Computer Vision-based tools that identify objects composing a given scene. Each object is accompanied by a probability score.
- 4.
We discard images that do not contain at least one topic-related element with a high probability.
- 5.
- 6.
Passive followers only follow the page, but they do not engage further.
- 7.
- 8.
- 9.
All sponsored content belongs to weeks before the 9th.
- 10.
Earlybird bias appears in other social contexts like online reviews [43].
- 11.
For instance, we asked whether the page resulted from an already existing page (on IG or other platforms), or the strategies they adopted to manage the pages (e.g., spam, sponsoring).
- 12.
After 1000 followers, users are considered nano influencers [53].
- 13.
Instagram’s automatic algorithm concentrated the audience in the authors’ country, i.e., Italy, so locations are reported as Italian regions.
- 14.
- 15.
References
Aditya, R., Prafulla, D., Alex, N., Casey, C., Mark, C.: https://openai.com/product/dall-e-2 (2022), Accessed Mar 2023
Ahmed, W., Vidal-Alaball, J., Downing, J., Seguí, F.L., et al.: Covid-19 and the 5g conspiracy theory: social network analysis of twitter data. J. Med. Internet Res. 22(5), e19458 (2020)
Akyon, F.C., Kalfaoglu, M.E.: Instagram fake and automated account detection. In: 2019 Innovations in intelligent systems and applications conference (ASYU). pp. 1–7. IEEE (2019)
Alexa: Alexa top websites. https://www.expireddomains.net/alexa-top-websites/ (2022), Accessed Sept 2022
AppsUK: How long does it take to get 1000 followers on instagram? https://apps.uk/how-long-1000-followers-on-instagram/ (2022) Accessed Jan 2023
Bailey, M., Dittrich, D., Kenneally, E., Maughan, D.: The Menlo report. IEEE Security & Privacy (2012)
Bedi, P., Sharma, C.: Community detection in social networks. Wiley Interdisc. Rev.: Data Mining Knowl. Disc. 6(3), 115–135 (2016)
Boyd, D., Crawford, K.: Critical questions for big data: provocations for a cultural, technological, and scholarly phenomenon. Inform. Commun. Society 15(5), 662–679 (2012)
Brooker, P., Barnett, J., Cribbin, T., Sharma, S.: Have we even solved the first big data challenge? practical issues concerning data collection and visual representation for social media analytics. In: Snee, H., Hine, C., Morey, Y., Roberts, S., Watson, H. (eds.) Digital Methods for Social Science, pp. 34–50. Palgrave Macmillan UK, London (2016). https://doi.org/10.1057/9781137453662_3
Campbell, C., Ferraro, C., Sands, S.: Segmenting consumer reactions to social network marketing. Europ. J. Market 38 (2014)
Conti, M., Gathani, J., Tricomi, P.P.: Virtual influencers in online social media. IEEE Commun. Mag. 60, 86–91 (2022)
Conti, M., Pajola, L., Tricomi, P.P.: Captcha attack: Turning captchas against humanity. arXiv preprint arXiv:2201.04014 (2022)
Daugherty, A.: https://aigrow.me/follow-unfollow-instagram/ (2022) Accessed Oct 2022
Dayma, B., et al.: Dall-e mini (7 2021). https://doi.org/10.5281/zenodo.5146400, https://github.com/borisdayma/dalle-mini
De Cristofaro, E., Friedman, A., Jourjon, G., Kaafar, M.A., Shafiq, M.Z.: Paying for likes? understanding facebook like fraud using honeypots. In: Proceedings of the 2014 Conference on Internet Measurement Conference, pp. 129–136 (2014)
Del Vicario, M., et al.: The spreading of misinformation online. Proc. Natl. Acad. Sci. 113(3), 554–559 (2016)
Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: Imagenet: A large-scale hierarchical image database. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–255. IEEE (2009)
Meta for Developers: Instagram api. https://developers.facebook.com/docs/instagram-api/guides/insights (2021) Accessed Oct 2022
Dey, N., Borah, S., Babo, R., Ashour, A.S.: Social network analytics: computational research methods and techniques. Academic Press (2018)
Dittrich, D.: The ethics of social honeypots. Res. Ethics 11(4), 192–210 (2015)
Face, H.: Keytotext. https://huggingface.co/gagan3012/k2t (2022). Accessed Oct 2022
Ferreira, N.M.: 300+ best instagram captions and selfie quotes for your photos. https://www.oberlo.com/blog/instagram-captions (2022) Accessed Sep 2022
Fisher, D., McAdam, A.: Social traits, social networks and evolutionary biology. J. Evol. Biol. 30(12), 2088–2103 (2017)
Franke, R.H., Kaul, J.D.: The hawthorne experiments: First statistical interpretation. American sociological review, pp. 623–643 (1978)
Hagen, L., Keller, T., Neely, S., DePaula, N., Robert-Cooperman, C.: Crisis communications in the age of social media: a network analysis of zika-related tweets. Soc. Sci. Comput. Rev. 36(5), 523–541 (2018)
Haqimi, N.A., Rokhman, N., Priyanta, S.: Detection of spam comments on instagram using complementary naïve bayes. IJCCS (Indonesian J. Comput. Cybern. Syst.) 13(3), 263–272 (2019)
HQ, H.: How to get followers on instagram. https://www.hopperhq.com/blog/how-to-get-followers-instagram-2021/ (2022) Accessed Jan 2023
Hu, X., Tang, J., Liu, H.: Online social spammer detection. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 28 (2014)
Hub, M.: The state of influencer marketing 2021: Benchmark report. https://influencermarketinghub.com/influencer-marketing-benchmark-report-2021 (2021) Accessed Oct 2022
Infographic: Data never sleeps 5.0. https://www.domo.com/learn/infographic/data-never-sleeps-5 (2022) Accessed Oct 2022
Instagram: Reducing inauthentic activity on instagram. https://about.instagram.com/blog/announcements/reducing-inauthentic-activity-on-instagram (2018) Accessed Feb 2023
Instagram: Introducing new authenticity measures on instagram. https://about.instagram.com/blog/announcements/introducing-new-authenticity-measures-on-instagram/ (2020) Accessed Feb 2023
Jain, A.K., Sahoo, S.R., Kaubiyal, J.: Online social networks security and privacy: comprehensive review and analysis. Complex Intell. Syst. 7(5), 2157–2177 (2021). https://doi.org/10.1007/s40747-021-00409-7
John, J.P., Yu, F., Xie, Y., Krishnamurthy, A., Abadi, M.: Heat-seeking honeypots: design and experience. In: Proceedings of the 20th International Conference on World Wide Web, pp. 207–216 (2011)
Karl: The 15 biggest social media sites and apps. https://www.dreamgrow.com/top-15-most-popular-social-networking-sites/ (2022) Accessed Sept 2022
Kim, R.E., Kotzé, L.J.: Planetary boundaries at the intersection of earth system law, science and governance: A state-of-the-art review. Rev. Europ., Compar. Int. Environ. Law 30(1), 3–15 (2021)
Kreibich, C., Crowcroft, J.: Honeycomb: creating intrusion detection signatures using honeypots. ACM SIGCOMM Comput. Commun. Rev. 34(1), 51–56 (2004)
Kuhn, S.: How to stop instagram spam? https://www.itgeared.com/how-to-stop-instagram-spam/ (2022) Accessed Jan 2023
Laurence, C.: Call to action instagram: 13 creative ctas to test on your account. https://www.plannthat.com/call-to-action-instagram// (2022) Accessed Sept 2022
Lavanya: How to avoid-stop spam comments on instagram posts? https://versionweekly.com/news/instagram/how-to-avoid-stop-spam-comments-on-instagram-posts-easy-method/ (2021) Accessed Oct 2022
Lee, K., Caverlee, J., Webb, S.: Uncovering social spammers: social honeypots+ machine learning. In: Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval, pp. 435–442 (2010)
Lee, K., Eoff, B., Caverlee, J.: Seven months with the devils: A long-term study of content polluters on twitter. In: Proceedings of the international AAAI conference on web and social media. vol. 5, pp. 185–192 (2011)
Liu, J., Cao, Y., Lin, C.Y., Huang, Y., Zhou, M.: Low-quality product review detection in opinion summarization. In: Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), pp. 334–342 (2007)
Macready, H.: The only instagram metrics you really need to track in 2023. https://blog.hootsuite.com/instagram-metrics (2022) Accessed Jan 2023
McClurg, S.D.: Social networks and political participation: the role of social interaction in explaining political participation. Polit. Res. Q. 56(4), 449–464 (2003)
McCormick, K.: 23 smart ways to get more instagram followers in 2022. https://www.wordstream.com/blog/ws/get-more-instagram-followers (2022) Accessed Sep 2022
Me, I.: How to get your first 1000 followers on instagram. https://www.epidemicsound.com/blog/how-to-get-your-first-1000-followers-on-instagram/ (2022) Accessed Jan 2023
Meyer, L.: How often to post on social media: 2022 success guide. https://louisem.com/144557/often-post-social-media (2022) Accessed Oct 2022
Moshchuk, A., Bragin, T., Gribble, S.D., Levy, H.M.: A crawler-based study of spyware in the web. In: NDSS. vol. 1, p. 2 (2006)
Murugan, N.S., Devi, G.U.: Detecting spams in social networks using ml algorithms-a review. Int. J. Environ. Waste Manage. 21(1), 22–36 (2018)
Mushtaq, R.: Augmented Dickey Fuller Test. Mathematical Methods & Programming eJournal, Econometrics (2011)
OpenAI: https://openai.com/blog/chatgpt (2022) Accessed Mar 2023
Pereira, N.: 5 different tiers of influencers and when to use each. https://zerogravitymarketing.com/the-different-tiers-of-influencers-and-when-to-use-each/ (2022) Accessed Oct 2022
Petriska, J.: https://gist.github.com/JakubPetriska/060958fd744ca34f099e947cd080b540 (2022) Accessed Oct 2022
Raffel, C., Shazeer, N., Roberts, A., Lee, K., Narang, S., Matena, M., Zhou, Y., Li, W., Liu, P.J., et al.: Exploring the limits of transfer learning with a unified text-to-text transformer. J. Mach. Learn. Res. 21(140), 1–67 (2020)
Rani, P., Shokeen, J.: A survey of tools for social network analysis. Int. J. Web Eng. Technol. 16(3), 189–216 (2021)
Raponi, S., Khalifa, Z., Oligeri, G., Di Pietro, R.: Fake news propagation: A review of epidemic models, datasets, and insights. ACM Trans. Web 16(3) (2022)
Richter, F.: Social networking is the no. 1 online activity in the u.s. https://www.statista.com/chart/1238/digital-media-use-in-the-us/ (2022) Accessed Sept 2022
Robertson, M.: Instagram Marketing: How to Grow Your Instagram Page And Gain Millions of Followers Quickly With Step-by-Step Social Media Marketing Strategies. CreateSpace Independent Publishing Platform (2018)
Sheikhi, S.: An efficient method for detection of fake accounts on the instagram platform. Rev. d’Intelligence Artif. 34(4), 429–436 (2020)
Singh, A., Halgamuge, M.N., Moses, B.: An analysis of demographic and behavior trends using social media: Facebook, twitter, and instagram. Social Network Analytics, p. 87 (2019)
Smith, E.B., Brands, R.A., Brashears, M.E., Kleinbaum, A.M.: Social networks and cognition. Ann. Rev. Sociol. 46(1), 159–174 (2020)
SocialBuddy: How often to post on social media: 2022 success guide. https://socialbuddy.com/how-often-should-you-post-on-instagram/ (2022) Accessed Oct 2022
Stallings, W., Brown, L., Bauer, M.D., Howard, M.: Computer security: principles and practice, vol. 2. Pearson Upper Saddle River (2012)
Statusbrew: Instagram algorithm 2022: How to conquer it. https://statusbrew.com/insights/instagram-algorithm/ (2021) Accessed Oct 2022
Stringhini, G., Kruegel, C., Vigna, G.: Detecting spammers on social networks. In: Proceedings of the 26th Annual Computer Security Applications Conference, pp. 1–9 (2010)
Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2818–2826 (2016)
Tricomi, P.P., Chilese, M., Conti, M., Sadeghi, A.R.: Follow us and become famous! insights and guidelines from instagram engagement mechanisms. In: Proceedings of the 15th ACM Web Science Conference 2023, vol. 11, pp. 346–356. Association for Computing Machinery, New York (2023). https://doi.org/10.1145/3578503.3583623
Tricomi, P.P., Tarahomi, S., Cattai, C., Martini, F., Conti, M.: Are we all in a truman show? spotting instagram crowdturfing through self-training. arXiv preprint arXiv:2206.12904 (2022)
Vishwamitra, N., Li, Y., Hu, H., Caine, K., Cheng, L., Zhao, Z., Ahn, G.J.: Towards automated content-based photo privacy control in user-centered social networks. In: Proceedings of the Twelfth ACM Conference on Data and Application Security and Privacy. Association for Computing Machinery (2022)
Wang, G., et al.: Serf and turf: crowdturfing for fun and profit. In: Proceedings of the 21st International Conference on World Wide Web, pp. 679–688 (2012)
Wang, Y.M., Beck, D., Jiang, X., Roussev, R.: Automated web patrol with strider honeymonkeys: Finding web sites that exploit browser vulnerabilities. In: NDSS. Citeseer (2006)
Webb, S., Caverlee, J., Pu, C.: Social honeypots: Making friends with a spammer near you. In: CEAS, pp. 1–10. San Francisco, CA (2008)
Xiao, Y., Jia, Y., Cheng, X., Wang, S., Mao, J., Liang, Z.: I know your social network accounts: A novel attack architecture for device-identity association. IEEE Transactions on Dependable and Secure Computing, pp. 1–1 (2022). https://doi.org/10.1109/TDSC.2022.3147785
Yang, C., Zhang, J., Gu, G.: A taste of tweets: Reverse engineering twitter spammers. In: Proceedings of the 30th Annual Computer Security Applications Conference, pp. 86–95 (2014)
Yegneswaran, V., Giffin, J.T., Barford, P., Jha, S.: An architecture for generating semantic aware signatures. In: USENIX Security Symposium, pp. 97–112 (2005)
Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: Pegasus: Pre-training with extracted gap-sentences for abstractive summarization (2019)
Zhang, W., Sun, H.M.: Instagram spam detection. In: 2017 IEEE 22nd Pacific Rim International Symposium on Dependable Computing (PRDC), pp. 227–228. IEEE (2017)
Zhang, Y., Zhang, H., Yuan, X.: Toward efficient spammers gathering in twitter social networks. In: Proceedings of the Ninth ACM Conference on Data and Application Security and Privacy, pp. 157–159 (2019)
Zhu, Y., Wang, X., Zhong, E., Liu, N., Li, H., Yang, Q.: Discovering spammers in social networks. In: Proceedings of the AAAI Conference on Artificial Intelligence. vol. 26, pp. 171–177 (2012)
Appendices
A Implementation Details
A.1 Models
This appendix describes how InstaModel, ArtModel, UnsplashModel, and QuotesModel were implemented. Each has its own characteristics, but they share some common functionalities, which we explain before detailing the four models.
Shared functionalities. The first shared functionality adds emojis to the generated text. A Python script scans the generated caption for words that can be translated into a corresponding emoji. To make the script more effective, it also looks up synonyms of the nouns and adjectives found in the text to check whether any of them maps to a particular emoji. As a last step, the script randomly chooses, from a pool of emojis representing the “joy” sentiment, one emoji per sentence and appends it to the end of that sentence.
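The emoji step described above could be sketched roughly as follows. This is a minimal illustration, not the authors' script: the word-to-emoji table, the synonym table (standing in for a real synonym lookup), the joy pool, and the choice to place the emoji after the matched word are all assumptions made for the example.

```python
import random
import re

# Hypothetical lookup tables; the original script's vocabulary is not given.
EMOJI_FOR_WORD = {"cat": "🐱", "car": "🚗", "food": "🍔", "love": "😍"}
SYNONYMS = {"kitten": "cat", "automobile": "car", "meal": "food"}
JOY_EMOJIS = ["😀", "😄", "😊", "🎉"]


def _decorate(match):
    """Append an emoji after a word if the word (or its synonym) has one."""
    word = match.group(0)
    key = word.lower()
    key = SYNONYMS.get(key, key)  # fall back to the word itself
    emoji = EMOJI_FOR_WORD.get(key)
    return f"{word} {emoji}" if emoji else word


def add_emojis(caption, rng=None):
    """Decorate known words, then end every sentence with a random joy emoji."""
    rng = rng or random
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", caption.strip()) if s]
    decorated = []
    for sentence in sentences:
        text = re.sub(r"[A-Za-z]+", _decorate, sentence)
        decorated.append(f"{text} {rng.choice(JOY_EMOJIS)}")
    return " ".join(decorated)


print(add_emojis("My kitten sleeps. I love food!", random.Random(0)))
```

A real synonym lookup (for instance through a lexical database) would replace the small SYNONYMS dictionary used here.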
Calls to action (CTAs) are short texts that encourage a user to take an action. They are sampled randomly from a manually compiled list and appended to the generated caption.
The last shared feature is hashtag selection. As mentioned before, the Instagram Graph API lets us retrieve the top 25 posts for a specific hashtag, from which we extract all the hashtags contained in the captions. We thus compiled, for each of the three topics, a hashtag list sorted from most used to least used. Instagram allows at most 30 hashtags per post, but we consider this number too high compared with normal user behavior. We therefore pick 15 hashtags according to the following criterion: 8 are sampled randomly from the first half of the list in the csv file, giving more weight to the top ones, while the other 7 are sampled randomly from the second half, giving more weight to the bottom of the list. The intuition is to combine the most popular hashtags with more specific ones.
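The 8 + 7 selection rule can be illustrated with a short sketch. The text does not say how the weights are computed, so the linear weights below are an assumption, as are the function names; only the split into halves and the 8/7 counts come from the description.

```python
import random


def weighted_sample(items, weights, k, rng):
    """Draw up to k distinct items, each draw proportional to remaining weights."""
    pool = list(zip(items, weights))
    chosen = []
    for _ in range(min(k, len(pool))):
        idx = rng.choices(range(len(pool)), weights=[w for _, w in pool])[0]
        chosen.append(pool.pop(idx)[0])
    return chosen


def pick_hashtags(ranked, n_top=8, n_tail=7, seed=None):
    """ranked: hashtags sorted from most used to least used."""
    rng = random.Random(seed)
    mid = len(ranked) // 2
    head, tail = ranked[:mid], ranked[mid:]
    # Assumed weighting: linear, favouring the top of the first half
    # and the bottom of the second half.
    head_weights = [len(head) - i for i in range(len(head))]
    tail_weights = [i + 1 for i in range(len(tail))]
    return (weighted_sample(head, head_weights, n_top, rng)
            + weighted_sample(tail, tail_weights, n_tail, rng))


ranked = [f"#tag{i}" for i in range(40)]
print(pick_hashtags(ranked, seed=0))
```

Sampling without replacement keeps the 15 hashtags distinct, which a single call to a weighted sampler with replacement would not guarantee.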
InstaModel. For caption generation, InstaModel uses the Instagram Graph API to retrieve the top 25 posts for a specific hashtag. In practice, the chosen hashtag is the topic on which the corresponding honeypot is based. The 25 posts are filtered to keep only those with an English caption and are then passed to the object detector block, which is implemented with the InceptionV3 model. InceptionV3 returns the object classes found in the original image together with their scores. If the top class’s score is below 0.25, the post is discarded. Otherwise, the remaining classes are also checked, and only those with a score greater than 0.05 are kept as keywords for the next step. From the original caption, nouns and adjectives are extracted with the nltk Python library. Words such as “DM” or “credits” and adjectives such as “double” are ignored, because they usually belong to parts of the caption that are not useful for this process.
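The two score thresholds can be expressed as a small filter over the detector output. The sketch below assumes the predictions arrive as (label, score) pairs and that the top class itself is kept as a keyword; both are assumptions, since the text only specifies the 0.25 and 0.05 thresholds.

```python
def keywords_from_predictions(predictions, top_min=0.25, other_min=0.05):
    """Return keyword labels for a post, or None if the post is discarded.

    predictions: iterable of (label, score) pairs from the object detector.
    """
    ranked = sorted(predictions, key=lambda p: p[1], reverse=True)
    # Discard the post when the best class is not confident enough.
    if not ranked or ranked[0][1] < top_min:
        return None
    # Keep the top class (assumed) plus every other class above other_min.
    keywords = [ranked[0][0]]
    keywords += [label for label, score in ranked[1:] if score > other_min]
    return keywords


print(keywords_from_predictions([("tabby", 0.60), ("sofa", 0.10), ("lamp", 0.01)]))
print(keywords_from_predictions([("tabby", 0.20), ("sofa", 0.10)]))
```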
Keyword2text (Note 14) is the NLP model that transforms a list of keywords into a preliminary sentence. This preliminary sentence is then used by the OPT model to generate the complete text. Given the computational resources available to us, we used OPT with 1.3 billion parameters. We suggest saving the text generated by OPT in a text file, because it is used later to generate the corresponding image. Once the complete text is generated, emojis are added together with a CTA sentence, as is standard in every post. The last step of caption generation is to append hashtags, sampled from the corresponding csv file as described above.
The last step of InstaModel is image generation, for which Dall-E Mini [14] is used. The prompt is the text generated by the OPT stage, which was saved separately. Note that the process with Dall-E Mini is not completely automatic: a person must choose the most suitable image for the given caption.
ArtModel. ArtModel starts from a prompt generated with a Python script and, like InstaModel, uses Dall-E Mini to generate the corresponding image. The style and the medium are chosen randomly from two lists. Examples of styles are “cyberpunk”, “psychedelic”, “realistic”, and “abstract”, while examples of mediums are “painting”, “drawing”, “sketch”, and “graffiti”. The topic of the honeypot is used as the subject of the artistic picture generated by Dall-E Mini. Once the image is generated, the prompt, enriched with emojis, a CTA, and the corresponding hashtags, is used as the Instagram caption.
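A prompt generator of this kind might look like the sketch below. The style and medium lists reuse the examples given in the text; the sentence template and the function name are assumptions, since the actual prompt wording is not reported.

```python
import random

STYLES = ["cyberpunk", "psychedelic", "realistic", "abstract"]
MEDIUMS = ["painting", "drawing", "sketch", "graffiti"]


def art_prompt(topic, rng=None):
    """Build a text-to-image prompt from a random style and medium."""
    rng = rng or random
    style = rng.choice(STYLES)
    medium = rng.choice(MEDIUMS)
    # Assumed template; the honeypot topic is the subject of the picture.
    return f"{style} {medium} of a {topic}"


print(art_prompt("cat", random.Random(1)))
```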
UnsplashModel. UnsplashModel does not generate images but uses stock images retrieved from the Unsplash website. Unsplash was chosen not only because it provides images together with their captions, but also because it offers an easy-to-use API for developers. To avoid reusing the same image, each image’s id is saved in a text file that is checked at each iteration. For caption generation, the original caption is processed by the Pegasus model [77], an NLP model that performs well at rephrasing. As always, emojis, a CTA, and hashtags are added to the final result.
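The id bookkeeping that prevents image reuse can be sketched with a plain text file holding one id per line. The file layout and the function name are assumptions; fetching candidate ids from the stock image API is left out.

```python
from pathlib import Path


def first_unused(candidate_ids, seen_file):
    """Return the first id not yet recorded in seen_file and record it.

    Returns None when every candidate has already been used.
    """
    path = Path(seen_file)
    seen = set(path.read_text().split()) if path.exists() else set()
    for image_id in candidate_ids:
        if image_id not in seen:
            # Append so that earlier ids stay in the file.
            with path.open("a") as handle:
                handle.write(image_id + "\n")
            return image_id
    return None
```

Calling the function twice with the same candidates returns a different id each time until the candidates are exhausted.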
QuotesModel. QuotesModel uses the Pixabay (Note 15) stock image website, so that Unsplash is not reused for this model as well. Here too, we use the topic of the specific honeypot as the query tag. As for UnsplashModel, to avoid reusing the same image for different posts, the id of each downloaded image is saved in a text file that is checked whenever needed. For caption generation, a quote is sampled randomly from a citation dataset [22]. In this case, the model does not add emojis to the text, because we think the quote by itself can be a valid Instagram caption. A CTA and hashtags are still added to the text, as always.
A.2 Spamming
Honeypots with the PLAN 1 or PLAN 2 engagement plan automatically interact with the posts of other users. The idea is to retrieve the top 25 Instagram posts for the hashtag corresponding to the honeypot’s topic, and to like and comment on each of them.
For the implementation we used Selenium, a browser automation tool that can be easily installed with pip. Selenium requires a driver to interface with the chosen browser; since we chose Firefox, we downloaded geckodriver. The implementation consists of a Python class with three main methods: login, like_post, and comment_post.
The login method is invoked when the honeypot logs in to Instagram. The like_post method searches the DOM for the button corresponding to the like action and clicks it. The comment_post method searches the DOM for the comment button and clicks it. It then searches for the dedicated textarea and writes a randomly sampled comment. Finally, it clicks the button to send the comment.
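The structure of such a class could resemble the sketch below. It is not the authors' code: the class name, the CSS selectors, and the decision to pass the page URL to each method are hypothetical placeholders, and the selectors do not reflect Instagram's actual markup. The driver is injected, so any object exposing get and find_element (such as a Selenium WebDriver) can be used.

```python
import random

CSS = "css selector"  # same string value as Selenium's By.CSS_SELECTOR

# Hypothetical selectors, for illustration only.
SELECTORS = {
    "username": "input[name='username']",
    "password": "input[name='password']",
    "submit": "button[type='submit']",
    "like": "[data-test='like-button']",
    "comment": "[data-test='comment-button']",
    "textarea": "textarea",
    "send": "[data-test='send-comment']",
}


class HoneypotBot:
    def __init__(self, driver, comments, rng=None):
        self.driver = driver
        self.comments = list(comments)
        self.rng = rng or random

    def _find(self, name):
        return self.driver.find_element(CSS, SELECTORS[name])

    def login(self, url, username, password):
        """Open the login page, fill in the credentials, and submit."""
        self.driver.get(url)
        self._find("username").send_keys(username)
        self._find("password").send_keys(password)
        self._find("submit").click()

    def like_post(self, post_url):
        """Open a post and click its like button."""
        self.driver.get(post_url)
        self._find("like").click()

    def comment_post(self, post_url):
        """Open a post, type a randomly sampled comment, and send it."""
        self.driver.get(post_url)
        self._find("comment").click()
        text = self.rng.choice(self.comments)
        self._find("textarea").send_keys(text)
        self._find("send").click()
        return text
```

With Selenium and geckodriver installed, the driver would be created with webdriver.Firefox() from the selenium package and passed to the constructor. A production version would also need explicit waits for page elements to load.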
B Sponsored Content Analyses
We report in Table 4 the complete overview of the audience attracted by our sponsored content. In particular, we report overall statistics in terms of quantity (e.g., number of likes) and demographic information such as gender, age, and location distribution.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Bardi, S., Conti, M., Pajola, L., Tricomi, P.P. (2023). Social Honeypot for Humans: Luring People Through Self-managed Instagram Pages. In: Tibouchi, M., Wang, X. (eds) Applied Cryptography and Network Security. ACNS 2023. Lecture Notes in Computer Science, vol 13905. Springer, Cham. https://doi.org/10.1007/978-3-031-33488-7_12
DOI: https://doi.org/10.1007/978-3-031-33488-7_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-33487-0
Online ISBN: 978-3-031-33488-7
eBook Packages: Computer Science (R0)