BERT and Word Embedding for Interest Mining of Instagram Users

  • Conference paper

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1653)

Abstract

With more than one billion monthly active users and nearly 100 million photos shared daily, Instagram has become one of the richest sources of information for detecting users’ interests and trends. However, research on this social network remains limited compared with its competitors, e.g., Facebook and Twitter. A major obstacle for researchers is the lack of a publicly available labeled dataset that summarizes the content of Instagram profiles. To overcome this issue, we present, for the first time, an annotated multidomain interests dataset for training and testing on users of online social networks (OSNs), together with the methodology used to create this dataset from Instagram profiles. In addition, we propose an approach for automatically detecting and classifying Instagram users’ interests. We rely on word embeddings and deep learning techniques to introduce two approaches: (i) a feature-based method and (ii) fine-tuning the BERT model. We observed that fine-tuning BERT performed much better than the feature-based method.
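To make the idea of a feature-based method concrete, the following is a minimal sketch and not the authors' implementation. It assumes that a profile's text is turned into a feature vector by averaging word vectors, and that a nearest-centroid rule assigns the interest label. The tiny embedding table, the labels `food`, `sports` and `travel`, and every function name are hypothetical; real experiments would load pre-trained vectors (such as GloVe or fastText) and would likely use a stronger classifier.

```python
from math import sqrt

# Hypothetical 3-dimensional toy vectors standing in for pre-trained embeddings.
TOY_EMBEDDINGS = {
    "pasta": [0.9, 0.1, 0.0],
    "recipe": [0.8, 0.2, 0.1],
    "goal": [0.1, 0.9, 0.0],
    "match": [0.0, 0.8, 0.2],
    "beach": [0.1, 0.0, 0.9],
    "flight": [0.2, 0.1, 0.8],
}


def embed_text(text, table=TOY_EMBEDDINGS):
    """Average the vectors of known words; return None if no word is known."""
    vectors = [table[w] for w in text.lower().split() if w in table]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def fit_centroids(labeled_texts):
    """Compute one mean feature vector per label from (text, label) pairs."""
    sums, counts = {}, {}
    for text, label in labeled_texts:
        vec = embed_text(text)
        if vec is None:
            continue  # skip texts with no known words
        if label not in sums:
            sums[label] = [0.0] * len(vec)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {
        label: [s / counts[label] for s in total]
        for label, total in sums.items()
    }


def predict(text, centroids):
    """Return the label whose centroid is most similar to the text vector."""
    vec = embed_text(text)
    if vec is None:
        return None
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


# Hypothetical training examples, one per interest label.
TRAIN = [
    ("pasta recipe", "food"),
    ("goal match", "sports"),
    ("beach flight", "travel"),
]
CENTROIDS = fit_centroids(TRAIN)
```

Calling `predict("new pasta recipe", CENTROIDS)` looks up only the words present in the table, so out-of-vocabulary words are ignored and a text with no known words yields `None`. Fine-tuning BERT, by contrast, updates the weights of the language model itself rather than treating its embeddings as fixed features.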

Notes

  1. https://www.omnicoreagency.com/twitter-statistics/

  2. https://instaloader.github.io/

  3. https://help.instagram.com

  4. https://interestexplorer.io/facebook-interests-list/ (last accessed March 2022)

  5. https://www.dropbox.com/sh/8rd7gppa4bt0koh/AABEAoF8DZMFVB36oCYXSIxVa?dl=0

  6. https://nlp.stanford.edu/projects/glove/

  7. https://fasttext.cc/docs/en/crawl-vectors.html

Author information

Corresponding author

Correspondence to Sana Hamdi.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Hamdi, S., Hamdi, A., Ben Yahia, S. (2022). BERT and Word Embedding for Interest Mining of Instagram Users. In: Bădică, C., Treur, J., Benslimane, D., Hnatkowska, B., Krótkiewicz, M. (eds) Advances in Computational Collective Intelligence. ICCCI 2022. Communications in Computer and Information Science, vol 1653. Springer, Cham. https://doi.org/10.1007/978-3-031-16210-7_10

  • DOI: https://doi.org/10.1007/978-3-031-16210-7_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-16209-1

  • Online ISBN: 978-3-031-16210-7

  • eBook Packages: Computer Science (R0)