Topic Identification of Instagram Hashtag Sets for Image Tagging: An Empirical Assessment

  • Conference paper
Metadata and Semantic Research (MTSR 2021)

Abstract

Images are an important part of collection items in any digital library. Mining information from social media networks, and especially Instagram, for image description has recently gained increased research interest. In the current study we extend previous work on the use of topic modelling for mining tags from Instagram hashtags for image content description. We examine whether the hashtags accompanying Instagram photos, collected via a common query hashtag (called ‘subject’ hereafter), vary in a statistically significant manner depending on the similarity of their visual content. In the experiment we use the topics mined from the hashtags of a set of Instagram images corresponding to 26 different query hashtags and classified into two categories per subject, named ‘relevant’ and ‘irrelevant’ depending on the similarity of their visual content. Two different sets of users, namely trained students and a generic crowd, assess the topics presented to them as word clouds. We investigate whether there is a significant difference between the word clouds of the images considered visually relevant to the query subject and those considered visually irrelevant. At the same time we investigate whether the word cloud interpretations of the trained students and the generic crowd differ. The data collected through this empirical study are analysed using the independent samples t-test and Pearson's correlation coefficient. We conclude that the word clouds of the relevant Instagram images are much more easily interpretable by both the trained students and the crowd. The results also show some interesting variations across subjects, which are analysed and discussed in detail throughout the paper. At the same time, the interpretations of the trained students and the generic crowd are highly correlated, indicating that no specific training is required to mine relevant tags from Instagram hashtags to describe the Instagram photos they accompany.
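The two analyses named in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the scores are hypothetical placeholders rather than data from the study, the function names are invented here, and the pooled-variance (Student) form of the t-test is assumed because the abstract does not say which variant was used.

```python
import math
from statistics import mean, variance


def independent_t(a, b):
    """Student's t statistic (pooled variance) for two independent samples.

    Returns the t statistic and the degrees of freedom.
    """
    na, nb = len(a), len(b)
    # Pooled estimate of the common variance (sample variances use n - 1).
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2


def pearson_r(x, y):
    """Pearson correlation coefficient between two paired samples."""
    mx, my = mean(x), mean(y)
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sx = math.sqrt(sum((xi - mx) ** 2 for xi in x))
    sy = math.sqrt(sum((yi - my) ** 2 for yi in y))
    return cov / (sx * sy)


# Hypothetical interpretation scores per subject (NOT taken from the paper).
relevant = [0.82, 0.75, 0.90, 0.68, 0.79]
irrelevant = [0.41, 0.55, 0.38, 0.47, 0.52]

# Hypothetical paired scores from the two user groups for the same subjects.
students = [0.82, 0.75, 0.90, 0.68, 0.79]
crowd = [0.78, 0.70, 0.88, 0.65, 0.81]

t_stat, dof = independent_t(relevant, irrelevant)
print(f"t = {t_stat:.3f} with {dof} degrees of freedom")
print(f"Pearson r (students vs crowd) = {pearson_r(students, crowd):.3f}")
```

A large absolute t value would point to a difference between the relevant and irrelevant word clouds, while a Pearson coefficient close to 1 would indicate that the two user groups interpret the word clouds similarly. Obtaining a p-value additionally requires the t distribution, which a statistics library would normally provide.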

Notes

  1. https://www.crummy.com/software/BeautifulSoup/bs4/doc/

  2. https://wordnet.princeton.edu/

  3. https://amueller.github.io/word_cloud/

  4. https://appen.com/

  5. https://elearning.cut.ac.cy/

Author information

Corresponding author

Correspondence to Stamatios Giannoulakis.

Copyright information

© 2022 Springer Nature Switzerland AG

About this paper

Cite this paper

Giannoulakis, S., Tsapatsoulis, N. (2022). Topic Identification of Instagram Hashtag Sets for Image Tagging: An Empirical Assessment. In: Garoufallou, E., Ovalle-Perandones, MA., Vlachidis, A. (eds) Metadata and Semantic Research. MTSR 2021. Communications in Computer and Information Science, vol 1537. Springer, Cham. https://doi.org/10.1007/978-3-030-98876-0_14

  • DOI: https://doi.org/10.1007/978-3-030-98876-0_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-98875-3

  • Online ISBN: 978-3-030-98876-0

  • eBook Packages: Computer Science, Computer Science (R0)
