Abstract
Classifying tweets into general categories, such as food, music and games, is an essential work for social network platforms, which is the basis for information recommendation, user portraits and content construction. As far as we know, nearly all existing general tweet classification datasets only have textual content. However, textual content in tweets may be short, meaningless, and even none, which would harm the classification performance. In fact, images and videos are widespread in tweets, and they can intuitively provide extra useful information. To fill this gap, we construct a novel Cross-Modal Classification Dataset constructed from Weibo called CMCD. Specifically, we collect tweets with three modalities of text, image and video from 18 general categories, and then filter tweets that can easily be classified by only textual contents. Finally, the whole dataset consists of 85,860 tweets, and all of them have been manually labelled. Among them, 64.4% of tweets contain images, and 16.2% of tweets contain videos. We implement classical baselines for tweets classification and report human performance. Empirical results show that the classification over CMCD is challenging enough and requires further efforts.
This is a preview of subscription content, log in via an institution.
Buying options
Tax calculation will be finalised at checkout
Purchases are for personal use only
Learn about institutional subscriptionsReferences
Arevalo, J., Solorio, T., Montes-y Gómez, M., González, F.A.: Gated multimodal units for information fusion. In: 5th International conference on learning representations 2017 workshop, (2017)
Banerjee, N., Chakraborty, D., Joshi, A., Mittal, S., Rai, A., Ravindran, B.: Towards analyzing micro-blogs for detection and classification of real-time intentions. In: Sixth International AAAI Conference on Weblogs and Social Media, (2012)
Cai, Y., Cai, H., Wan, X.: Multi-modal sarcasm detection in twitter with hierarchical fusion model. In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pp. 2506–2515 (2019)
Chua, T.S., Tang, J., Hong, R., Li, H., Luo, Z., Zheng, Y.: Nus-wide: a real-world web image database from national university of singapore. In: Proceedings of the ACM international conference on image and video retrieval, pp. 1–9 (2009)
Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: Imagenet: a large-scale hierarchical image database. In: 2009 IEEE conference on computer vision and pattern recognition, pp. 248–255. IEEE (2009)
Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: Bert: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 770–778. IEEE (2016)
Huang, F., Zhang, X., Zhao, Z., Xu, J., Li, Z.: Image-text sentiment analysis via deep multimodal attentive fusion. Knowl. Based Syst. 167, 26–37 (2019)
Joulin, A., Grave, E., Bojanowski, P., Mikolov, T.: Bag of tricks for efficient text classification. In: Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics, Association for Computational Linguistics. 2, 427–431 (2017)
Kay, W., et al.: The kinetics human action video dataset. arXiv preprint arXiv:1705.06950 (2017)
Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)
Kumar, A., Singh, J.P., Dwivedi, Y.K., Rana, N.P.: A deep multi-modal neural network for informative twitter content classification during emergencies. Annals of Operations Research, pp. 1–32 (2020)
Kumar, A., Garg, G.: Sentiment analysis of multimodal twitter data. Multimedia Tools Appl. 78(17), 24103–24119 (2019)
Lee, K., Palsetia, D., Narayanan, R., Patwary, M.M.A., Agrawal, A., Choudhary, A.: Twitter trending topic classification. In: 2011 IEEE 11th International Conference on Data Mining Workshops, pp. 251–258. IEEE (2011)
Liu, Z., Yu, W., Chen, W., Wang, S., Wu, F.: Short text feature selection for micro-blog mining. In: 2010 International Conference on Computational Intelligence and Software Engineering, pp. 1–4. IEEE (2010)
Qiu, Z., Yao, T., Mei, T.: Learning spatio-temporal representation with pseudo-3d residual networks. In: proceedings of the IEEE International Conference on Computer Vision, pp. 5533–5541 (2017)
Rasiwasia, N., et al.: A new approach to cross-modal multimedia retrieval. In: Proceedings of the 18th ACM international conference on Multimedia, pp. 251–260 (2010)
Sakaguchi, K., Bras, R.L., Bhagavatula, C., Choi, Y.: Winogrande: an adversarial winograd schema challenge at scale. arXiv preprint arXiv:1907.10641 (2019)
Sriram, B., Fuhry, D., Demir, E., Ferhatosmanoglu, H., Demirbas, M.: Short text classification in twitter to improve information filtering. In: Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval, pp. 841–842 (2010)
Vadicamo, L., et al.: Cross-media learning for image sentiment analysis in the wild. In: Proceedings of the IEEE International Conference on Computer Vision Workshops, pp. 308–317. IEEE (2017)
Wang, X., Kumar, D., Thome, N., Cord, M., Precioso, F.: Recipe recognition with large multimodal food dataset. In: 2015 IEEE International Conference on Multimedia & Expo Workshops (ICMEW), pp. 1–6. IEEE (2015)
You, Q., Luo, J., Jin, H., Yang, J.: Joint visual-textual sentiment analysis with deep neural networks. In: Proceedings of the 23rd ACM international conference on Multimedia, pp. 1071–1074 (2015)
Yunjie, F., Huailiang, L.: Research on chinese short text classification based on wikipedia. Data Anal. Knowl. Discov. 28(3), 47–52 (2012)
Zellers, R., Bisk, Y., Schwartz, R., Choi, Y.: Swag: a large-scale adversarial dataset for grounded commonsense inference. arXiv preprint arXiv:1808.05326 (2018)
Zubiaga, A., Spina, D., Martínez, R., Fresno, V.: Real-time classification of twitter trends. J. Assoc. Inf. Sci. Technol. 66(3), 462–473 (2015)
Acknowledgement
The work is supported by National Key R&D Plan (No. 2016QY03D0602), NSFC (No. U19B2020, 61772076, 61751201 and 61602197) and NSFB (No. Z181100008918002).
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Hu, Y., Huang, H., Chen, A., Mao, XL. (2020). A Cross-Modal Classification Dataset on Social Network. In: Zhu, X., Zhang, M., Hong, Y., He, R. (eds) Natural Language Processing and Chinese Computing. NLPCC 2020. Lecture Notes in Computer Science(), vol 12430. Springer, Cham. https://doi.org/10.1007/978-3-030-60450-9_55
Download citation
DOI: https://doi.org/10.1007/978-3-030-60450-9_55
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-60449-3
Online ISBN: 978-3-030-60450-9
eBook Packages: Computer ScienceComputer Science (R0)