Abstract
Classification of social streams, such as the tweet stream generated by Twitter users in chronological order, is an important research topic of well-recognized practical value, and it has attracted increasing attention with the growth of social data. A salient, and perhaps the most interesting, feature of such user-generated content is its never-failing novelty, which unfortunately challenges most traditional pre-trained classification models: they are built on a fixed label set and therefore fail to identify new labels as they emerge. In this paper, we study the classification of social streams with emerging new labels and propose a novel ensemble framework that integrates an instance-based learner and a label-based learner by means of completely-random trees. The proposed framework can not only classify known labels in the multi-label scenario, but also detect emerging new labels and update itself in the data stream. Extensive experiments on a real-world stream data set from Weibo, a Chinese micro-blogging platform, demonstrate the superiority of our approach over state-of-the-art methods.
Notes
- 1.
Fewer partitions mean that instances with new labels are more likely to have a shorter height in each tree.
- 2.
This is a trade-off parameter: a larger value means the method needs more memory. In practice, we choose a value greater than \(\psi \) when setting this parameter.
- 3.
- 4.
Here “\(\downarrow \)” means the smaller the value, the better the performance; and “\(\uparrow \)” means the larger the value, the better the performance.
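The height argument in Note 1 can be illustrated with a minimal sketch. It is not the framework proposed in the paper; it only shows the generic isolation idea behind completely-random trees, under stated assumptions: synthetic 2-D Gaussian data stands in for instances with known labels, a far-away point stands in for an instance with a new label, and the names `build_tree`, `height` and `mean_height`, the sample size 128 and the ensemble size 50 are all hypothetical choices made for illustration.

```python
import random


def build_tree(points, rng):
    """Grow one completely-random tree: random attribute, random cut point."""
    if len(points) <= 1:
        return None  # leaf: nothing left to partition
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:
        return None  # leaf: all values identical on the chosen attribute
    cut = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < cut]
    right = [p for p in points if p[dim] >= cut]
    return (dim, cut, build_tree(left, rng), build_tree(right, rng))


def height(tree, point):
    """Count the partitions a point passes through before reaching a leaf."""
    h = 0
    while tree is not None:
        dim, cut, left, right = tree
        tree = left if point[dim] < cut else right
        h += 1
    return h


def mean_height(forest, point):
    """Average height of a point over all trees in the ensemble."""
    return sum(height(tree, point) for tree in forest) / len(forest)


# Assumption: synthetic data in place of real stream features.
rng = random.Random(0)
known = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(128)]
forest = [build_tree(known, rng) for _ in range(50)]

typical_height = mean_height(forest, (0.0, 0.0))  # inside the known region
novel_height = mean_height(forest, (8.0, 8.0))    # far from the known region

print("typical:", typical_height, "novel:", novel_height)
```

In this sketch the far-away point falls into a sparse region that is isolated after few partitions, so its mean height is smaller than that of the point in the dense region; a detector could flag new-label candidates by thresholding this height.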
Acknowledgement
This research was supported by the National Research Foundation, Prime Minister’s Office, Singapore under its International Research Centres in Singapore Funding Initiative; the NSFC (61333014) and Pinnacle lab for analytics at Singapore Management University.
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Mu, X., Zhu, F., Liu, Y., Lim, EP., Zhou, ZH. (2018). Social Stream Classification with Emerging New Labels. In: Phung, D., Tseng, V., Webb, G., Ho, B., Ganji, M., Rashidi, L. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2018. Lecture Notes in Computer Science, vol 10937. Springer, Cham. https://doi.org/10.1007/978-3-319-93034-3_2
DOI: https://doi.org/10.1007/978-3-319-93034-3_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-93033-6
Online ISBN: 978-3-319-93034-3
eBook Packages: Computer Science (R0)