Abstract
Multi-label learning (MLL) addresses classification problems in which each instance can be tagged with multiple labels simultaneously. Over the last several years, many MLL algorithms have been proposed and have achieved excellent performance in a variety of applications. However, these approaches are usually time-consuming and cannot handle large-scale data. In this paper, we propose HashMLL, a fast multi-label learning algorithm based on hashing schemes. HashMLL uses Locality Sensitive Hashing (LSH) to identify the neighboring instances of each unseen instance, and exploits label correlation by estimating the similarity of labels with min-wise independent permutations locality sensitive hashing (MinHash). Statistics gathered from the labels of these neighboring instances are then combined under the maximum a posteriori (MAP) principle to determine the label set of each unseen instance. Experiments show that HashMLL is highly competitive with state-of-the-art techniques while requiring far less time. In particular, on the NUS-WIDE dataset with 269,648 instances and the Flickr dataset with 565,444 instances, where no existing method returns results within 24 hours, HashMLL takes only 90 seconds and 23,266 seconds, respectively.
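The label-correlation step described above rests on a standard MinHash property: the probability that two sets share the same minimum value under a random permutation equals their Jaccard similarity. The following sketch illustrates that idea only; it is not the authors' implementation, and the representation of each label as the set of instance ids carrying it is an assumption made here for illustration.

```python
import random

def minhash_signature(item_set, num_hashes=128, seed=0):
    """MinHash signature: for each of num_hashes simulated random
    permutations (salted hashes), keep the minimum hash value
    attained over the set's elements."""
    rng = random.Random(seed)
    salts = [rng.getrandbits(32) for _ in range(num_hashes)]
    return [min(hash((salt, x)) for x in item_set) for salt in salts]

def estimated_jaccard(sig_a, sig_b):
    """The fraction of agreeing signature positions is an unbiased
    estimate of the Jaccard similarity of the underlying sets."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)

# Hypothetical example: two labels, each represented by the set of
# instance ids tagged with it.
label_a = {1, 2, 3, 5, 8}
label_b = {2, 3, 5, 8, 13}
sig_a = minhash_signature(label_a)
sig_b = minhash_signature(label_b)
sim = estimated_jaccard(sig_a, sig_b)  # approximates |A ∩ B| / |A ∪ B|
```

Comparing short fixed-length signatures instead of the full label sets is what makes this similarity estimate cheap enough for large label spaces; accuracy improves as `num_hashes` grows.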
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Hu, H., Sun, Y., Wu, J. (2015). Fast Multi-label Learning via Hashing. In: Zhang, S., Wirsing, M., Zhang, Z. (eds) Knowledge Science, Engineering and Management. KSEM 2015. Lecture Notes in Computer Science, vol 9403. Springer, Cham. https://doi.org/10.1007/978-3-319-25159-2_48
DOI: https://doi.org/10.1007/978-3-319-25159-2_48
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-25158-5
Online ISBN: 978-3-319-25159-2
eBook Packages: Computer Science (R0)