Abstract
Multi-label learning (MLL) addresses classification problems in which each instance can be tagged with multiple labels simultaneously. Over the last several years, many MLL algorithms have been proposed and have achieved excellent performance in a variety of applications. However, these approaches are usually time-consuming and cannot handle large-scale data. In this paper, we propose HashMLL, a fast multi-label learning algorithm based on hashing schemes. HashMLL uses Locality Sensitive Hashing (LSH) to identify the neighboring instances of each unseen instance, and exploits label correlation by estimating the similarity of labels with min-wise independent permutations locality sensitive hashing (MinHash). Statistics gathered from the labels of these neighboring instances are then combined under the maximum a posteriori (MAP) principle to determine the label set of each unseen instance. Experiments show that HashMLL is highly competitive with state-of-the-art techniques while requiring far less time. In particular, on the NUS-WIDE dataset with 269,648 instances and the Flickr dataset with 565,444 instances, where no existing method returns results within 24 hours, HashMLL takes only 90 seconds and 23,266 seconds, respectively.
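The label-correlation step described above rests on a standard MinHash property: the probability that two sets share the same minimum value under a random permutation equals their Jaccard similarity. The following sketch illustrates that idea only; it is not the authors' implementation, and the representation of each label as the set of instance ids carrying it is an assumption made here for illustration.

```python
import random

def minhash_signature(item_set, num_hashes=128, seed=0):
    """MinHash signature: for each of num_hashes simulated random
    permutations (salted hashes), keep the minimum hash value
    attained over the set's elements."""
    rng = random.Random(seed)
    salts = [rng.getrandbits(32) for _ in range(num_hashes)]
    return [min(hash((salt, x)) for x in item_set) for salt in salts]

def estimated_jaccard(sig_a, sig_b):
    """The fraction of agreeing signature positions is an unbiased
    estimate of the Jaccard similarity of the underlying sets."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)

# Hypothetical example: two labels, each represented by the set of
# instance ids tagged with it.
label_a = {1, 2, 3, 5, 8}
label_b = {2, 3, 5, 8, 13}
sig_a = minhash_signature(label_a)
sig_b = minhash_signature(label_b)
sim = estimated_jaccard(sig_a, sig_b)  # approximates |A ∩ B| / |A ∪ B|
```

Comparing short fixed-length signatures instead of the full label sets is what makes this similarity estimate cheap enough for large label spaces; accuracy improves as `num_hashes` grows.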
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Hu, H., Sun, Y., Wu, J. (2015). Fast Multi-label Learning via Hashing. In: Zhang, S., Wirsing, M., Zhang, Z. (eds) Knowledge Science, Engineering and Management. KSEM 2015. Lecture Notes in Computer Science, vol 9403. Springer, Cham. https://doi.org/10.1007/978-3-319-25159-2_48
DOI: https://doi.org/10.1007/978-3-319-25159-2_48
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-25158-5
Online ISBN: 978-3-319-25159-2
eBook Packages: Computer Science (R0)