Fast Multi-label Learning via Hashing

  • Conference paper
Knowledge Science, Engineering and Management (KSEM 2015)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9403)

Abstract

Multi-label learning (MLL) addresses classification problems in which each instance can be tagged with multiple labels simultaneously. In recent years, many MLL algorithms have been proposed and have achieved excellent performance in a range of applications. However, these approaches are usually time-consuming and cannot handle large-scale data. In this paper, we propose HashMLL, a fast multi-label learning algorithm based on hashing schemes. HashMLL uses Locality Sensitive Hashing (LSH) to identify the neighboring instances of each unseen instance, and exploits label correlation by estimating the similarity of labels through minwise independent permutations locality sensitive hashing (MinHash). Then, based on statistical information gathered from all related labels of the neighboring instances, the maximum a posteriori (MAP) principle is used to determine the label set of each unseen instance. Experiments show that the performance of HashMLL is highly competitive with state-of-the-art techniques, while its time cost is much lower. In particular, on the NUS-WIDE dataset with 269,648 instances and the Flickr dataset with 565,444 instances, where none of the existing methods returns results within 24 hours, HashMLL takes only 90 seconds and 23,266 seconds, respectively.
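
To make the MinHash step concrete, the following is a minimal sketch of how label similarity could be estimated, and it is not the authors' implementation. It assumes that each label is represented by the non-empty set of training-instance ids tagged with it, and that similarity means the Jaccard index of those sets; the abstract does not specify either choice. The function names, the linear hash family, and the parameter `num_hashes` are illustrative.

```python
import random

PRIME = 2_147_483_647  # Mersenne prime 2^31 - 1; ids are assumed to be smaller


def minhash_signature(items, num_hashes=256, seed=0):
    """Return a MinHash signature for a non-empty set of integer ids.

    Each hash is h(x) = (a * x + b) mod PRIME with a != 0, a common
    practical stand-in for a random permutation (an assumption here).
    """
    rng = random.Random(seed)
    params = [(rng.randrange(1, PRIME), rng.randrange(0, PRIME))
              for _ in range(num_hashes)]
    # For every hash function, keep the smallest hash value over the set.
    return [min((a * x + b) % PRIME for x in items) for a, b in params]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of signature positions on which the two signatures agree."""
    matches = sum(1 for u, v in zip(sig_a, sig_b) if u == v)
    return matches / len(sig_a)


def exact_jaccard(a, b):
    """Exact Jaccard index, used only to check the estimate."""
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    # Hypothetical toy data: label name -> ids of instances carrying it.
    label_to_instances = {
        "beach": set(range(0, 100)),
        "sea": set(range(50, 150)),
        "city": set(range(1000, 1100)),
    }
    sigs = {name: minhash_signature(ids)
            for name, ids in label_to_instances.items()}
    for x, y in [("beach", "sea"), ("beach", "city")]:
        print(x, y,
              "estimated:", estimate_jaccard(sigs[x], sigs[y]),
              "exact:", exact_jaccard(label_to_instances[x],
                                      label_to_instances[y]))
```

Signatures must be built with the same `seed` to be comparable. Because a signature has fixed length, comparing two labels costs time proportional to `num_hashes` rather than to the number of instances, which is the property that makes this kind of estimate attractive on large datasets.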



Author information

Corresponding author

Correspondence to Jiansheng Wu.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Hu, H., Sun, Y., Wu, J. (2015). Fast Multi-label Learning via Hashing. In: Zhang, S., Wirsing, M., Zhang, Z. (eds) Knowledge Science, Engineering and Management. KSEM 2015. Lecture Notes in Computer Science, vol 9403. Springer, Cham. https://doi.org/10.1007/978-3-319-25159-2_48

  • DOI: https://doi.org/10.1007/978-3-319-25159-2_48

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-25158-5

  • Online ISBN: 978-3-319-25159-2

  • eBook Packages: Computer Science, Computer Science (R0)
