
WiseTag: An Ensemble Method for Multi-label Topic Classification

  • Conference paper

Natural Language Processing and Chinese Computing (NLPCC 2018)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11109)


Abstract

Multi-label topic classification aims to assign one or more relevant topic labels to a text. This paper presents the WiseTag system, which performs multi-label topic classification using an ensemble of four single models: a KNN-based model, an Information Gain-based model, a Keyword Matching-based model, and a Deep Learning-based model. These single models are carefully designed to be diverse enough to improve the performance of the ensemble. In NLPCC 2018 Shared Task 6, "Automatic Tagging of Zhihu Questions", the proposed WiseTag system achieves an F1 score of 0.4863 on the test set, ranking fourth among all participating teams.
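The abstract describes fusing per-label predictions from four diverse single models. The paper's actual fusion scheme is not included in this preview; as a purely illustrative sketch, one common way to ensemble such models is a weighted average of per-label scores followed by top-k selection. All function names, weights, and labels below are hypothetical, not taken from the paper:

```python
# Hypothetical sketch: weighted-average ensemble over per-label scores
# from several single models, returning the top-k labels.
def ensemble_predict(model_scores, weights, top_k=5):
    """model_scores: one dict per model, mapping label -> score.
    weights: one float per model.
    Returns the top_k labels ranked by weighted-average score."""
    combined = {}
    for scores, w in zip(model_scores, weights):
        for label, s in scores.items():
            combined[label] = combined.get(label, 0.0) + w * s
    total_w = sum(weights)
    ranked = sorted(combined, key=lambda lb: combined[lb] / total_w, reverse=True)
    return ranked[:top_k]

# Toy scores standing in for the four single models
# (KNN, Information Gain, Keyword Matching, Deep Learning):
knn = {"machine-learning": 0.9, "python": 0.4}
ig = {"machine-learning": 0.7, "statistics": 0.5}
kw = {"python": 0.8}
dl = {"machine-learning": 0.85, "deep-learning": 0.6}
print(ensemble_predict([knn, ig, kw, dl], [1.0, 1.0, 1.0, 1.0], top_k=2))
# -> ['machine-learning', 'python']
```

Equal weights are used here for simplicity; in practice the per-model weights would be tuned on a validation set.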




Author information


Correspondence to Guanqing Liang.


Copyright information

© 2018 Springer Nature Switzerland AG

About this paper


Cite this paper

Liang, G., Kao, H., Leung, C.W.-K., He, C. (2018). WiseTag: An Ensemble Method for Multi-label Topic Classification. In: Zhang, M., Ng, V., Zhao, D., Li, S., Zan, H. (eds.) Natural Language Processing and Chinese Computing. NLPCC 2018. Lecture Notes in Computer Science, vol. 11109. Springer, Cham. https://doi.org/10.1007/978-3-319-99501-4_47


  • DOI: https://doi.org/10.1007/978-3-319-99501-4_47

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-99500-7

  • Online ISBN: 978-3-319-99501-4

  • eBook Packages: Computer Science (R0)
