
A Novel Hybrid Framework to Enhance Zero-shot Classification

  • Conference paper
Text, Speech, and Dialogue (TSD 2022)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13502)


Abstract

As manually labelling data can be error-prone and labour-intensive, some recent studies classify documents without any training on labelled data, directly exploiting pre-trained language models (PLMs) for downstream tasks, a setting known as zero-shot text classification. In the same vein, we propose a novel framework that aims to improve zero-shot learning by enriching the domain-specific information PLMs require using transformer models. To unleash the power of PLMs pre-trained on massive cross-domain corpora, the framework unifies two transformers for different purposes: 1) expanding the category labels required by PLMs into coherent representative samples with GPT-2, a language model acclaimed for generating sensible text, and 2) augmenting documents with T5, which has the virtue of synthesizing high-quality new samples similar to the original text. The proposed framework can be easily integrated into different general testbeds. Extensive experiments on two popular topic classification datasets demonstrate its effectiveness.
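
The full text is paywalled, but the abstract pins down the moving parts: GPT-2 expands each category label into representative pseudo-samples, T5 augments documents (sketched under footnote 7 below), and a PLM scores label-document compatibility (footnote 2 indicates BERT's next-sentence prediction, NSP, plays the scoring role). A minimal sketch of such a pipeline with Hugging Face transformers follows; the prompt template and the score-averaging rule are illustrative assumptions, not the authors' exact method.

```python
# Minimal sketch of the hybrid zero-shot pipeline outlined in the abstract.
# Assumptions: the "This text is about ..." prompt and the mean-NSP scoring
# rule are ours; only the choice of models (GPT-2, BERT NSP) is from the text.
import torch
from transformers import (
    GPT2LMHeadModel, GPT2Tokenizer,
    BertForNextSentencePrediction, BertTokenizer,
)

gpt2_tok = GPT2Tokenizer.from_pretrained("gpt2")
gpt2 = GPT2LMHeadModel.from_pretrained("gpt2")
bert_tok = BertTokenizer.from_pretrained("bert-base-uncased")
bert_nsp = BertForNextSentencePrediction.from_pretrained("bert-base-uncased")

def expand_label(label: str, n: int = 3) -> list[str]:
    """Generate n representative pseudo-sentences for a category label."""
    prompt = f"This text is about {label}:"  # hypothetical template
    ids = gpt2_tok(prompt, return_tensors="pt").input_ids
    outs = gpt2.generate(ids, do_sample=True, top_p=0.9, max_length=40,
                         num_return_sequences=n,
                         pad_token_id=gpt2_tok.eos_token_id)
    return [gpt2_tok.decode(o, skip_special_tokens=True) for o in outs]

def nsp_score(premise: str, hypothesis: str) -> float:
    """P(IsNext): logit index 0 is IsNext, index 1 NotNext (cf. footnote 2)."""
    enc = bert_tok(premise, hypothesis, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = bert_nsp(**enc).logits
    return torch.softmax(logits, dim=-1)[0, 0].item()

def classify(document: str, labels: list[str]) -> str:
    """Assign the label whose expanded samples best 'follow' the document."""
    def score(label: str) -> float:
        samples = expand_label(label)
        return sum(nsp_score(document, s) for s in samples) / len(samples)
    return max(labels, key=score)
```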


Notes

  1. \(\mathcal{L}\) may not be pre-defined in practical scenarios; in the experiments we fix it for convenient evaluation.

  2. In this study, we use the publicly available uncased version of BERT: https://huggingface.co/bert-base-uncased. The output of BERT's NSP head is a softmax over two logits: the first gives the probability of IsNext and the second the probability of NotNext.

  3. https://tfhub.dev/google/universal-sentence-encoder-cmlm/en-base/1 (a loading sketch follows these notes).

  4. https://github.com/jasonwei20/eda_nlp (a sketch of two EDA operations follows these notes).

  5. https://huggingface.co/Helsinki-NLP/opus-mt-en-zh.

  6. https://huggingface.co/Helsinki-NLP/opus-mt-zh-en (a back-translation sketch combining notes 5 and 6 follows these notes).

  7. https://huggingface.co/Vamsi/T5_Paraphrase_Paws (a paraphrasing sketch follows these notes).

  8. https://github.com/google-research-datasets/paws.

  9. https://huggingface.co/docs/transformers/model_doc/ctrl (a generation sketch follows these notes).
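
Footnote 3's conditional-MLM sentence encoder is loaded from TF Hub alongside the standard BERT preprocessor. A minimal loading sketch; the preprocessor handle follows the TF Hub model card and should be double-checked there:

```python
# Sketch: sentence embeddings with the CMLM universal sentence encoder
# (footnote 3). The preprocessor URL is taken from the TF Hub model card.
import tensorflow as tf
import tensorflow_hub as hub
import tensorflow_text  # noqa: F401 -- registers ops the preprocessor needs

preprocessor = hub.KerasLayer(
    "https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3")
encoder = hub.KerasLayer(
    "https://tfhub.dev/google/universal-sentence-encoder-cmlm/en-base/1")

sentences = tf.constant(["The match ended in a late draw.", "sports"])
embeddings = encoder(preprocessor(sentences))["default"]

# Cosine similarity between a document and a label description.
unit = tf.nn.l2_normalize(embeddings, axis=1)
print(tf.reduce_sum(unit[0] * unit[1]).numpy())
```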
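
Footnote 4's EDA toolkit applies four edits: synonym replacement, random insertion, random swap, and random deletion. A self-contained sketch of the last two, written independently of the repository's own API:

```python
# Sketch of two EDA operations (footnote 4): random swap and random
# deletion. The real repo adds WordNet synonym replacement and insertion.
import random

def random_swap(words: list[str], n: int = 1) -> list[str]:
    """Swap two randomly chosen word positions, n times."""
    words = words.copy()
    for _ in range(n):
        i, j = random.sample(range(len(words)), 2)
        words[i], words[j] = words[j], words[i]
    return words

def random_deletion(words: list[str], p: float = 0.1) -> list[str]:
    """Drop each word with probability p, keeping at least one word."""
    kept = [w for w in words if random.random() > p]
    return kept or [random.choice(words)]

tokens = "zero shot classification without labelled training data".split()
print(" ".join(random_swap(tokens)))
print(" ".join(random_deletion(tokens)))
```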
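
Footnotes 5 and 6 pair into a back-translation augmenter (English to Chinese and back), a standard way to generate label-preserving variants. A minimal sketch with MarianMT:

```python
# Back-translation augmentation with the two MarianMT checkpoints from
# footnotes 5 and 6: English -> Chinese -> English.
from transformers import MarianMTModel, MarianTokenizer

def load(name: str):
    return MarianTokenizer.from_pretrained(name), MarianMTModel.from_pretrained(name)

en_zh_tok, en_zh = load("Helsinki-NLP/opus-mt-en-zh")
zh_en_tok, zh_en = load("Helsinki-NLP/opus-mt-zh-en")

def translate(texts, tok, model):
    batch = tok(texts, return_tensors="pt", padding=True, truncation=True)
    return tok.batch_decode(model.generate(**batch), skip_special_tokens=True)

def back_translate(texts):
    return translate(translate(texts, en_zh_tok, en_zh), zh_en_tok, zh_en)

print(back_translate(["The team won the championship after a tense final."]))
```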
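
Footnote 7 is a T5 fine-tuned on PAWS (footnote 8) for paraphrasing, the augmentation half of the framework. Its model card formats inputs as "paraphrase: ..."; that convention is assumed here:

```python
# Paraphrase augmentation with the PAWS-finetuned T5 from footnote 7.
# The "paraphrase: ... </s>" input format follows that model's card;
# verify it on the card before relying on it.
from transformers import AutoTokenizer, T5ForConditionalGeneration

tok = AutoTokenizer.from_pretrained("Vamsi/T5_Paraphrase_Paws")
model = T5ForConditionalGeneration.from_pretrained("Vamsi/T5_Paraphrase_Paws")

text = "paraphrase: The economy grew faster than analysts expected. </s>"
ids = tok(text, return_tensors="pt").input_ids
outs = model.generate(ids, do_sample=True, top_p=0.95, max_length=64,
                      num_return_sequences=3)
for o in outs:
    print(tok.decode(o, skip_special_tokens=True))
```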
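
Footnote 9's CTRL steers generation by prefixing a control code, an alternative to GPT-2 for producing label-conditioned samples. A sketch using "Books", one of CTRL's published control codes:

```python
# Sketch: control-code-conditioned generation with CTRL (footnote 9).
# "Books" is one of the control codes from the CTRL paper; the checkpoint
# is large (~1.6B parameters), and repetition_penalty=1.2 is the setting
# recommended for this model.
from transformers import CTRLLMHeadModel, CTRLTokenizer

tok = CTRLTokenizer.from_pretrained("Salesforce/ctrl")
model = CTRLLMHeadModel.from_pretrained("Salesforce/ctrl")

ids = tok("Books A quiet morning in the library", return_tensors="pt").input_ids
out = model.generate(ids, max_length=50, repetition_penalty=1.2)
print(tok.decode(out[0], skip_special_tokens=True))
```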


Acknowledgement

This work was supported by NSERC Discovery Grants.

Author information

Correspondence to Yang Liu.


Copyright information

© 2022 Springer Nature Switzerland AG

About this paper


Cite this paper

Chen, Y., Liu, Y. (2022). A Novel Hybrid Framework to Enhance Zero-shot Classification. In: Sojka, P., Horák, A., Kopeček, I., Pala, K. (eds) Text, Speech, and Dialogue. TSD 2022. Lecture Notes in Computer Science (LNAI), vol 13502. Springer, Cham. https://doi.org/10.1007/978-3-031-16270-1_17

  • DOI: https://doi.org/10.1007/978-3-031-16270-1_17

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-16269-5

  • Online ISBN: 978-3-031-16270-1

  • eBook Packages: Computer Science (R0)
