Subject Recognition in Chinese Sentences for Chatbots

  • Conference paper
Natural Language Processing and Chinese Computing (NLPCC 2019)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11839)

Abstract

Subject recognition plays a significant role in conversations with a chatbot. (In this paper, "subject" means "zhu ti" in Chinese, while "grammatical subject" denotes the traditional "zhu yu".) Misclassifying the subject of a sentence leads to errors in intent recognition. In this paper, we build a new dataset for subject recognition and propose several systems based on pre-trained language models. We first design annotation guidelines for human-chatbot conversational data and hire annotators to build a new dataset according to these guidelines. We then propose classification methods based on deep neural networks. Finally, we conduct extensive experiments to evaluate the performance of the different algorithms. The results show that our method achieves 88.5% \(F_1\) on the subject recognition task. We also compare our systems with three other chatbot systems and find that ours perform best.
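As a reminder of how an \(F_1\) score such as the one reported above is obtained, the following is a minimal sketch of a per-class \(F_1\) computation. It is not the authors' evaluation script: the label names (`user`, `bot`, `other`), the toy data, and the function name `f1_for_class` are hypothetical, and the abstract does not state whether the reported figure is a per-class, micro-averaged, or macro-averaged score.

```python
def f1_for_class(gold, pred, positive):
    """Return the F1 score of one class, treating `positive` as the target label."""
    pairs = list(zip(gold, pred))
    # True positives: predicted the target label and it was correct.
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    # False positives: predicted the target label but the gold label differs.
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    # False negatives: gold label is the target but something else was predicted.
    fn = sum(1 for g, p in pairs if g == positive and p != positive)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    # F1 is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Hypothetical subject labels for five sentences (illustration only).
gold = ["user", "bot", "user", "other", "user"]
pred = ["user", "user", "user", "other", "bot"]
score = f1_for_class(gold, pred, "user")
```

In this toy example the class `user` has two true positives, one false positive, and one false negative, so precision and recall are both 2/3 and the resulting \(F_1\) is also 2/3.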

Notes

  1. The term "limited" refers to specific reference and definite quantity, while "non-limited" refers to generic reference and indefinite quantity. The subjects discussed in this paper are limited, so non-limited people are usually not regarded as subjects, such as the "men" in "All men must die".

  2. https://github.com/kpu/kenlm.

  3. A subject-verb relation in LTP. https://github.com/HIT-SCIR/ltp.

  4. https://github.com/winnie0/ChineseSubjectRecognition.

  5. https://radimrehurek.com/gensim/models/word2vec.html.

Acknowledgements

The authors are supported by the National Natural Science Foundation of China (Grant No. 61603240). The corresponding author is Hao Shao. We also thank the anonymous reviewers for their insightful comments.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Li, F., Wei, H., Hao, Q., Zeng, R., Shao, H., Chen, W. (2019). Subject Recognition in Chinese Sentences for Chatbots. In: Tang, J., Kan, MY., Zhao, D., Li, S., Zan, H. (eds) Natural Language Processing and Chinese Computing. NLPCC 2019. Lecture Notes in Computer Science, vol 11839. Springer, Cham. https://doi.org/10.1007/978-3-030-32236-6_46

  • DOI: https://doi.org/10.1007/978-3-030-32236-6_46

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-32235-9

  • Online ISBN: 978-3-030-32236-6

  • eBook Packages: Computer Science (R0)