Abstract
Generative (abstractive) automatic summarization is a fundamental problem in natural language processing. We propose a cross-language generative summarization model that, unlike traditional pipelines that rely on machine translation, directly generates a summary in one language from source text in another. We pre-train word vectors with an RNNLM (Recurrent Neural Network based Language Model) to capture semantic information across languages, and combine a soft attention mechanism with a Seq2Seq model. We build a parallel corpus from Chinese, Korean, and English to train the model, so that cross-language summarization is achieved without the help of machine translation technology. Experiments show that the proposed model improves ROUGE-1, ROUGE-2, and ROUGE-L scores by 6%, 2.46%, and 5.13%, respectively, demonstrating its effectiveness.
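The two core components named above can be sketched compactly. The snippet below is a minimal illustration, not the paper's implementation: it shows (1) soft attention as a softmax over alignment scores between a decoder state and the encoder states, producing a context vector, and (2) ROUGE-N recall as n-gram overlap between a candidate and a reference summary. Dot-product scoring is used here for brevity (the paper follows Bahdanau-style attention), and all function names are illustrative.

```python
import numpy as np

def soft_attention(decoder_state, encoder_states):
    """Soft attention sketch: dot-product scores -> softmax weights -> context.

    decoder_state:  shape (d,)   current decoder hidden state
    encoder_states: shape (T, d) one hidden state per source position
    """
    scores = encoder_states @ decoder_state        # alignment scores, shape (T,)
    weights = np.exp(scores - scores.max())        # numerically stable softmax
    weights /= weights.sum()                       # weights over source positions sum to 1
    context = weights @ encoder_states             # weighted sum of encoder states, shape (d,)
    return weights, context

def rouge_n_recall(candidate, reference, n=1):
    """ROUGE-N recall: overlapping n-grams / total reference n-grams."""
    def ngrams(tokens, n):
        return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    ref_counts = {}
    for g in ngrams(reference, n):
        ref_counts[g] = ref_counts.get(g, 0) + 1
    overlap = 0
    for g in ngrams(candidate, n):
        if ref_counts.get(g, 0) > 0:               # clipped matching: each reference
            overlap += 1                           # n-gram is matched at most once
            ref_counts[g] -= 1
    total = max(len(reference) - n + 1, 0)
    return overlap / total if total else 0.0
```

For example, with reference tokens `["the", "cat", "ran"]` and candidate `["the", "cat", "sat"]`, two of three reference unigrams overlap, giving a ROUGE-1 recall of 2/3. ROUGE-L, also reported above, instead scores the longest common subsequence and is omitted here.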
Acknowledgements
This work was supported by the National Language Commission Scientific Research Project (YB135-76); Yanbian University Foreign Language and Literature First-Class Subject Construction Project (18YLPY13).
© 2020 Springer Nature Switzerland AG
Yang, F., Cui, R., Yi, Z., Zhao, Y. (2020). Cross-Language Generative Automatic Summarization Based on Attention Mechanism. In: Wang, G., Lin, X., Hendler, J., Song, W., Xu, Z., Liu, G. (eds) Web Information Systems and Applications. WISA 2020. Lecture Notes in Computer Science(), vol 12432. Springer, Cham. https://doi.org/10.1007/978-3-030-60029-7_22
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-60028-0
Online ISBN: 978-3-030-60029-7