Abstract
Extractive text summarization is one of the most important tasks in natural language processing. In this work, we use K-Means clustering to build clusters on a large-scale Vietnamese dataset, then use these clusters to extract the most relevant sentences from a single document to produce its summary. We first collected articles from Vietnamese online newspapers, cleaned them, and packaged them into a dataset; we then ran our summarization model on this dataset in our experiments. The best F-scores of this model on ROUGE-2 and ROUGE-L are 15.48% and 28.68%, respectively.
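To make the idea of cluster-based sentence extraction concrete, the following is a minimal sketch, not the model evaluated in this paper. It assumes a common simplified variant: the sentences of a single document are turned into term-frequency vectors, grouped with a small hand-written K-Means, and the sentence closest to each centroid is kept. The function names (`vectorize`, `kmeans`, `summarize`), the whitespace tokenizer, and the fixed random seed are all illustrative assumptions; real Vietnamese text would need proper word segmentation, and the clusters here are fitted on the document itself rather than on a large-scale corpus.

```python
import random
from collections import Counter


def vectorize(sentences):
    """Term-frequency vectors over a shared vocabulary (whitespace tokens: an assumption)."""
    tokenized = [s.lower().split() for s in sentences]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    index = {tok: i for i, tok in enumerate(vocab)}
    vectors = []
    for toks in tokenized:
        vec = [0.0] * len(vocab)
        for tok, count in Counter(toks).items():
            vec[index[tok]] = float(count)
        vectors.append(vec)
    return vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=20, seed=0):
    """Plain Lloyd-style K-Means; initial centroids are k randomly sampled points."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    assignment = [0] * len(vectors)
    for _ in range(iterations):
        # Assign every vector to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: sq_dist(v, centroids[c])) for v in vectors
        ]
        # Move each centroid to the mean of its members (empty clusters stay put).
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assignment


def summarize(sentences, k=2):
    """Keep, for each cluster, the sentence nearest its centroid, in document order."""
    k = min(k, len(sentences))
    vectors = vectorize(sentences)
    centroids, assignment = kmeans(vectors, k)
    chosen = set()
    for c in range(k):
        members = [i for i, a in enumerate(assignment) if a == c]
        if members:
            chosen.add(min(members, key=lambda i: sq_dist(vectors[i], centroids[c])))
    return [sentences[i] for i in sorted(chosen)]


if __name__ == "__main__":
    document = [
        "The city opened a new metro line on Monday.",
        "The metro line connects the airport to the city center.",
        "Heavy rain flooded several streets in the old quarter.",
        "Flooded streets in the old quarter delayed morning traffic.",
    ]
    for sentence in summarize(document, k=2):
        print(sentence)
```

The number of clusters `k` bounds the summary length, so it acts as the compression knob in this sketch. Selecting the member nearest the centroid is one reasonable relevance criterion among several; it is used here only because it follows directly from the clustering step.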
Notes
- 1. This dataset is widely used in text summarization research.
Copyright information
© 2022 Springer Nature Switzerland AG
About this paper
Cite this paper
Nguyen, TH., Do, TN. (2022). Extractive Text Summarization on Large-scale Dataset Using K-Means Clustering. In: Fujita, H., Fournier-Viger, P., Ali, M., Wang, Y. (eds) Advances and Trends in Artificial Intelligence. Theory and Practices in Artificial Intelligence. IEA/AIE 2022. Lecture Notes in Computer Science, vol 13343. Springer, Cham. https://doi.org/10.1007/978-3-031-08530-7_62
DOI: https://doi.org/10.1007/978-3-031-08530-7_62
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-08529-1
Online ISBN: 978-3-031-08530-7
eBook Packages: Computer Science, Computer Science (R0)