Extractive Text Summarization on Large-scale Dataset Using K-Means Clustering

Nguyen, Ti-Hon; Do, Thanh-Nghi

doi:10.1007/978-3-031-08530-7_62

Ti-Hon Nguyen & Thanh-Nghi Do

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13343)

Included in the following conference series:

International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems


Abstract

Extractive text summarization is one of the most important tasks in natural language processing. In this work, we apply K-Means clustering to a large-scale Vietnamese dataset and then use the resulting clusters to extract the most relevant sentences from a single document to produce its summary. We first collected articles from Vietnamese online newspapers, cleaned them, and packaged them into a dataset; we then ran our summarization model on this dataset for the experiments. The best F-scores of this model on ROUGE-2 and ROUGE-L are 15.48% and 28.68%, respectively.
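The general idea of cluster-based sentence extraction can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the features, the number of clusters, or how the clusters built on the full dataset are applied to a single document. The sketch below assumes bag-of-words count vectors, clusters the sentences of one document directly, and keeps the sentence closest to each centroid. All function names (`split_sentences`, `vectorize`, `kmeans`, `summarize`) are hypothetical, and the naive regex tokenization stands in for a proper Vietnamese word segmenter.

```python
import re
from collections import Counter


def split_sentences(text):
    """Naive splitter on ., !, ? followed by whitespace (an assumption)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def vectorize(sentences):
    """Bag-of-words count vectors over the document's own vocabulary."""
    tokens = [re.findall(r"\w+", s.lower()) for s in sentences]
    vocab = sorted({w for toks in tokens for w in toks})
    index = {w: i for i, w in enumerate(vocab)}
    vectors = []
    for toks in tokens:
        vec = [0.0] * len(vocab)
        for word, count in Counter(toks).items():
            vec[index[word]] = float(count)
        vectors.append(vec)
    return vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iters=20):
    """Plain Lloyd's K-Means. The first k vectors seed the centroids so the
    sketch is deterministic; real code would use random or k-means++ seeding."""
    centroids = [list(v) for v in vectors[:k]]
    assign = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: each vector goes to its nearest centroid.
        assign = [min(range(k), key=lambda c: sq_dist(v, centroids[c]))
                  for v in vectors]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, a in zip(vectors, assign) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assign


def summarize(text, k=2):
    """Return up to k sentences, one per cluster, in document order."""
    sentences = split_sentences(text)
    k = min(k, len(sentences))
    vectors = vectorize(sentences)
    centroids, assign = kmeans(vectors, k)
    chosen = set()
    for c in range(k):
        members = [i for i, a in enumerate(assign) if a == c]
        if members:
            # The member nearest the centroid represents the cluster.
            chosen.add(min(members, key=lambda i: sq_dist(vectors[i], centroids[c])))
    return [sentences[i] for i in sorted(chosen)]


if __name__ == "__main__":
    doc = ("The cat sat on the mat. The cat ate the fish. "
           "Stocks fell sharply today. Stocks rose again on Friday.")
    for sentence in summarize(doc, k=2):
        print(sentence)
```

Selecting one sentence per cluster favors coverage of distinct topics over repetition. A corpus-scale variant, closer to what the abstract describes, would fit the centroids once on the whole dataset and reuse them to score the sentences of each new document.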


Notes

1.
This dataset is widely used in text summarization research.


Author information

Authors and Affiliations

Can-Tho University, Can-Tho, Vietnam
Ti-Hon Nguyen & Thanh-Nghi Do


Corresponding author

Correspondence to Ti-Hon Nguyen.

Editor information

Editors and Affiliations

i-SOMET, Inc., Morioka-shi, Iwate, Japan
Hamido Fujita
College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, Guangdong, China
Philippe Fournier-Viger
Texas State University, San Marcos, TX, USA
Moonis Ali
Shanghai University of Finance and Economics, Shanghai, China
Yinglin Wang


About this paper

Cite this paper

Nguyen, TH., Do, TN. (2022). Extractive Text Summarization on Large-scale Dataset Using K-Means Clustering. In: Fujita, H., Fournier-Viger, P., Ali, M., Wang, Y. (eds) Advances and Trends in Artificial Intelligence. Theory and Practices in Artificial Intelligence. IEA/AIE 2022. Lecture Notes in Computer Science, vol 13343. Springer, Cham. https://doi.org/10.1007/978-3-031-08530-7_62


DOI: https://doi.org/10.1007/978-3-031-08530-7_62
Published: 30 August 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-08529-1
Online ISBN: 978-3-031-08530-7
eBook Packages: Computer Science, Computer Science (R0)
