Extractive Text Summarization on Large-scale Dataset Using K-Means Clustering

  • Conference paper
Advances and Trends in Artificial Intelligence. Theory and Practices in Artificial Intelligence (IEA/AIE 2022)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13343)

Abstract

Extractive text summarization is one of the most important tasks in natural language processing. In this work, we use K-Means clustering to build clusters on a large-scale Vietnamese dataset, then use these clusters to extract the most relevant sentences from a single document to produce its summary. We first collected articles from Vietnamese online newspapers, cleaned them, and packaged them into a dataset; we then applied our summarization model to this dataset in our experiments. The best F-scores of this model on ROUGE-2 and ROUGE-L are 15.48% and 28.68%, respectively.
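The abstract does not give the details of the pipeline, so the following is only a minimal sketch of one way cluster-based sentence extraction could work, not the authors' implementation. It assumes bag-of-words vectors, plain Lloyd's K-Means with farthest-first seeding, and cosine similarity to the nearest centroid as the relevance score; the helper names (`vectorize`, `kmeans`, `summarize`) and the toy English sentences are hypothetical.

```python
import math
from collections import Counter

def vectorize(text, vocab):
    """Bag-of-words count vector over a fixed vocabulary (assumed representation)."""
    counts = Counter(text.lower().split())
    return [float(counts[w]) for w in vocab]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=20):
    """Lloyd's algorithm with deterministic farthest-first seeding."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # next seed: the point farthest from its closest existing seed
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            groups[nearest].append(p)
        for c, members in enumerate(groups):
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids

def summarize(sentences, centroids, vocab, n=1):
    """Return the n sentences most similar to the document's nearest centroid."""
    doc_vec = vectorize(" ".join(sentences), vocab)
    target = max(centroids, key=lambda c: cosine(doc_vec, c))
    ranked = sorted(range(len(sentences)),
                    key=lambda i: cosine(vectorize(sentences[i], vocab), target),
                    reverse=True)
    return [sentences[i] for i in sorted(ranked[:n])]  # restore document order

# Toy stand-in for the large-scale corpus (hypothetical data).
corpus = [
    "the team won the football match",
    "the striker scored two goals in the match",
    "the bank raised interest rates",
    "interest rates affect bank loans",
]
vocab = sorted({w for s in corpus for w in s.lower().split()})
centroids = kmeans([vectorize(s, vocab) for s in corpus], k=2)

# Single document to summarize; words outside the vocabulary are ignored.
document = [
    "the bank raised interest rates again",
    "the team was not at the match",
    "bank loans depend on interest rates",
]
summary = summarize(document, centroids, vocab, n=2)
print(summary)
```

Fitting the centroids once on the whole corpus and reusing them for each document mirrors the two-stage description in the abstract. On a real large-scale dataset, a weighted representation such as TF-IDF and a mini-batch K-Means variant would likely be more practical than this pure-Python loop.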

Notes

  1. This dataset is widely used in text summarization research.

Author information

Corresponding author

Correspondence to Ti-Hon Nguyen.

Copyright information

© 2022 Springer Nature Switzerland AG

About this paper

Cite this paper

Nguyen, TH., Do, TN. (2022). Extractive Text Summarization on Large-scale Dataset Using K-Means Clustering. In: Fujita, H., Fournier-Viger, P., Ali, M., Wang, Y. (eds) Advances and Trends in Artificial Intelligence. Theory and Practices in Artificial Intelligence. IEA/AIE 2022. Lecture Notes in Computer Science, vol 13343. Springer, Cham. https://doi.org/10.1007/978-3-031-08530-7_62

  • DOI: https://doi.org/10.1007/978-3-031-08530-7_62

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-08529-1

  • Online ISBN: 978-3-031-08530-7

  • eBook Packages: Computer Science, Computer Science (R0)
