TSGVi: a graph-based summarization system for Vietnamese documents

  • Original Research
Journal of Ambient Intelligence and Humanized Computing

Abstract

This paper proposes an automatic method for generating an extractive summary of multiple Vietnamese documents on a common topic by modeling the documents as weighted undirected graphs. The method first builds undirected graphs whose vertices represent the sentences of the documents and whose edges indicate the similarity between sentences. It then applies the PageRank algorithm to compute a salience score for each sentence. Sentences are ranked by their salience scores and selected using maximal marginal relevance to form the summaries. These summaries are then combined, and the same process is applied once more to form the final extractive summary of the document set. A series of experiments is performed on Vietnamese news articles and on the English DUC 2002, 2003, and 2007 data. The results demonstrate that the proposed technique is more effective than the reference systems.
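
The pipeline described in the abstract can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the whitespace tokenizer, the term-frequency cosine similarity, the damping factor of 0.85, the MMR trade-off `lam`, and all function names are assumptions introduced here, and the abstract does not specify any of them. Vietnamese text would in practice need proper word segmentation before this step.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def similarity_matrix(sentences):
    """Weighted undirected graph: vertices are sentences, edge weights are similarities."""
    # Assumption: naive lowercase whitespace tokenization.
    bags = [Counter(s.lower().split()) for s in sentences]
    n = len(bags)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            sim[i][j] = sim[j][i] = cosine(bags[i], bags[j])
    return sim


def pagerank(sim, damping=0.85, iterations=50):
    """Weighted PageRank by power iteration; isolated vertices pass on no score."""
    n = len(sim)
    scores = [1.0 / n] * n
    degree = [sum(row) for row in sim]
    for _ in range(iterations):
        scores = [
            (1 - damping) / n
            + damping * sum(
                sim[j][i] / degree[j] * scores[j] for j in range(n) if degree[j] > 0
            )
            for i in range(n)
        ]
    return scores


def mmr_select(scores, sim, k, lam=0.7):
    """Greedy maximal marginal relevance: trade salience against redundancy."""
    top = max(scores)
    selected = []
    candidates = set(range(len(scores)))
    while candidates and len(selected) < k:
        def gain(i):
            redundancy = max((sim[i][j] for j in selected), default=0.0)
            return lam * scores[i] / top - (1 - lam) * redundancy
        best = max(sorted(candidates), key=gain)
        selected.append(best)
        candidates.remove(best)
    return selected


def summarize(sentences, k=2):
    """Rank sentences with PageRank, pick k with MMR, keep original order."""
    if not sentences:
        return []
    sim = similarity_matrix(sentences)
    scores = pagerank(sim)
    return [sentences[i] for i in sorted(mmr_select(scores, sim, k))]


def summarize_collection(documents, k=2):
    """Second pass: combine the summaries and apply the same process once more.

    Assumption: one intermediate summary is produced per document.
    """
    combined = [s for doc in documents for s in summarize(doc, k)]
    return summarize(combined, k)
```

Calling `summarize_collection(docs, k=3)` on a list of documents, each given as a list of sentence strings, returns at most three sentences. The MMR step is what keeps near-duplicate sentences from different articles out of the final summary.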

Notes

  1. http://www.vnexpress.net.

  2. http://www.tuoitre.com.vn.

  3. http://www.thanhnien.com.vn.

  4. http://www.dantri.com.vn

  5. http://www.sharpnlp.codeplex.com/

Acknowledgments

The authors would like to thank Prof. Kiem Hoang from the University of Information Technology, VNU, HCM City for his invaluable and insightful comments. The authors also thank the anonymous reviewers for their helpful comments.

Author information

Corresponding author

Correspondence to Tu-Anh Nguyen-Hoang.

About this article

Cite this article

Nguyen-Hoang, TA., Nguyen, K. & Tran, QV. TSGVi: a graph-based summarization system for Vietnamese documents. J Ambient Intell Human Comput 3, 305–313 (2012). https://doi.org/10.1007/s12652-012-0143-x

  • DOI: https://doi.org/10.1007/s12652-012-0143-x