DOI: 10.1145/2838931.2838935
Research article

A Study on the Use of Word Embeddings and PageRank for Vietnamese Text Summarization

Published: 8 December 2015

ABSTRACT

Automatic text summarization reduces the length of a document without losing its primary ideas. The flood of digital text has created strong demand for summarization systems. In this paper, we investigate several word-embedding-based approaches to sentence representation and combine them with the PageRank algorithm to select sentences for the summary. We compare these new methods with a range of other current summarization approaches. Although the same approaches can generally be applied across languages, we target Vietnamese because relatively little previous work exists for it and because it is a good example of a language that generally requires word segmentation. Our experiments find that a word-embedding and graph-based approach is an effective strategy for Vietnamese summarization and that word segmentation is not necessary for achieving good summarization results.
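
To make the pipeline described in the abstract concrete, here is a minimal sketch of embedding-based sentence ranking with PageRank. It is an illustration under stated assumptions, not the paper's implementation: the abstract does not say how sentence vectors are composed or how edges are weighted, so averaging word vectors, using cosine similarity as the edge weight, and the damping factor of 0.85 are all choices made here. The function names and the toy two-dimensional embedding table are hypothetical.

```python
import math

def sentence_vector(tokens, embeddings, dim):
    """Average the embeddings of known tokens (an assumed composition method)."""
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    if not vecs:
        return [0.0] * dim
    return [sum(col) / len(vecs) for col in zip(*vecs)]

def cosine(u, v):
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)

def pagerank(weights, damping=0.85, iterations=100, tol=1e-8):
    """Weighted PageRank by power iteration over an adjacency matrix."""
    n = len(weights)
    out_sums = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        new_scores = []
        for j in range(n):
            rank = 0.0
            for i in range(n):
                if out_sums[i] > 0:
                    rank += scores[i] * weights[i][j] / out_sums[i]
                else:
                    # Dangling node: spread its score uniformly.
                    rank += scores[i] / n
            new_scores.append((1 - damping) / n + damping * rank)
        delta = sum(abs(a - b) for a, b in zip(new_scores, scores))
        scores = new_scores
        if delta < tol:
            break
    return scores

def summarize(sentences, embeddings, dim, k=1):
    """Return indices of the k top-ranked sentences (in document order) and all scores."""
    vectors = [sentence_vector(s, embeddings, dim) for s in sentences]
    n = len(vectors)
    # Edge weight = cosine similarity, clipped at 0; no self-loops.
    weights = [
        [max(cosine(vectors[i], vectors[j]), 0.0) if i != j else 0.0
         for j in range(n)]
        for i in range(n)
    ]
    scores = pagerank(weights)
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:k]
    return sorted(top), scores

# Toy data (hypothetical): each "sentence" is a list of tokens. Tokens could be
# syllables rather than segmented words, since the abstract reports that word
# segmentation is not necessary for good results.
toy_embeddings = {"meo": [1.0, 0.0], "cho": [0.0, 1.0], "thu": [0.7, 0.7]}
toy_sentences = [["meo"], ["thu"], ["cho"]]
chosen, scores = summarize(toy_sentences, toy_embeddings, dim=2, k=1)
print(chosen, scores)
```

In the toy example the middle sentence is similar to both of the others, so it receives the highest PageRank score and is selected. A real system would load trained embeddings and might prefer a library PageRank implementation over the hand-written power iteration used here.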


Published in

ADCS '15: Proceedings of the 20th Australasian Document Computing Symposium
December 2015, 72 pages
ISBN: 9781450340403
DOI: 10.1145/2838931

Copyright © 2015 ACM


Publisher: Association for Computing Machinery, New York, NY, United States


Qualifiers: research-article, Research, Refereed limited

Acceptance rates: ADCS '15 paper acceptance rate 5 of 14 submissions (36%); overall acceptance rate 30 of 57 submissions (53%)
