Automatic Text Generation via Text Extraction Based on Submodular

  • Conference paper
Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10612)

Abstract

Automatic text generation is the generation of natural language texts by computer. It has many applications, including automatic report generation and online promotion. However, despite many existing studies, it remains a challenging task because generated text often lacks readability and coherence. In this paper, we propose a two-phase algorithm, consisting of text cleanup and text extraction, that automatically generates text from multiple texts. In the first phase, we generate paragraphs based on topic modeling and clustering analysis. In the second phase, we identify keywords by their TF-IDF scores, model text extraction as a set covering problem, and solve it using submodular optimization. Experiments comparing our method with comparable baselines demonstrate its effectiveness.
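The second phase described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `tfidf_keywords` and `greedy_cover`, the whitespace tokenizer, the choice to rank a term by its best TF-IDF score across documents, and the cardinality budget are all assumptions made for illustration. The sketch treats keyword coverage as the objective, which is a monotone submodular function, so a simple greedy selection is a natural fit.

```python
import math
from collections import Counter


def tfidf_keywords(docs, top_k):
    """Return the top_k terms, ranked by their best TF-IDF score in any document.

    Assumption: whitespace tokenization and max-over-documents ranking are
    illustrative choices, not taken from the paper.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    best = {}
    for tokens in tokenized:
        counts = Counter(tokens)
        for term, count in counts.items():
            score = (count / len(tokens)) * math.log(n_docs / df[term])
            best[term] = max(best.get(term, 0.0), score)
    # Sort by descending score, breaking ties alphabetically for determinism.
    ranked = sorted(best, key=lambda t: (-best[t], t))
    return set(ranked[:top_k])


def greedy_cover(sentences, keywords, budget):
    """Greedily pick at most `budget` sentences to cover as many keywords as possible.

    Returns the chosen sentence indices (in selection order) and the covered keywords.
    """
    sentence_terms = [set(s.lower().split()) & keywords for s in sentences]
    covered, chosen = set(), []
    while sentences and len(chosen) < budget:
        # Marginal gain: how many not-yet-covered keywords a sentence would add.
        gains = [
            0 if i in chosen else len(terms - covered)
            for i, terms in enumerate(sentence_terms)
        ]
        best_i = max(range(len(sentences)), key=lambda i: gains[i])
        if gains[best_i] == 0:
            break  # No remaining sentence adds a new keyword.
        chosen.append(best_i)
        covered |= sentence_terms[best_i]
    return chosen, covered
```

For a monotone submodular objective under a cardinality constraint, greedy selection of this kind is known to achieve at least a (1 - 1/e) fraction of the optimal value. A hypothetical call would be `greedy_cover(sentences, tfidf_keywords(docs, 20), budget=5)`, with the keyword count and budget chosen by the user.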

Acknowledgements

This work has been supported by the National Key Research and Development Program of China under grant 2016YFB1000905 and by NSFC under grant Nos. U1401256, 61402177, 61672234, 61402180, 61502236, 61462017, and 61363005.

Author information

Corresponding author

Correspondence to Ming Gao.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Ai, L., Li, N., Zheng, J., Gao, M. (2017). Automatic Text Generation via Text Extraction Based on Submodular. In: Song, S., Renz, M., Moon, YS. (eds) Web and Big Data. APWeb-WAIM 2017. Lecture Notes in Computer Science, vol 10612. Springer, Cham. https://doi.org/10.1007/978-3-319-69781-9_23

  • DOI: https://doi.org/10.1007/978-3-319-69781-9_23

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-69780-2

  • Online ISBN: 978-3-319-69781-9

  • eBook Packages: Computer Science, Computer Science (R0)
