Automatic Text Generation via Text Extraction Based on Submodular

Ai, Lisi; Li, Na; Zheng, Jianbing; Gao, Ming

doi:10.1007/978-3-319-69781-9_23

Conference paper
First Online: 08 November 2017

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10612)

Abstract

Automatic text generation is the generation of natural language text by computer. It has many applications, including automatic report generation and online promotion. However, despite a large body of existing work, the task remains challenging because generated text often lacks readability and coherence. In this paper, we propose a two-phase algorithm, consisting of text cleanup and text extraction, that automatically generates text from multiple source texts. In the first phase, we generate paragraphs using topic modeling and clustering analysis. In the second phase, we identify keywords by their TF-IDF scores, model text extraction as a set covering problem, and solve it using submodular optimization. Experiments comparing our method with several baselines demonstrate its effectiveness.
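The second phase can be illustrated with a minimal sketch. The abstract does not give the objective function, the budget, or the tokenization, so everything below is an assumption made for illustration: keywords are weighted by a smoothed TF-IDF score, the objective is the total weight of covered keywords (a monotone submodular function), the budget is a cap on the number of selected sentences, and the function names `tfidf_keywords` and `greedy_cover` are hypothetical. It is not the authors' implementation.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokenizer (an assumption; the paper's preprocessing is not specified)."""
    return re.findall(r"\w+", text.lower())


def tfidf_keywords(docs, top_k):
    """Sum each word's TF-IDF over all docs and return the top_k words as {word: score}."""
    n = len(docs)
    tokenized = [tokenize(doc) for doc in docs]
    # Document frequency: in how many docs each word appears.
    df = Counter(word for tokens in tokenized for word in set(tokens))
    scores = Counter()
    for tokens in tokenized:
        if not tokens:
            continue
        for word, count in Counter(tokens).items():
            idf = math.log((1 + n) / (1 + df[word])) + 1.0  # smoothed IDF (assumed variant)
            scores[word] += (count / len(tokens)) * idf
    return dict(scores.most_common(top_k))


def greedy_cover(sentences, weights, budget):
    """Greedily select up to `budget` sentences maximizing covered keyword weight.

    f(S) = sum of weights of keywords appearing in S is monotone submodular,
    so each step picks the sentence with the largest marginal gain.
    """
    keywords = set(weights)
    word_sets = [set(tokenize(s)) & keywords for s in sentences]
    covered, chosen = set(), []
    while len(chosen) < budget:
        best_gain, best_idx = 0.0, None
        for i, words in enumerate(word_sets):
            if i in chosen:
                continue
            gain = sum(weights[w] for w in words - covered)  # marginal gain of sentence i
            if gain > best_gain:
                best_gain, best_idx = gain, i
        if best_idx is None:  # no remaining sentence covers a new keyword
            break
        chosen.append(best_idx)
        covered |= word_sets[best_idx]
    # Return the selection in original order to preserve reading flow.
    return [sentences[i] for i in sorted(chosen)]
```

A caller would first compute `weights = tfidf_keywords(docs, top_k)` over the cleaned paragraphs and then pass the candidate sentences to `greedy_cover`. A length-based budget (total words instead of sentence count) is an equally plausible constraint and would need a cost-scaled gain in the selection step.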

Acknowledgements

This work was supported by the National Key Research and Development Program of China under Grant No. 2016YFB1000905 and by NSFC under Grant Nos. U1401256, 61402177, 61672234, 61402180, 61502236, 61462017, and 61363005.

Author information

Authors and Affiliations

School of Data Science and Engineering, East China Normal University, Shanghai, 200062, China
Lisi Ai, Na Li, Jianbing Zheng & Ming Gao

Corresponding author

Correspondence to Ming Gao.

Editor information

Editors and Affiliations

Tsinghua University, Beijing, China
Shaoxu Song
George Mason University, Fairfax, Virginia, USA
Matthias Renz
Kangwon National University, Chuncheon, Korea (Republic of)
Yang-Sae Moon

About this paper

Cite this paper

Ai, L., Li, N., Zheng, J., Gao, M. (2017). Automatic Text Generation via Text Extraction Based on Submodular. In: Song, S., Renz, M., Moon, YS. (eds) Web and Big Data. APWeb-WAIM 2017. Lecture Notes in Computer Science, vol 10612. Springer, Cham. https://doi.org/10.1007/978-3-319-69781-9_23

DOI: https://doi.org/10.1007/978-3-319-69781-9_23
Published: 08 November 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-69780-2
Online ISBN: 978-3-319-69781-9
eBook Packages: Computer Science, Computer Science (R0)
