Abstract
This paper presents SoCRFSum, a summarization model that integrates user-generated content (comments) and third-party sources (such as relevant articles) associated with a Web document to generate a high-quality summary. The summarization task was formulated as a sequence labeling problem that exploits external information to model sentences and comments. Conditional Random Fields (CRFs) were then adopted for sentence selection. SoCRFSum was validated on a dataset collected from Yahoo News. Promising results indicate that, by integrating user-generated and third-party information, our method improves ROUGE scores over state-of-the-art baselines.
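Casting sentence selection as sequence labeling means that each sentence t of a document receives a label y_t in {0, 1} (1 = include in the summary). A linear-chain CRF then scores a whole label sequence as the sum of per-sentence emission scores and label-to-label transition scores. The Python sketch below is a minimal illustration under stated assumptions, not SoCRFSum's implementation: the feature vectors (for example, a sentence's similarity to its comments or to third-party articles) and all weights are hypothetical placeholders for what a trained model would supply, and decoding uses the standard Viterbi algorithm.

```python
from typing import List, Sequence


def emission_scores(
    features: Sequence[Sequence[float]], weights: Sequence[Sequence[float]]
) -> List[List[float]]:
    """Score each label for each sentence as a dot product of weights and features.

    features[t] : hypothetical feature vector of sentence t
    weights[y]  : hypothetical weight vector for label y (0 = exclude, 1 = include)
    """
    return [
        [sum(w * f for w, f in zip(weights[y], x)) for y in range(len(weights))]
        for x in features
    ]


def viterbi_binary_crf(
    emission: Sequence[Sequence[float]], transition: Sequence[Sequence[float]]
) -> List[int]:
    """Return the highest-scoring label sequence of a linear-chain CRF.

    emission[t][y]   : score of label y at sentence t
    transition[a][b] : score of label a at sentence t-1 followed by label b at sentence t
    """
    n = len(emission)
    if n == 0:
        return []
    labels = range(len(emission[0]))
    # best[t][y]: best total score of any label prefix that ends with label y at position t
    best: List[List[float]] = [list(emission[0])]
    back: List[List[int]] = [[0] * len(emission[0])]
    for t in range(1, n):
        row, ptr = [], []
        for y in labels:
            # Best previous label given that the current label is y
            prev = max(labels, key=lambda a: best[t - 1][a] + transition[a][y])
            row.append(best[t - 1][prev] + transition[prev][y] + emission[t][y])
            ptr.append(prev)
        best.append(row)
        back.append(ptr)
    # Start from the best final label and follow the back-pointers
    y = max(labels, key=lambda a: best[-1][a])
    path = [y]
    for t in range(n - 1, 0, -1):
        y = back[t][y]
        path.append(y)
    return path[::-1]


if __name__ == "__main__":
    # Hypothetical features: [similarity to comments, similarity to third-party articles]
    feats = [[0.8, 0.6], [0.1, 0.2], [0.7, 0.9]]
    # Hypothetical weights: row 0 scores "exclude", row 1 scores "include"
    w = [[0.5, 0.5], [1.0, 1.0]]
    trans = [[0.0, -0.2], [-0.2, 0.0]]
    print(viterbi_binary_crf(emission_scores(feats, w), trans))
```

With all transition scores set to zero, decoding reduces to choosing the best label for each sentence independently; nonzero transitions let the model prefer, for instance, runs of consecutive selected sentences.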
Notes
- 5. We remove stopwords when modeling all features.
- 8. We do this because the baselines also select the top m sentences.
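Notes 5 and 8 mention two concrete steps: stopword removal before feature modeling, and extraction of the top m sentences. The sketch below illustrates both under explicit assumptions: the stopword list, the whitespace tokenizer, and the comment-overlap feature (a Jaccard score between a sentence and the union of its comments) are hypothetical stand-ins rather than the paper's actual features, and the per-sentence scores passed to `top_m` could be, for instance, CRF marginal probabilities of the include label.

```python
from typing import List, Sequence, Set

# Hypothetical, deliberately tiny stopword list; a real system would use a fuller one.
STOPWORDS: Set[str] = {"a", "an", "the", "of", "to", "in", "and", "is", "was", "on", "for"}


def content_tokens(text: str) -> Set[str]:
    """Lowercase, split on whitespace, strip edge punctuation, and drop stopwords."""
    tokens = (tok.strip(".,;:!?\"'()").lower() for tok in text.split())
    return {tok for tok in tokens if tok and tok not in STOPWORDS}


def comment_overlap(sentence: str, comments: Sequence[str]) -> float:
    """Hypothetical feature: Jaccard overlap between a sentence and all of its comments."""
    s = content_tokens(sentence)
    c: Set[str] = set()
    for comment in comments:
        c |= content_tokens(comment)
    union = s | c
    return len(s & c) / len(union) if union else 0.0


def top_m(scores: Sequence[float], m: int) -> List[int]:
    """Indices of the m highest-scoring sentences, returned in document order."""
    # sorted() is stable, so ties keep their original (document) order
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:m])


if __name__ == "__main__":
    sentences = ["The storm hit the coast.", "Officials met on Monday.", "Coast residents fled the storm."]
    comments = ["Stay safe, coast residents!", "This storm looks bad."]
    scores = [comment_overlap(s, comments) for s in sentences]
    print(top_m(scores, 2))
```

Returning the selected indices in document order keeps the extracted summary readable, since the sentences appear as they do in the original article.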
Acknowledgements
This work was supported by JSPS KAKENHI Grant Number JP15K16048, JSPS KAKENHI Grant Number JP15K12094, and JST CREST Grant Number JPMJCR1513, Japan.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Nguyen, MT., Tran, DV., Tran, CX., Nguyen, ML. (2017). Summarizing Web Documents Using Sequence Labeling with User-Generated Content and Third-Party Sources. In: Frasincar, F., Ittoo, A., Nguyen, L., Métais, E. (eds) Natural Language Processing and Information Systems. NLDB 2017. Lecture Notes in Computer Science, vol 10260. Springer, Cham. https://doi.org/10.1007/978-3-319-59569-6_54
DOI: https://doi.org/10.1007/978-3-319-59569-6_54
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-59568-9
Online ISBN: 978-3-319-59569-6
eBook Packages: Computer Science, Computer Science (R0)