Abstract
This paper presents ASMUS, an automatic sentiment-oriented summarization method for multi-documents using soft computing. It integrates two main phases: sentiment analysis and sentiment summarization. The sentiment analysis phase combines multiple strategies to tackle the following drawbacks: (1) the limited word coverage of an individual lexicon; (2) contextual polarity; and (3) sentence types. The sentiment summarization phase is a graph-based ranking model that integrates sentiment information with statistical and linguistic methods to improve the sentence ranking result. We found that current methods suffer from the following problems: (1) they do not consider semantic and syntactic information when comparing two sentences that share a similar bag-of-words, and so fail to capture meaning; and (2) the vocabulary mismatch problem (lexical gaps). Furthermore, ASMUS also considers content coverage and redundancy. We conduct experiments on the Document Understanding Conference datasets. The results show that ASMUS performs well in sentiment-oriented summarization.
Notes
We use the negation words collected in Kolchyna et al. (2015) as a basic set, such as “no”, “not”, “don’t”, “hardly”, “none”, “never”, “are not”, “was not”, “did not”, “seldom”, “nothing”, “isn’t”, …
“Not only”, “not wholly”, “not all”, “not just”, “not quite”, “not least”, “no question”, …
Like “but”, “with the exception of”, “except that”, “except for”, “however”, “yet”, “unfortunately”, …
We use the conclusion words collected in Abdi et al. (2016) as a basic set, such as “therefore”, “thus”, “consequently”, “hence”, “as a result”, “to conclude”, “in conclusion”, “in short”, …
References
Abdi A, Idris N (2014a) Automated summarization assessment system: quality assessment without a reference summary. In: The international conference on advances in applied science and environmental engineering—ASEE 2014. IRED Press
Abdi SA, Idris N (2014b) An analysis on student-written summaries: automatic assessment of summary writing. Int J Enhanc Res Sci Technol Eng 3:466–472
Abdi A, Idris N, Alguliev RM, Aliguliyev RM (2015a) Automatic summarization assessment through a combination of semantic and syntactic information for intelligent educational systems. Inf Process Manag 51:340–358
Abdi A, Idris N, Alguliyev RM, Aliguliyev RM (2015b) Query-based multi-documents summarization using linguistic knowledge and content word expansion. Soft Comput. https://doi.org/10.1007/s00500-015-1881-4
Abdi A, Idris N, Alguliyev RM, Aliguliyev RM (2016) An automated summarization assessment algorithm for identifying summarizing strategies. PLoS ONE 11:e0145809
Abdi A, Shamsuddin SM, Aliguliyev RM (2018a) QMOS: query-based multi-documents opinion-oriented summarization. Inf Process Manag 54:318–338
Abdi A, Shamsuddin SM, Hasan S, Piran J (2018b) Machine learning-based multi-documents sentiment-oriented summarization using linguistic treatment. Expert Syst Appl 109:66–85
Alfaro C, Cano-Montero J, Gómez J, Moguerza JM, Ortega F (2016) A multi-stage method for content classification and opinion mining on weblog comments. Ann Oper Res 236:197–213
Alguliyev RM, Aliguliyev RM, Isazade NR (2015) An unsupervised approach to generating generic summaries of documents. Appl Soft Comput 34:236–250
Baccianella S, Esuli A, Sebastiani F (2010) SentiWordNet 3.0: an enhanced lexical resource for sentiment analysis and opinion mining. In: LREC, pp 2200–2204
Bahrainian S-A, Dengel A (2013) Sentiment analysis and summarization of twitter data. In: IEEE 16th international conference on computational science and engineering (CSE). IEEE, pp 227–234
Balahur A, Kabadjov M, Steinberger J, Steinberger R, Montoyo A (2012) Challenges and solutions in the opinion summarization of user-generated content. J Intell Inf Syst 39:375–398
Cambria E, Poria S, Bajpai R, Schuller BW (2016) SenticNet 4: a semantic resource for sentiment analysis based on conceptual primitives. In: COLING, pp 2666–2677
Chen T, Xu R, He Y, Wang X (2017) Improving sentiment analysis via sentence type classification using BiLSTM-CRF and CNN. Expert Syst Appl 72:221–230
Cohen J (1968) Weighted kappa: nominal scale agreement provision for scaled disagreement or partial credit. Psychol Bull 70:213
Deshwal A, Sharma SK (2016) Twitter sentiment analysis using various classification algorithms. In: 5th International conference on reliability, infocom technologies and optimization (trends and future directions) (ICRITO). IEEE, pp 251–257
Di Capua M, Petrosino A (2016) A deep learning approach to deal with data uncertainty in sentiment analysis. In: International workshop on fuzzy logic and applications. Springer, pp 172–184
Edmundson HP (1969) New methods in automatic extracting. J ACM (JACM) 16:264–285
Ferreira R, de Souza Cabral L, Freitas F, Lins RD, de França Silva G, Simske SJ, Favaro L (2014) A multi-document summarization system based on statistics and linguistic treatment. Expert Syst Appl 41:5780–5787
Fleiss JL (1971) Measuring nominal scale agreement among many raters. Psychol Bull 76:378
Gambhir M, Gupta V (2017) Recent automatic text summarization techniques: a survey. Artif Intell Rev 47:1–66
Gupta V, Lehal GS (2010) A survey of text summarization extractive techniques. J Emerg Technol Web Intell 2:258–268
Gupta P, Tiwari R, Robert N (2016) Sentiment analysis and text summarization of online reviews: a survey. In: International conference on communication and signal processing (ICCSP). IEEE, pp 0241–0245
Hu M, Liu B (2004) Mining and summarizing customer reviews. In: Proceedings of the tenth ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 168–177
Hu Y-H, Chen Y-L, Chou H-L (2017) Opinion mining from online hotel reviews: a text summarization approach. Inf Process Manag 53:436–449
Hung C, Chen S-J (2016) Word sense disambiguation based sentiment lexicons for sentiment classification. Knowl Based Syst 110:224–232
Jaccard P (1912) The distribution of the flora in the alpine zone. New Phytol 11:37–50
Kabadjov M, Balahur A, Boldrini E (2009) Sentiment intensity: is it a good summary indicator? In: Language and technology conference. Springer, pp 203–212
Khan FH, Qamar U, Bashir S (2016) SWIMS: semi-supervised subjective feature weighting and intelligent model selection for sentiment analysis. Knowl Based Syst 100:97–111
Kim S, Calvo R (2011) Sentiment-oriented summarisation of peer reviews. In: Artificial intelligence in education. Springer, pp 491–493
Kiyoumarsi F (2015) Evaluation of automatic text summarizations based on human summaries. Procedia Soc Behav Sci 192:83–91
Kolchyna O, Souza TT, Treleaven P, Aste T (2015) Twitter sentiment analysis: lexicon method, machine learning method and their combination. arXiv preprint: arXiv:1507.00955
Landauer TK (2002) On the computational basis of learning and cognition: arguments from LSA. Psychol Learn Motiv 41:43–84
Landis JR, Koch GG (1977) The measurement of observer agreement for categorical data. Biometrics 33:159–174
Li Y, McLean D, Bandar ZA, O’Shea JD, Crockett K (2006) Sentence similarity based on semantic nets and corpus statistics. IEEE Trans Knowl Data Eng 18:1138–1150
Lin C-Y (2004) Rouge: a package for automatic evaluation of summaries. In: Text summarization branches out: proceedings of the ACL-04 workshop, pp 74–81
Lin C-Y, Hovy E (1997) Identifying topics by position. In: Proceedings of the fifth conference on applied natural language processing. Association for Computational Linguistics, pp 283–290
Liu B (2012) Sentiment analysis and opinion mining. Synth Lect Hum Lang Technol 5:1–167
Lloret E, Saggion H, Palomar M (2010) Experiments on summary-based opinion classification. In: Proceedings of the NAACL HLT 2010 workshop on computational approaches to analysis and generation of emotion in text. Association for Computational Linguistics, pp 107–115
Mendoza M, Bonilla S, Noguera C, Cobos C, León E (2014) Extractive single-document summarization based on genetic operators and guided local search. Expert Syst Appl 41:4158–4169
Mishra R, Bian J, Fiszman M, Weir CR, Jonnalagadda S, Mostafa J, Del Fiol G (2014) Text summarization in the biomedical domain: a systematic review of recent research. J Biomed Inform 52:457–467
Mohammad SM, Kiritchenko S, Zhu X (2013) NRC-Canada: building the state-of-the-art in sentiment analysis of tweets. arXiv preprint: arXiv:1308.6242
Narayanan R, Liu B, Choudhary A (2009) Sentiment analysis of conditional sentences. In: Proceedings of the 2009 conference on empirical methods in natural language processing, vol 1. Association for Computational Linguistics, pp 180–189
Nielsen FÅ (2011) A new ANEW: evaluation of a word list for sentiment analysis in microblogs. arXiv preprint: arXiv:1103.2903
O’Connor B, Balasubramanyan R, Routledge BR, Smith NA (2010) From tweets to polls: linking text sentiment to public opinion time series. ICWSM 11:1–2
Pérez D, Gliozzo AM, Strapparava C, Alfonseca E, Rodríguez P, Magnini B (2005) Automatic assessment of students’ free-text answers underpinned by the combination of a BLEU-inspired algorithm and latent semantic analysis. In: FLAIRS conference, pp 358–363
Rana TA, Cheah Y-N (2016) Aspect extraction in sentiment analysis: comparative analysis and survey. Artif Intell Rev 46:459–483
Riloff E, Wiebe J (2003) Learning extraction patterns for subjective expressions. In: Proceedings of the 2003 conference on empirical methods in natural language processing. Association for Computational Linguistics, pp 105–112
Saggion H (2014) Creating summarization systems with SUMMA. In: LREC, pp 4157–4163
Saggion H, Poibeau T (2013) Automatic text summarization: past, present and future. In: Multi-source, multilingual information extraction and summarization. Springer, pp 3–21
Sankarasubramaniam Y, Ramanathan K, Ghosh S (2014) Text summarization using Wikipedia. Inf Process Manag 50:443–461
Sarker A, Mollá D, Paris C (2013) An approach for query-focused text summarisation for evidence based medicine. In: Artificial intelligence in medicine. Springer, pp 295–304
Laerd Statistics (2015) Wilcoxon signed-rank test using SPSS statistics. In: Statistical tutorials and software guides
Stone PJ, Hunt EB (1963) A computer approach to content analysis: studies using the general inquirer system. In: Proceedings of the spring joint computer conference, 21–23 May 1963. ACM, pp 241–256
Strapparava C, Valitutti A (2004) WordNet affect: an affective extension of WordNet. In: LREC. Citeseer, pp 1083–1086
Taboada M, Brooke J, Tofiloski M, Voll K, Stede M (2011) Lexicon-based methods for sentiment analysis. Comput Linguist 37:267–307
Tayal MA, Raghuwanshi MM, Malik LG (2017) ATSSC: development of an approach based on soft computing for text summarization. Comput Speech Lang 41:214–235
Teufel S, Moens M (1997) Sentence extraction as a classification task. In: Proceedings of the ACL, vol 1997, pp 58–65
Vani K, Gupta D (2014) Using K-means cluster based techniques in external plagiarism detection. In: International conference on contemporary computing and informatics (IC3I). IEEE, pp 1268–1273
Xia R, Xu F, Yu J, Qi Y, Cambria E (2016) Polarity shift detection, elimination and ensemble: a three-stage model for document-level sentiment analysis. Inf Process Manag 52:36–45
Yadav N, Chatterjee N (2016) Text summarization using sentiment analysis for DUC data. In: International conference on information technology (ICIT). IEEE, pp 229–234
Yadav CS, Sharan A (2015) Hybrid approach for single text document summarization using statistical and sentiment features. Int J Inf Retr Res (IJIRR) 5:46–70
Zhang J, Sun L, Zhou Q (2005) A cue-based hub-authority approach for multi-document text summarization. In: Proceedings of 2005 IEEE international conference on natural language processing and knowledge engineering. IEEE NLP-KE’05. IEEE, pp 642–645
Acknowledgements
The authors would like to thank the Research Management Centre (RMC), Universiti Teknologi Malaysia (UTM), for its support in R&D, and the UTM Big Data Centre (BDC) for the inspiration in making this study a success. The authors would also like to thank the anonymous reviewers, who have contributed enormously to this work.
Funding
This work is supported by the Ministry of Higher Education (MOHE) under “Q.J130000.21A2.03E53 - STATISTICAL MACHINE LEARNING METHODS TO TEXT SUMMARIZATIONS” and “13H82 (INTELLIGENT PREDICTIVE ANALYTICS FOR RETAIL INDUSTRY).”
Ethics declarations
Conflict of interest
On behalf of all co-authors, I hereby declare that all the authors agreed to submit the article exclusively to this journal and that there is no conflict of interest regarding the publication of this article.
Human and animal rights
This article does not contain any studies with human participants or animals performed by any of the authors.
Additional information
Communicated by V. Loia.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Abdi, A., Shamsuddin, S.M., Hasan, S. et al. Automatic sentiment-oriented summarization of multi-documents using soft computing. Soft Comput 23, 10551–10568 (2019). https://doi.org/10.1007/s00500-018-3653-4
DOI: https://doi.org/10.1007/s00500-018-3653-4