Abstract
Content curation is an important step in identifying the content relevant to a searched topic. Many methods have been introduced to generate summarized content, but they focused only on producing brief, precise summaries that lost the key essence of the input text. We therefore propose a hybrid model that integrates self-attention into a bi-directional long short-term memory auto-encoder (Bi-LSTM-AE) to generate information-rich abstracts. First, the dataset is pre-processed and the major word-level and sentence-level features are extracted. Then, based on the similarities between the contents, an extractive summary is generated and passed to the auto-encoder for final abstraction. The effectiveness of the model is demonstrated through simulations on the CNN/Daily Mail dataset, evaluated with ROUGE metrics. The proposed model outperformed the compared models, with scores of 0.59 for ROUGE-1, 0.39 for ROUGE-2 and 0.71 for ROUGE-L, while generalizing well.
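The abstract does not specify the similarity measure used for the extractive step or the exact ROUGE configuration, so the following is only a minimal Python sketch under stated assumptions. Sentences are represented as bag-of-words vectors, and each sentence is scored by its summed cosine similarity to the other sentences (a simple centrality heuristic swapped in for illustration, not the authors' feature set). ROUGE-1, ROUGE-2 and ROUGE-L are computed as F1 scores with a plain lowercase tokenizer and no stemming, so values will not match the official ROUGE toolkit exactly. All function names are hypothetical, and the self-attention Bi-LSTM auto-encoder stage is not shown.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and keep alphanumeric runs (assumed tokenizer, no stemming)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def extractive_summary(sentences, k):
    """Hypothetical extractive step: keep the k sentences most similar to the
    rest of the document, returned in their original order."""
    bags = [Counter(tokenize(s)) for s in sentences]
    scores = [
        sum(cosine(bags[i], bags[j]) for j in range(len(bags)) if j != i)
        for i in range(len(bags))
    ]
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


def ngrams(tokens, n):
    """Multiset of n-grams as a Counter of tuples."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def f1(overlap, cand_total, ref_total):
    """Harmonic mean of precision and recall (returns 0.0 when nothing overlaps)."""
    if overlap == 0:
        return 0.0
    p, r = overlap / cand_total, overlap / ref_total
    return 2 * p * r / (p + r)


def rouge_n(candidate, reference, n):
    """ROUGE-N F1: clipped n-gram overlap (Counter & takes the minimum count)."""
    c, r = ngrams(tokenize(candidate), n), ngrams(tokenize(reference), n)
    overlap = sum((c & r).values())
    return f1(overlap, sum(c.values()), sum(r.values()))


def lcs_length(x, y):
    """Length of the longest common subsequence via row-by-row dynamic programming."""
    prev = [0] * (len(y) + 1)
    for xi in x:
        cur = [0]
        for j, yj in enumerate(y, 1):
            cur.append(prev[j - 1] + 1 if xi == yj else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def rouge_l(candidate, reference):
    """ROUGE-L F1 based on the LCS of the two token sequences."""
    c, r = tokenize(candidate), tokenize(reference)
    return f1(lcs_length(c, r), len(c), len(r))


if __name__ == "__main__":
    doc = [
        "The cat sat on the mat.",
        "The cat ate fish on the mat.",
        "Stocks fell sharply today.",
    ]
    summary = " ".join(extractive_summary(doc, 2))
    reference = "The cat sat on the mat and ate fish."
    print(summary)
    for name, score in [
        ("ROUGE-1", rouge_n(summary, reference, 1)),
        ("ROUGE-2", rouge_n(summary, reference, 2)),
        ("ROUGE-L", rouge_l(summary, reference)),
    ]:
        print(name, round(score, 3))
```

In this toy document the off-topic third sentence has zero similarity to the others and is dropped. The F1 form weights precision and recall equally; ROUGE implementations also report recall-only or weighted F-measures, and the abstract does not say which variant produced its scores.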
Data availability
Data sharing not applicable to this article as no datasets were generated or analysed during the current study.
About this article
Cite this article
Khatter, H., Ahlawat, A.K. Content curation algorithm on blog posts using hybrid computing. Multimed Tools Appl 81, 7589–7609 (2022). https://doi.org/10.1007/s11042-022-12105-w
DOI: https://doi.org/10.1007/s11042-022-12105-w