Content curation algorithm on blog posts using hybrid computing

Khatter, Harsh; Ahlawat, Anil Kumar

doi:10.1007/s11042-022-12105-w

Published: 28 January 2022

Multimedia Tools and Applications, Volume 81, pages 7589–7609 (2022)

Harsh Khatter¹ & Anil Kumar Ahlawat¹

Abstract

Content curation is a significant step in identifying content relevant to the searched topics. Many methods have been introduced to generate summarized content, but they focus only on generating precise content and lose the key essence of the input texts. Therefore, we propose a hybrid model that integrates self-attention into a bi-directional long short-term memory auto-encoder (Bi-LSTM-AE) to generate information-rich abstracts. First, the dataset is pre-processed and the major word-level and sentence-level features are extracted. Then, based on the similarities between the contents, an extractive summary is generated and passed to the auto-encoder for final abstraction. The efficiency of the model is demonstrated through simulations on the CNN/Daily Mail dataset in terms of ROUGE metrics. The proposed model outperformed the other compared models, with scores of 0.59 for ROUGE-1, 0.39 for ROUGE-2 and 0.71 for ROUGE-L, with high generalization.
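
The ROUGE scores reported above compare a generated summary against a reference summary. The following is a minimal sketch of how ROUGE-N and ROUGE-L can be computed. It assumes the recall-oriented variant, whitespace tokenization and lower-casing; the abstract does not state which ROUGE variant or tokenizer the authors used, and the function names are illustrative rather than taken from the article.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a multiset (Counter) of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, reference, n=1):
    """Clipped n-gram overlap divided by the number of reference n-grams.

    Assumption: recall-oriented ROUGE-N with whitespace tokenization.
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    # Counter intersection keeps the minimum count of each shared n-gram.
    overlap = sum((cand & ref).values())
    return overlap / total


def lcs_length(a, b):
    """Length of the longest common subsequence, by dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_recall(candidate, reference):
    """LCS length divided by the reference length (recall form of ROUGE-L)."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not ref:
        return 0.0
    return lcs_length(cand, ref) / len(ref)


if __name__ == "__main__":
    reference = "the cat sat on the mat"
    candidate = "the cat lay on the mat"
    print(rouge_n_recall(candidate, reference, 1))
    print(rouge_n_recall(candidate, reference, 2))
    print(rouge_l_recall(candidate, reference))
```

ROUGE-1 and ROUGE-2 count shared unigrams and bigrams, while ROUGE-L rewards the longest in-order word sequence shared by the two summaries, which is why it can exceed ROUGE-2 when word order is largely preserved.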

Data availability

Data sharing is not applicable to this article, as no datasets were generated or analysed during the current study.

Author information

Authors and Affiliations

KIET Group of Institutions, Delhi-NCR, Ghaziabad, Affiliated to Dr. APJ Abdul Kalam Technical University, Lucknow, India
Harsh Khatter & Anil Kumar Ahlawat

Corresponding author

Correspondence to Harsh Khatter.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Khatter, H., Ahlawat, A.K. Content curation algorithm on blog posts using hybrid computing. Multimed Tools Appl 81, 7589–7609 (2022). https://doi.org/10.1007/s11042-022-12105-w

Received: 12 April 2021
Revised: 31 July 2021
Accepted: 03 January 2022
Published: 28 January 2022
Issue Date: March 2022
DOI: https://doi.org/10.1007/s11042-022-12105-w
