Content curation algorithm on blog posts using hybrid computing

Multimedia Tools and Applications

Abstract

Content curation is a significant step in identifying content relevant to a searched topic. Many methods have been introduced to generate summaries, but they focus only on producing precise content and lose the key essence of the input text. We therefore propose a hybrid model that integrates self-attention into a bi-directional long short-term memory auto-encoder (Bi-LSTM-AE) to generate information-rich abstracts. The dataset is first pre-processed, and the major word-level and sentence-level features are then extracted. Based on the similarities between the contents, an extractive summary is generated and passed to the auto-encoder for final abstraction. The efficiency of the model is demonstrated through simulations on the CNN/Daily Mail dataset in terms of ROUGE metrics. The proposed model outperformed the compared models, scoring 0.59 for ROUGE-1, 0.39 for ROUGE-2 and 0.71 for ROUGE-L, with high generalization.
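For readers unfamiliar with the ROUGE scores reported above, the following is a minimal sketch of how ROUGE-N and ROUGE-L can be computed for a single candidate summary against a single reference. It is an illustration only, not the evaluation code used in the article: the function names (`rouge_n`, `rouge_l`), the whitespace tokenization, the lowercasing and the choice of the F1 variant are all assumptions, and the abstract does not state which ROUGE variant (recall, precision or F1) was reported.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a multiset (Counter) of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def f1(overlap, cand_total, ref_total):
    """Harmonic mean of precision and recall; 0.0 when nothing overlaps."""
    if overlap == 0 or cand_total == 0 or ref_total == 0:
        return 0.0
    precision = overlap / cand_total
    recall = overlap / ref_total
    return 2 * precision * recall / (precision + recall)


def rouge_n(candidate, reference, n):
    """ROUGE-N F1 (assumed variant) using clipped n-gram counts."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    overlap = sum((cand & ref).values())  # Counter intersection clips counts
    return f1(overlap, sum(cand.values()), sum(ref.values()))


def lcs_length(a, b):
    """Length of the longest common subsequence, by dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference):
    """ROUGE-L F1 (assumed variant), treating each text as one sequence."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    return f1(lcs_length(cand, ref), len(cand), len(ref))


if __name__ == "__main__":
    candidate = "the cat sat on the mat"
    reference = "the cat lay on the mat"
    print(rouge_n(candidate, reference, 1))
    print(rouge_n(candidate, reference, 2))
    print(rouge_l(candidate, reference))
```

In this toy pair, five of six unigrams and three of five bigrams match, and the longest common subsequence has five tokens. Standard ROUGE toolkits additionally apply stemming and a summary-level variant of the LCS for multi-sentence summaries, so scores from this sketch are not directly comparable with published figures.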



Data availability

Data sharing not applicable to this article as no datasets were generated or analysed during the current study.

Author information

Corresponding author

Correspondence to Harsh Khatter.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Khatter, H., Ahlawat, A.K. Content curation algorithm on blog posts using hybrid computing. Multimed Tools Appl 81, 7589–7609 (2022). https://doi.org/10.1007/s11042-022-12105-w

  • DOI: https://doi.org/10.1007/s11042-022-12105-w
