A novel time-shifting method to find popular blog post topics

  • Methodologies and Application
Soft Computing

Abstract

Blog search engines and general search engines both crawl Web pages automatically and return search results to users. One difference between the two is that blog search engines index blog posts and filter out general pages, so bloggers see only post results rather than other types of pages. Another difference is that posts carry more time-related information than general pages. For a general page, a general search engine can show only the page's last update time; for a post, a blog search engine can show all available time clues. For frequently updated posts, these time clues help bloggers find information more efficiently. In this paper, we first use several well-known semantic analysis models to analyze the performance of Google Blog Search. Next, we apply a hybrid strategy that considers both the document-link and time-clue relationships between posts to further improve retrieval performance. Finally, we present several experiments that simulate various possible scenarios to confirm the effectiveness of our strategy.
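The abstract describes combining content relevance with the link and time-clue relationships between posts but does not give the scoring formula. Purely as an illustration of how such signals can be blended, the minimal sketch below scores each post as a weighted sum of three assumed signals: cosine similarity between query and post term vectors, a log-normalized in-link count, and an exponential recency decay. The `Post` fields, the weights `alpha`, `beta`, `gamma`, and `half_life_days` are all hypothetical choices made for this example; this is not the authors' time-shifting method.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class Post:
    """A blog post reduced to the signals used by this illustrative ranker (hypothetical)."""
    post_id: str
    terms: dict[str, float]  # term -> weight, e.g. TF-IDF (assumed representation)
    inlinks: int             # number of other posts linking to this post
    age_days: float          # days elapsed since the post's time clue


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse term-weight vectors (0.0 if either is empty)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def hybrid_score(query: dict[str, float], post: Post, max_inlinks: int,
                 alpha: float = 0.6, beta: float = 0.2, gamma: float = 0.2,
                 half_life_days: float = 30.0) -> float:
    """Linear blend of content, link, and recency signals; all weights are assumptions."""
    content = cosine(query, post.terms)
    # Log-scale in-links so a few heavily linked posts do not dominate; result lies in [0, 1].
    link = math.log1p(post.inlinks) / math.log1p(max_inlinks) if max_inlinks > 0 else 0.0
    # Exponential decay: a post half_life_days old gets half the recency credit of a new one.
    recency = 0.5 ** (post.age_days / half_life_days)
    return alpha * content + beta * link + gamma * recency


def rank(query: dict[str, float], posts: list[Post], **weights: float) -> list[Post]:
    """Return posts sorted by descending hybrid score."""
    max_inlinks = max((p.inlinks for p in posts), default=0)
    return sorted(posts, key=lambda p: hybrid_score(query, p, max_inlinks, **weights),
                  reverse=True)
```

With the default weights and a query such as `{"lda": 1.0, "blog": 1.0}`, a matching post from yesterday outranks an equally relevant post from a year ago, and both outrank an off-topic post even if it has more in-links. A linear blend is used here only because it is easy to inspect; any real system would tune or learn these weights.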

Notes

  1. Google has since disabled the Google Blog Search homepage at google.com/blogsearch, which now redirects to the Google homepage. To restrict results to blog posts, one can visit Google News, click the search tool, open the "All News" drop-down menu, and select "Blog".

Funding

This study was supported by the Ministry of Science and Technology of Taiwan (Grant Nos. 108-2410-H-259-048-MY3 and 107-2410-H-259-016).

Author information

Corresponding author

Correspondence to Lin-Chih Chen.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Communicated by V. Loia.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

The experimental system of this study is available at http://hlcs.sytes.net/llta.

Electronic supplementary material

Supplementary material 1 (DOCX 5048 kb)

About this article

Cite this article

Chen, LC., Chen, DR. & Lai, MF. A novel time-shifting method to find popular blog post topics. Soft Comput 24, 9705–9725 (2020). https://doi.org/10.1007/s00500-019-04485-3

  • DOI: https://doi.org/10.1007/s00500-019-04485-3

Keywords