Abstract
Blog search engines and general search engines automatically crawl Web pages from the Internet and return search results to users. One difference between the two is that blog search engines focus on blog posts and filter out general pages, which lets bloggers see only post results rather than other types of pages. Another difference is that posts carry more time-related information than general pages. For a general page, a general search engine can show only the page's last update time, whereas for a post, a blog search engine can show all available time clues. For frequently updated posts, these time clues help bloggers find information more efficiently. In this paper, we first use several well-known semantic analysis models to analyze the performance of Google Blog Search. Next, we apply a hybrid strategy that considers the document-link and time-clue relationships between posts to further improve its retrieval performance. Finally, we present several experiments that simulate various possible scenarios to confirm the effectiveness of our strategy.

Notes
Google has since disabled the Google Blog Search homepage (google.com/blogsearch) and now redirects it to the Google homepage. To restrict results to blog posts, one can visit Google News, open the search tools, select the "All News" drop-down menu, and choose "Blog".
Funding
This study was supported by the Ministry of Science and Technology of Taiwan (Grant Nos. 108-2410-H-259-048-MY3, 107-2410-H-259-016).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Additional information
Communicated by V. Loia.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
The experimental system of this study is available at http://hlcs.sytes.net/llta.
Cite this article
Chen, LC., Chen, DR. & Lai, MF. A novel time-shifting method to find popular blog post topics. Soft Comput 24, 9705–9725 (2020). https://doi.org/10.1007/s00500-019-04485-3
DOI: https://doi.org/10.1007/s00500-019-04485-3