Abstract
The optimization of search results has long been a research hot spot in the field of search engines. Previous work has used various document ranking methods to optimize search results, and topic partitioning by clustering has proved to be an effective approach. However, each cluster still contains many unorganized documents, which directly limits retrieval speed. To address this issue, this paper integrates the two methods, document ranking and clustering, to re-rank the documents within clusters. We find that category features, which discriminate strongly between categories, have a positive effect on document ordering. We therefore apply category features to the search results on the basis of the clusters. Experiments show that our Top N results are more in line with users' needs and that retrieval speed is implicitly improved, which indicates that our approach significantly outperforms the original clustering method.
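The idea of re-ranking the documents inside one cluster by category features can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the scoring formula, so the length-normalized weighted term count used here, the function names `category_score` and `rerank_cluster`, and the example weights are all assumptions made for illustration.

```python
from collections import Counter


def category_score(tokens, feature_weights):
    """Length-normalized sum of category-feature weights found in a document.

    `feature_weights` maps a category feature term to a weight. How the
    weights are obtained is outside this sketch (assumed to be given).
    """
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    total = sum(weight * counts[term] for term, weight in feature_weights.items())
    return total / len(tokens)


def rerank_cluster(cluster_docs, feature_weights, top_n=None):
    """Return document ids of one cluster, ordered by descending category score.

    `cluster_docs` is a list of (doc_id, tokens) pairs. `top_n=None` keeps all
    documents. The sort is stable, so ties keep their original order.
    """
    ranked = sorted(
        cluster_docs,
        key=lambda doc: category_score(doc[1], feature_weights),
        reverse=True,
    )
    return [doc_id for doc_id, _ in ranked[:top_n]]


# Hypothetical cluster and hypothetical category features for illustration.
cluster = [
    ("d1", ["stock", "market", "news"]),
    ("d2", ["football", "match", "stock"]),
    ("d3", ["stock", "market", "stock", "price"]),
]
weights = {"stock": 1.0, "market": 0.5}

print(rerank_cluster(cluster, weights))
print(rerank_cluster(cluster, weights, top_n=1))
```

In this sketch the documents densest in category feature terms move to the front of the cluster, so a user reading only the Top N results of a cluster sees them first.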
Acknowledgments
This work is supported by the National Natural Science Foundation of China (No. 61073130) and the Natural Scientific Research Innovation Foundation in Harbin Institute of Technology (No. HIT.NSRIF.2009072).
Cite this article
Qin, Y., Zheng, D. & Zhao, T. Research on search results optimization technology with category features integration. Int. J. Mach. Learn. & Cyber. 3, 71–76 (2012). https://doi.org/10.1007/s13042-011-0037-9
DOI: https://doi.org/10.1007/s13042-011-0037-9