Research on search results optimization technology with category features integration

  • Original Article
International Journal of Machine Learning and Cybernetics

Abstract

Optimizing search results has long been a research hotspot in the search engine field. Previous work has used various document ranking methods to optimize search results, and partitioning results by topic through clustering has proved effective. However, each cluster still contains many unordered documents, which directly limits retrieval speed. To address this issue, this paper first integrates the two methods, re-ranking the documents within each cluster. We find that category features, which discriminate strongly between categories, are effective for ordering documents. We therefore apply category features to the search results on top of the clustering. Experiments show that our top-N results better match users’ needs and that retrieval speed is implicitly improved, demonstrating that our approach significantly outperforms the original clustering method.
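The abstract does not give the scoring details, so the following is only a minimal sketch of the general idea under stated assumptions. Category features are selected per category with the chi-square (χ²) statistic, a standard feature-selection measure substituted here as an assumption, and the documents of one cluster are then re-ordered by the length-normalized sum of the feature weights they contain. The function names `chi_square_features` and `rerank_cluster`, the `top_k` parameter, and the toy data are all hypothetical.

```python
from collections import Counter


def chi_square_features(docs, labels, category, top_k=10):
    """Select terms positively associated with `category` by the chi-square statistic.

    docs: list of token lists; labels: category label per document.
    Returns {term: score} for the top_k terms (hypothetical helper, assumed scoring).
    """
    n = len(docs)
    doc_sets = [set(d) for d in docs]
    vocab = set().union(*doc_sets)
    in_cat = [lab == category for lab in labels]
    n_cat = sum(in_cat)
    scores = {}
    for term in vocab:
        # 2x2 contingency counts over documents
        a = sum(1 for s, c in zip(doc_sets, in_cat) if c and term in s)      # in category, term present
        b = sum(1 for s, c in zip(doc_sets, in_cat) if not c and term in s)  # other categories, term present
        c_absent = n_cat - a          # in category, term absent
        d = (n - n_cat) - b           # other categories, term absent
        if a * d <= b * c_absent:
            scores[term] = 0.0        # not positively associated with the category
            continue
        denom = (a + c_absent) * (b + d) * (a + b) * (c_absent + d)
        scores[term] = n * (a * d - b * c_absent) ** 2 / denom
    # Highest score first; ties broken alphabetically for determinism
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return dict(ranked[:top_k])


def rerank_cluster(cluster_docs, features):
    """Return indices of `cluster_docs` ordered by category-feature score, highest first.

    Score = sum of (feature weight * term frequency), divided by document length.
    Ties keep the cluster's original order because sorted() is stable.
    """
    def score(doc):
        if not doc:
            return 0.0
        tf = Counter(doc)
        return sum(w * tf[t] for t, w in features.items()) / len(doc)

    return sorted(range(len(cluster_docs)), key=lambda i: -score(cluster_docs[i]))


# Toy usage (hypothetical data)
train = [
    ["football", "match", "goal"],
    ["goal", "team", "league"],
    ["stock", "market", "price"],
    ["market", "bank", "rate"],
]
labels = ["sports", "sports", "finance", "finance"]
feats = chi_square_features(train, labels, "sports", top_k=2)
print(feats)  # {'goal': 4.0, 'football': 1.3333333333333333}

cluster = [["market", "news"], ["goal", "goal", "team"], ["football", "news"]]
print(rerank_cluster(cluster, feats))  # [1, 2, 0]
```

Returning indices rather than reordered documents keeps the link to the original result list. Whether the paper normalizes by document length, or uses chi-square at all, is not stated in the abstract, so both choices are assumptions of this sketch.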

Acknowledgments

This work is supported by the National Natural Science Foundation of China (No. 61073130) and the Natural Scientific Research Innovation Foundation of Harbin Institute of Technology (No. HIT.NSRIF.2009072).

Author information

Corresponding author

Correspondence to Yanxia Qin.

About this article

Cite this article

Qin, Y., Zheng, D. & Zhao, T. Research on search results optimization technology with category features integration. Int. J. Mach. Learn. & Cyber. 3, 71–76 (2012). https://doi.org/10.1007/s13042-011-0037-9

  • DOI: https://doi.org/10.1007/s13042-011-0037-9
