A novel approach for ranking web documents based on query-optimized personalized pagerank

Roul, Rajendra Kumar; Sahoo, Jajati Keshari

doi:10.1007/s41060-020-00232-2

Regular Paper
Published: 18 August 2020

Volume 11, pages 37–55 (2021)

International Journal of Data Science and Analytics

Abstract

Ranking plays an important role when searching for web documents in a huge corpus: it not only reduces search time but also surfaces useful documents for users. In this paper, we extend our earlier query-optimized PageRank approach by combining TF-IDF with the personalized PageRank algorithm to produce a robust ranking mechanism. Our earlier approach modeled a ranking scheme that considers the link structure of the documents along with their content. We also propose a novel feature selection technique, term-term correlation-based feature selection (TCFS), which removes all noise terms from the documents before the ranking process starts. We believe that incorporating TCFS and the personalized PageRank of the documents, along with their relevance, will improve the retrieval results. The aim is to modify the link structure based on the similarity score between the content of a document and the user query. Experimental results show that the proposed feature selection technique can outperform conventional feature selection techniques, and that the combined TF-IDF and personalized PageRank approach performs promisingly compared with traditional approaches.
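The abstract gives no formulas, so the following Python sketch is only one plausible reading of the idea, not the authors' method. It assumes that the TF-IDF cosine similarity between each document and the query is used both as the personalization (teleport) vector and as a weight on out-links, so that links pointing to query-relevant documents carry more rank. The function names, the smoothed IDF variant, the damping factor of 0.85, and the toy corpus are all illustrative assumptions; TCFS is not modeled here.

```python
import math
from collections import Counter

def tfidf_vectors(docs, query):
    """Build sparse TF-IDF vectors (dicts) for tokenized docs and the query."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed IDF (an assumption; the paper may use another variant).
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1.0 for t in df}

    def vec(tokens):
        tf = Counter(tokens)
        return {t: tf[t] * idf[t] for t in tf if t in idf}

    return [vec(d) for d in docs], vec(query)

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def query_biased_pagerank(links, sim, damping=0.85, iters=100, tol=1e-10):
    """Personalized PageRank whose teleport vector and link weights
    both follow the query similarity scores in `sim` (assumed design)."""
    n = len(sim)
    total = sum(sim)
    teleport = [s / total for s in sim] if total > 0 else [1.0 / n] * n
    rank = teleport[:]
    for _ in range(iters):
        new = [(1 - damping) * t for t in teleport]
        for i in range(n):
            out = links.get(i, [])
            weights = [sim[j] for j in out]
            wsum = sum(weights)
            if wsum > 0:
                # Split rank among out-links in proportion to target relevance.
                for j, w in zip(out, weights):
                    new[j] += damping * rank[i] * w / wsum
            else:
                # Dangling node, or no relevant targets: follow the teleport vector.
                for j in range(n):
                    new[j] += damping * rank[i] * teleport[j]
        delta = sum(abs(a - b) for a, b in zip(new, rank))
        rank = new
        if delta < tol:
            break
    return rank

# Toy corpus and link graph (hypothetical data for illustration only).
docs = [
    "pagerank ranking web documents".split(),
    "cooking recipes pasta".split(),
    "personalized pagerank query ranking".split(),
    "football match results".split(),
]
links = {0: [1, 2], 1: [0, 3], 2: [0], 3: [1]}
query = "pagerank ranking".split()

doc_vecs, query_vec = tfidf_vectors(docs, query)
sim = [cosine(v, query_vec) for v in doc_vecs]
scores = query_biased_pagerank(links, sim)
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
print(ranking, [round(s, 4) for s in scores])
```

In this toy setup the documents that share terms with the query (indices 0 and 2) receive all of the teleport mass and all of the similarity-weighted link mass, so they rank above the unrelated documents. A real system would blend this with other signals and tune the damping factor experimentally.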

Notes

http://www.worldwidewebsize.com/
the query either having one top term or two top terms
http://snowball.tartarus.org/algorithms/porter/stemmer.html
the threshold is decided by the experiment
http://www.dmoz.org
http://www.dataminingresearch.com/index.php/2010/09/classic3-classic4-datasets/
http://www.daviddlewis.com/resources/testcollections/reuters21578/
http://qwone.com/~jason/20Newsgroups/
http://www.cs.cmu.edu/afs/cs/project/theo-20/www/data/
http://wiki.dbpedia.org/Datasets
decided experimentally
decided experimentally

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Thapar Institute of Engineering and Technology, Patiala, Punjab, 147004, India
Rajendra Kumar Roul
Department of Mathematics, BITS Pilani, K.K. Birla Goa Campus, Goa, 403726, India
Jajati Keshari Sahoo

Corresponding author

Correspondence to Rajendra Kumar Roul.

Ethics declarations

Conflict of interest

The corresponding author states that there is no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Roul, R.K., Sahoo, J.K. A novel approach for ranking web documents based on query-optimized personalized pagerank. Int J Data Sci Anal 11, 37–55 (2021). https://doi.org/10.1007/s41060-020-00232-2

Received: 20 March 2018
Accepted: 01 August 2020
Published: 18 August 2020
Issue Date: January 2021
DOI: https://doi.org/10.1007/s41060-020-00232-2
