Exploiting Linguistic Analysis on URLs for Recommending Web Pages: A Comparative Study

Transactions on Computational Collective Intelligence XXVI

Abstract

Nowadays, citizens require high-quality information from public institutions as a guarantee of their transparency. Institutional websites of governmental and public bodies must publish and keep up to date a large amount of information, stored in thousands of web pages, to satisfy the demands of their users. Because of this volume, the “search form” typically available on such websites offers only limited support, since it requires users to express their information needs explicitly through keywords. These sites are also affected by the so-called “long tail” phenomenon, which is typically observed in e-commerce portals: not all pages are considered highly important and, as a consequence, users looking for information located in pages that are not considered important have a hard time finding them.

The development of recommender systems that can guess the next best page a user would like to see in a website has gained a lot of attention. Complex models and approaches have been proposed for recommending web pages to individual users. These approaches typically require personal preferences and other kinds of user information in order to make successful predictions.

In this paper, we analyze and compare three different approaches that leverage information embedded in the structure of websites and in the logs of their web servers to improve the effectiveness of web page recommendation. Our proposals exploit the context of the users’ navigation, i.e., their current sessions when surfing a specific website. These approaches require neither stored and processed information about the personal preferences of the users nor complex structures that must be created and maintained. They can be easily incorporated into current large websites to facilitate the users’ navigation experience. Finally, the paper reports comparative experiments on a real-world website that analyze the performance of the proposed approaches.

Notes

  1. http://ec.europa.eu/ipg/services/statistics/performance_en.htm, statistics computed on June 1st, 2015.

  2. http://www.wired.com/2004/10/tail/.

  3. https://radimrehurek.com/gensim/.

  4. http://www.comune.modena.it.

  5. As described in Sect. 3.2, a session includes the pages which are visited by the same user, i.e., the same IP address and User-Agent, in 30 min.
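
The session definition in note 5 can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `build_sessions`, the tuple layout of a log entry, and the example requests are all hypothetical. It also assumes that the 30 min limit is measured from the first page of a session; reading it as a maximum gap between consecutive requests would be equally plausible and would only change the comparison inside the loop.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Assumption: the 30-minute limit is measured from the first request of a session.
SESSION_WINDOW = timedelta(minutes=30)


def build_sessions(requests, window=SESSION_WINDOW):
    """Group (ip, user_agent, timestamp, url) tuples into sessions.

    A "user" is approximated by the (ip, user_agent) pair, as in note 5.
    Returns a list of sessions, each a list of URLs in visit order.
    """
    by_user = defaultdict(list)
    for ip, user_agent, timestamp, url in requests:
        by_user[(ip, user_agent)].append((timestamp, url))

    sessions = []
    for visits in by_user.values():
        visits.sort(key=lambda visit: visit[0])
        current, session_start = [], None
        for timestamp, url in visits:
            # Start a new session once the window since its first page has elapsed.
            if session_start is None or timestamp - session_start > window:
                if current:
                    sessions.append(current)
                current, session_start = [], timestamp
            current.append(url)
        if current:
            sessions.append(current)
    return sessions


# Hypothetical log entries, invented for illustration only.
example_requests = [
    ("10.0.0.1", "Firefox", datetime(2015, 6, 1, 9, 0), "/servizi"),
    ("10.0.0.1", "Firefox", datetime(2015, 6, 1, 9, 10), "/servizi/tributi"),
    ("10.0.0.1", "Firefox", datetime(2015, 6, 1, 9, 45), "/contatti"),
    ("10.0.0.1", "Chrome", datetime(2015, 6, 1, 9, 5), "/notizie"),
]
example_sessions = build_sessions(example_requests)
```

In this example the two Firefox requests made within the first 30 minutes form one session, the third Firefox request opens a new one, and the Chrome request is treated as a different user because its User-Agent differs.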

Acknowledgement

The authors would like to acknowledge networking support by the ICT COST Action IC1302 KEYSTONE - Semantic keyword-based search on structured data sources (www.keystone-cost.eu). We also acknowledge the support of the projects TIN2016-78011-C4-3-R (AEI/FEDER, UE), TIN2013-46238-C4-4-R, and DGA-FSE, and we thank the Rete Civica Mo-Net of the Comune di Modena for providing the data used in this research.

Author information

Corresponding author

Correspondence to Francesco Guerra.

Copyright information

© 2017 Springer International Publishing AG

About this chapter

Cite this chapter

Cadegnani, S. et al. (2017). Exploiting Linguistic Analysis on URLs for Recommending Web Pages: A Comparative Study. In: Nguyen, N., Kowalczyk, R., Pinto, A., Cardoso, J. (eds) Transactions on Computational Collective Intelligence XXVI. Lecture Notes in Computer Science, vol 10190. Springer, Cham. https://doi.org/10.1007/978-3-319-59268-8_2

  • DOI: https://doi.org/10.1007/978-3-319-59268-8_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-59267-1

  • Online ISBN: 978-3-319-59268-8

  • eBook Packages: Computer Science (R0)
