Abstract
Nowadays, citizens require high-quality information from public institutions in order to guarantee their transparency. Institutional websites of governmental and public bodies must publish and keep up to date a large amount of information, spread over thousands of web pages, in order to satisfy the demands of their users. Given this volume of information, the “search form” that is typically available on such websites offers only limited support, since it requires users to express their information needs explicitly through keywords. These sites are also affected by the so-called “long tail” phenomenon, which is typically observed in e-commerce portals: not all pages are considered highly important and, as a consequence, users searching for information located in pages that are not considered important have a hard time finding them.
The development of a recommender system that can guess the next best page that a user would like to see on a website has gained a lot of attention. Complex models and approaches have been proposed for recommending web pages to individual users. These approaches typically require personal preferences and other kinds of user information in order to make successful predictions.
In this paper, we analyze and compare three different approaches that leverage the information embedded in the structure of websites and in the logs of their web servers to improve the effectiveness of web page recommendation. Our proposals exploit the context of the users’ navigation, i.e., their current sessions when surfing a specific website. These approaches require neither information about the personal preferences of the users to be stored and processed, nor complex structures to be created and maintained. They can be easily incorporated into current large websites to facilitate the users’ navigation experience. Last but not least, the paper reports comparative experiments on a real-world website to analyze the performance of the proposed approaches.
Notes
- 1. http://ec.europa.eu/ipg/services/statistics/performance_en.htm, statistics computed on June 1st, 2015.
- 2.
- 3.
- 4.
- 5. As described in Sect. 3.2, a session includes the pages visited by the same user, i.e., requested from the same IP address and User-Agent, within 30 min.
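The session definition in the last note can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `sessionize`, the tuple layout of the log records, and the reading of "in 30 min" as a window measured from the first request of a session (rather than an inactivity timeout) are all assumptions made here for illustration.

```python
from datetime import datetime, timedelta

# Assumption: "in 30 min" is read as a window starting at the session's first request.
SESSION_WINDOW = timedelta(minutes=30)


def sessionize(requests, window=SESSION_WINDOW):
    """Group log records into sessions.

    `requests` is an iterable of (ip, user_agent, timestamp, url) tuples
    (a hypothetical layout). A user is identified by (ip, user_agent).
    A new session starts when a request arrives more than `window`
    after the first request of that user's current session.
    Returns a list of ((ip, user_agent), [urls]) in order of session start.
    """
    sessions = []
    current = {}  # (ip, user_agent) -> (session start time, list of urls)
    for ip, user_agent, ts, url in sorted(requests, key=lambda r: r[2]):
        key = (ip, user_agent)
        state = current.get(key)
        if state is None or ts - state[0] > window:
            # Open a fresh session for this user.
            state = (ts, [])
            current[key] = state
            sessions.append((key, state[1]))
        state[1].append(url)
    return sessions
```

If an inactivity timeout is the intended reading instead, the comparison would be made against the timestamp of the user's previous request rather than against the session start.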
Acknowledgement
The authors would like to acknowledge networking support by the ICT COST Action IC1302 KEYSTONE - Semantic keyword-based search on structured data sources (www.keystone-cost.eu). We also acknowledge the support of the projects TIN2016-78011-C4-3-R (AEI/FEDER, UE), TIN2013-46238-C4-4-R, and DGA-FSE, and we thank the Rete Civica Mo-Net of the Comune di Modena for providing the data exploited in this research.
Copyright information
© 2017 Springer International Publishing AG
About this chapter
Cite this chapter
Cadegnani, S. et al. (2017). Exploiting Linguistic Analysis on URLs for Recommending Web Pages: A Comparative Study. In: Nguyen, N., Kowalczyk, R., Pinto, A., Cardoso, J. (eds) Transactions on Computational Collective Intelligence XXVI. Lecture Notes in Computer Science, vol 10190. Springer, Cham. https://doi.org/10.1007/978-3-319-59268-8_2
DOI: https://doi.org/10.1007/978-3-319-59268-8_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-59267-1
Online ISBN: 978-3-319-59268-8
eBook Packages: Computer Science, Computer Science (R0)