Abstract
The World Wide Web is growing continuously and rapidly, and it is quickly facilitating the migration of everyday tasks to web-based services. This trend suggests that a time will come when everyone is compelled to use the web for daily activities. Naive users are the main concern in such a shift, so the web must be ready to serve them. We argue that this requires well-optimized websites on which users can quickly locate the information they are looking for. This becomes increasingly important given today's widespread reliance on the many services available on the Internet. Search engines can certainly help users find what they are looking for; however, they will never replace, but rather complement, the optimization of a website's internal structure based on previously recorded user behavior. In this chapter, we present a novel approach for identifying problematic structures in websites. The method consists of two phases. The first phase compares user behavior, derived via web log mining techniques, with a combined analysis of the website's link structure. This analysis applies three methods, which together yield a more robust framework and hence stronger and more consistent outcomes: (1) constructing and analyzing a social network of the pages constituting the website, considering both structure and usage information; (2) applying the Weighted PageRank algorithm; and (3) applying the Hyperlink-Induced Topic Search (HITS) method. In the second phase, we use the term frequency-inverse document frequency (TFIDF) measure to examine the correlation between a page that contains a link and the pages it links to, thereby corroborating the findings of the first phase. We then show how to use these intermediate results to point out problematic website structures to the website owner.
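The first phase combines the site's link structure with usage recorded in web logs. The chapter's exact network construction is not given here, so the following is only a minimal sketch of one plausible way to do it, under stated assumptions: the hypothetical `links` dict maps each page to the pages it links to, and `sessions` are already-sessionized lists of page IDs (log parsing and sessionization are omitted). Each hyperlink becomes an edge weighted by how often users actually followed it; links never traversed, and traversals with no backing link, are the kind of mismatch the comparison is meant to surface.

```python
from collections import Counter


def usage_weighted_network(links, sessions):
    """Hypothetical sketch: weight each hyperlink by observed user transitions.

    links:    dict page -> list of pages it links to (assumed site graph)
    sessions: list of page-ID sequences, one per user session (assumed input)
    Returns (edge_weights, in_strength, unused_links, unlinked_transitions).
    """
    # Count consecutive page pairs in every session as observed transitions.
    transitions = Counter()
    for session in sessions:
        for src, dst in zip(session, session[1:]):
            transitions[(src, dst)] += 1

    # Structural edges carry their observed traffic (0 if never followed).
    edges = {(s, d): transitions[(s, d)] for s, targets in links.items() for d in targets}

    # In-strength: total observed traffic reaching each page via site links.
    in_strength = Counter()
    for (_, dst), w in edges.items():
        in_strength[dst] += w

    # Links present in the structure but never used by visitors.
    unused = sorted(e for e, w in edges.items() if w == 0)
    # Transitions with no direct link (back button, bookmarks, search, typed URLs).
    unlinked = sorted(e for e in transitions if e not in edges)
    return edges, in_strength, unused, unlinked


if __name__ == "__main__":
    site = {"home": ["products", "about"], "products": ["checkout"], "about": []}
    logs = [["home", "products", "checkout"], ["home", "products", "home"], ["home", "checkout"]]
    edges, strength, unused, unlinked = usage_weighted_network(site, logs)
    print(unused)    # links nobody followed
    print(unlinked)  # paths users took without a direct link
```

Note that plain logs cannot distinguish a back-button step from a followed link, so the "unlinked" list is a set of candidates for review rather than a definitive diagnosis.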
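Weighted PageRank, as formulated by Xing and Ghorbani, distributes a page's rank unevenly across its out-links in proportion to the popularity of each target: PR(u) = (1 - d) + d · Σ_{v ∈ B(u)} PR(v) · W_in(v,u) · W_out(v,u), where B(u) is the set of pages linking to u, W_in(v,u) = I_u / Σ_{p ∈ R(v)} I_p, W_out(v,u) = O_u / Σ_{p ∈ R(v)} O_p, R(v) is the set of pages v links to, and I and O are in-link and out-link counts. The sketch below is an illustrative implementation under that formulation; the damping factor d = 0.85, the iteration limits, and the toy site are assumptions, and the chapter's own parameter choices may differ.

```python
from collections import defaultdict


def weighted_pagerank(links, d=0.85, iters=100, tol=1e-10):
    """Weighted PageRank in the Xing-Ghorbani style (illustrative sketch).

    links: dict page -> list of pages it links to (hypothetical site graph).
    Returns dict page -> score using the un-normalized (1 - d) + d * sum(...) form.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    refs = {p: set(links.get(p, ())) - {p} for p in pages}  # R(p), self-links ignored
    out_deg = {p: len(refs[p]) for p in pages}               # O_p
    in_deg = defaultdict(int)                                  # I_p
    backlinks = defaultdict(set)                               # B(u)
    for v in pages:
        for u in refs[v]:
            in_deg[u] += 1
            backlinks[u].add(v)

    # Precompute W_in(v,u) * W_out(v,u), normalized over the pages v links to.
    weight = {}
    for v in pages:
        sum_in = sum(in_deg[p] for p in refs[v])
        sum_out = sum(out_deg[p] for p in refs[v])
        for u in refs[v]:
            w_in = in_deg[u] / sum_in if sum_in else 0.0
            w_out = out_deg[u] / sum_out if sum_out else 0.0
            weight[(v, u)] = w_in * w_out

    # Iterate the rank equation until scores stop changing (or the limit is hit).
    pr = {p: 1.0 for p in pages}
    for _ in range(iters):
        new = {u: (1 - d) + d * sum(pr[v] * weight[(v, u)] for v in backlinks[u]) for u in pages}
        delta = max(abs(new[p] - pr[p]) for p in pages)
        pr = new
        if delta < tol:
            break
    return pr


if __name__ == "__main__":
    site = {"home": ["about", "products"], "about": ["home"], "products": ["home", "about"]}
    for page, score in sorted(weighted_pagerank(site).items(), key=lambda kv: -kv[1]):
        print(f"{page:10s} {score:.3f}")
```

Comparing this structural ranking with the usage-derived importance of each page is one way to spot pages that the link structure promotes but visitors rarely reach, or vice versa.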
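HITS assigns every page two mutually reinforcing scores: an authority score (the sum of the hub scores of pages linking to it) and a hub score (the sum of the authority scores of pages it links to), each normalized after every update. Kleinberg originally applied it to a query-focused subgraph; the sketch below assumes it is simply run over the whole site graph, which is an illustrative choice rather than the chapter's documented setup. The `links` input and the iteration limits are hypothetical.

```python
import math


def hits(links, iters=100, tol=1e-10):
    """Iterative HITS over a site graph (illustrative sketch).

    links: dict page -> list of pages it links to (hypothetical site graph).
    Returns (hub, authority) dicts, each L2-normalized.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}

    def normalize(scores):
        norm = math.sqrt(sum(x * x for x in scores.values())) or 1.0
        return {p: x / norm for p, x in scores.items()}

    for _ in range(iters):
        # Authority update: a(u) = sum of hub scores of pages that link to u.
        new_auth = {p: 0.0 for p in pages}
        for v, targets in links.items():
            for u in set(targets):
                new_auth[u] += hub[v]
        new_auth = normalize(new_auth)

        # Hub update: h(v) = sum of authority scores of pages that v links to.
        new_hub = {p: 0.0 for p in pages}
        for v, targets in links.items():
            for u in set(targets):
                new_hub[v] += new_auth[u]
        new_hub = normalize(new_hub)

        delta = max(abs(new_hub[p] - hub[p]) + abs(new_auth[p] - auth[p]) for p in pages)
        hub, auth = new_hub, new_auth
        if delta < tol:
            break
    return hub, auth


if __name__ == "__main__":
    site = {"h1": ["a", "b"], "h2": ["a"], "a": [], "b": []}
    hub, auth = hits(site)
    print("hubs:", {p: round(s, 3) for p, s in hub.items()})
    print("authorities:", {p: round(s, 3) for p, s in auth.items()})
```

On a website, a good hub is typically a navigation or index page and a good authority is a content page many navigation pages point to; content pages with low authority but heavy usage may deserve more prominent links.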
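The second phase checks whether a linking page and the page it links to are actually related in content. A minimal sketch of that idea, assuming plain-text page contents, a naive lowercase tokenizer, the common tf × log(N/df) weighting, and cosine similarity (the chapter's exact TFIDF variant, preprocessing, and any similarity threshold are not specified here):

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Naive tokenizer (assumption): lowercase alphanumeric runs."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """docs: dict page -> text. Returns page -> {term: tf * log(N / df)}."""
    tokens = {p: tokenize(t) for p, t in docs.items()}
    n = len(docs)
    df = Counter()
    for toks in tokens.values():
        df.update(set(toks))  # document frequency: count each term once per page
    vectors = {}
    for p, toks in tokens.items():
        tf = Counter(toks)
        total = len(toks) or 1
        vectors[p] = {t: (c / total) * math.log(n / df[t]) for t, c in tf.items()}
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse term-weight vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def link_similarities(docs, links):
    """Similarity between each linking page and each page it links to."""
    vecs = tfidf_vectors(docs)
    return {(src, dst): cosine(vecs[src], vecs[dst])
            for src, targets in links.items() for dst in targets}


if __name__ == "__main__":
    pages = {
        "cameras": "digital camera lens zoom camera",
        "lenses": "camera lens zoom aperture",
        "careers": "jobs hiring salary office",
    }
    for link, sim in link_similarities(pages, {"cameras": ["lenses", "careers"]}).items():
        print(link, round(sim, 3))
```

A link whose endpoints score near zero is a candidate "weak" link; combined with low observed traffic from the first phase, it would be a reasonable item to flag for the website owner.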
Copyright information
© 2010 Springer-Verlag/Wien
About this chapter
Cite this chapter
Xagi, M. et al. (2010). Employing Social Network Construction and Analysis in Web Structure Optimization. In: Memon, N., Alhajj, R. (eds) From Sociology to Computing in Social Networks. Springer, Vienna. https://doi.org/10.1007/978-3-7091-0294-7_2
DOI: https://doi.org/10.1007/978-3-7091-0294-7_2
Publisher Name: Springer, Vienna
Print ISBN: 978-3-7091-0293-0
Online ISBN: 978-3-7091-0294-7
eBook Packages: Computer Science, Computer Science (R0)