Employing Social Network Construction and Analysis in Web Structure Optimization

  • Chapter
From Sociology to Computing in Social Networks

Abstract

The World Wide Web is growing continuously and rapidly, and it is quickly facilitating the migration of everyday tasks to web-based services. This trend suggests that a time will come when everyone has to use the web for daily activities. Naive users are the major concern in such a shift, so the web must be ready to serve them. We argue that this requires well-optimized websites in which users can quickly locate the information they are looking for. This becomes more and more important given the widespread reliance on the many services available on the Internet today. Search engines can certainly help with finding information. However, they complement, and will never replace, the optimization of a website's internal structure based on previously recorded user behavior. In this chapter, we present a novel approach for identifying problematic structures in websites. The method consists of two phases. The first phase compares user behavior, derived via web log mining techniques, to a combined analysis of the website's link structure. This analysis applies three methods, which leads to a more robust framework and hence stronger and more consistent outcomes: (1) constructing and analyzing a social network of the pages constituting the website by considering both the structure and the usage information; (2) applying the Weighted PageRank algorithm; and (3) applying the Hypertext Induced Topic Selection (HITS) method. In the second phase, we use the term frequency-inverse document frequency (TFIDF) measure to investigate the correlation between the page that contains a link and the pages it links to, in order to further support the findings of the first phase. We then show how to use these intermediate results to point out problematic website structures to the website owner.
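The second phase can be illustrated with a minimal sketch. It assumes a plain tf × log(N/df) weighting and cosine similarity as the correlation measure; the abstract does not state the chapter's exact weighting or similarity function, and the page names and texts below are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical site: "index" links to both "admission" and "cafeteria".
pages = {
    "index": "courses admission tuition campus",
    "admission": "admission tuition deadlines courses",
    "cafeteria": "menu lunch prices",
}
names = list(pages)
vecs = dict(zip(names, tfidf_vectors([pages[n] for n in names])))

sim_related = cosine(vecs["index"], vecs["admission"])
sim_unrelated = cosine(vecs["index"], vecs["cafeteria"])
print(f"index -> admission: {sim_related:.3f}")
print(f"index -> cafeteria: {sim_unrelated:.3f}")
```

Under these assumptions, a link whose source and target pages share little vocabulary scores close to zero. Such a score could serve as one additional signal, alongside the first-phase findings, that the link is poorly placed.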


Copyright information

© 2010 Springer-Verlag/Wien

About this chapter

Cite this chapter

Nagi, M. et al. (2010). Employing Social Network Construction and Analysis in Web Structure Optimization. In: Memon, N., Alhajj, R. (eds) From Sociology to Computing in Social Networks. Springer, Vienna. https://doi.org/10.1007/978-3-7091-0294-7_2

  • DOI: https://doi.org/10.1007/978-3-7091-0294-7_2

  • Publisher Name: Springer, Vienna

  • Print ISBN: 978-3-7091-0293-0

  • Online ISBN: 978-3-7091-0294-7

  • eBook Packages: Computer Science, Computer Science (R0)
