
Revealing connectivity structural patterns among web objects based on co-clustering of bipartite request dependency graph

Wireless Networks

Abstract

Web objects are the entities that users retrieve from websites to compose web pages. Exploring the relationships among web objects therefore has theoretical and practical significance for many important applications, such as content recommendation, web page classification, and network security. In this paper, we propose a graph model named the Bipartite Request Dependency Graph (BRDG) to investigate the relationships among web objects. To build the BRDG from massive network traffic data, we design and implement a parallel algorithm based on the MapReduce programming model. From a study of a number of BRDGs derived from real wireless network traffic datasets, we find that the BRDG is large, sparse, and complex, which makes its structural characteristics very hard to derive directly. To this end, we propose a co-clustering algorithm to decompose the BRDG and extract coherent co-clusters from it. The co-clustering results on the experimental dataset reveal a number of interesting and interpretable connectivity structural patterns among web objects. These patterns support a more comprehensive understanding of web page architecture and provide valuable data for e-commerce, social networking, search engines, and similar applications.
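As an illustration of the MapReduce-style graph construction mentioned in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes that each edge links a referring page to a requested object (taken here from a hypothetical HTTP Referer field) and that the edge weight is a request count; the abstract does not give the actual BRDG definition. The sample records and the names `map_phase`, `reduce_phase`, and `build_bipartite_graph` are hypothetical, and the two phases run in a single process rather than on a cluster.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical HTTP request records: (referrer URL, requested object URL).
REQUESTS = [
    ("http://a.example/index.html", "http://cdn.example/style.css"),
    ("http://a.example/index.html", "http://cdn.example/logo.png"),
    ("http://b.example/home.html", "http://cdn.example/logo.png"),
    ("http://a.example/index.html", "http://cdn.example/style.css"),
    ("", "http://a.example/index.html"),  # no referrer, so no edge is emitted
]


def map_phase(record):
    """Emit ((referrer, object), 1) for each request that carries a referrer."""
    referrer, obj = record
    if referrer:
        yield (referrer, obj), 1


def reduce_phase(pairs):
    """Sum the counts per (referrer, object) key to get weighted edges."""
    weights = defaultdict(int)
    for key, count in pairs:
        weights[key] += count
    return dict(weights)


def build_bipartite_graph(records):
    """Run the map and reduce phases in-process over an iterable of records."""
    mapped = chain.from_iterable(map_phase(r) for r in records)
    return reduce_phase(mapped)


edges = build_bipartite_graph(REQUESTS)
# The two vertex sets of the bipartite graph (assumed roles, see above).
referrer_side = {referrer for referrer, _ in edges}
object_side = {obj for _, obj in edges}
```

Keeping the map and reduce steps as separate pure functions mirrors how the same logic could be distributed: the map step only looks at one record, and the reduce step only needs all values that share a key.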



Author information

Corresponding author

Correspondence to Cheng Fang.

About this article

Cite this article

Fang, C., Liu, J. & Ansari, N. Revealing connectivity structural patterns among web objects based on co-clustering of bipartite request dependency graph. Wireless Netw 24, 439–451 (2018). https://doi.org/10.1007/s11276-016-1345-5

  • DOI: https://doi.org/10.1007/s11276-016-1345-5
