ABSTRACT
Free content websites (FCWs) are a critical part of the Internet, and understanding them is essential given their wide use. This study statistically explores the global distribution of FCWs by analyzing their hosting network scale, cloud service providers, and country-level distribution, both in aggregate and per content category, and by contrasting these measurements with the characteristics of premium content websites (PCWs). Our study further contrasts the distribution of these websites with that of general websites sampled from the Alexa top-1M websites and explores their security attributes using various security indicators.
We found that FCWs and PCWs are hosted mainly in medium-scale networks, a scale that is shown to be associated with a high concentration of malicious websites. Moreover, the cloud and country-level distributions of FCWs are shown to be heavy-tailed, although with patterns distinct from those of PCWs. Our study contributes to understanding the FCW ecosystem through various quantitative analyses. The results highlight the possibility of containing the harm of FCWs, when malicious, through effective isolation and filtering, thanks to their network, cloud, and country-level concentration.
Index Terms
- Entangled Clouds: Measuring the Hosting Infrastructure of the Free Contents Web