Clustering and the Weekend Effect: Recommendations for the Use of Top Domain Lists in Security Research

Rweyemamu, Walter; Lauinger, Tobias; Wilson, Christo; Robertson, William; Kirda, Engin

doi:10.1007/978-3-030-15986-3_11


Conference paper
First Online: 13 March 2019


Part of the book series: Lecture Notes in Computer Science (LNCCN, volume 11419)

Abstract

Top domain rankings (e.g., Alexa) are commonly used in security research, such as to survey security features or vulnerabilities of “relevant” websites. Due to their central role in selecting a sample of sites to study, an inappropriate choice or use of such domain rankings can introduce unwanted biases into research results. We quantify various characteristics of three top domain lists that have not been reported before. For example, the weekend effect in Alexa and Umbrella causes these rankings to change their geographical diversity between the workweek and the weekend. Furthermore, up to 91% of ranked domains appear in alphabetically sorted clusters containing up to 87k domains of presumably equivalent popularity. We discuss the practical implications of these findings, and propose novel best practices regarding the use of top domain lists in the security community.
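To make the clustering finding concrete, the following is a minimal sketch of how alphabetically sorted runs could be located in a ranked list. It is not the authors' method, which this page does not describe: the function names `sorted_runs` and `clustered_fraction`, the `min_len` threshold, and the toy `ranking` data are all assumptions introduced for illustration.

```python
from typing import List, Tuple


def sorted_runs(domains: List[str], min_len: int = 3) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs, end exclusive, of maximal runs in
    which consecutive ranked domains are in ascending alphabetical order.

    Assumption: a run of at least `min_len` sorted entries is treated as a
    candidate cluster of domains with equivalent popularity.
    """
    runs = []
    start = 0
    for i in range(1, len(domains) + 1):
        # A run ends at the end of the list or where the order decreases.
        if i == len(domains) or domains[i] < domains[i - 1]:
            if i - start >= min_len:
                runs.append((start, i))
            start = i
    return runs


def clustered_fraction(domains: List[str], min_len: int = 3) -> float:
    """Fraction of ranked domains that fall inside a sorted run."""
    if not domains:
        return 0.0
    covered = sum(end - start for start, end in sorted_runs(domains, min_len))
    return covered / len(domains)


# Hypothetical toy ranking (rank 1 first); not real list data.
ranking = [
    "google.com", "youtube.com", "facebook.com",
    "abc.example", "bcd.example", "cde.example", "def.example", "zzz.example",
    "aaa.example",
]

print(sorted_runs(ranking))         # index ranges of candidate clusters
print(clustered_fraction(ranking))  # share of entries inside those ranges
```

On real lists, short sorted runs also occur by chance, so the choice of `min_len` matters; a study relying on such a heuristic would need to justify the threshold it uses.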



Acknowledgements

This work was supported by Secure Business Austria and the National Science Foundation under grants CNS-1563320, CNS-1703454, and IIS-1553088.

Author information

Authors and Affiliations

Northeastern University, Boston, MA, USA
Walter Rweyemamu, Tobias Lauinger, Christo Wilson, William Robertson & Engin Kirda


Corresponding author

Correspondence to Walter Rweyemamu.

Editor information

Editors and Affiliations

Northeastern University, Boston, MA, USA
David Choffnes
Federal University of Rio Grande do Sul, Porto Alegre, Rio Grande do Sul, Brazil
Marinho Barcellos

Appendix

Table 4. Top 10 domains on Wed. 4 and Sun. 8 April 2018 in Alexa and Umbrella.


Cite this paper

Rweyemamu, W., Lauinger, T., Wilson, C., Robertson, W., Kirda, E. (2019). Clustering and the Weekend Effect: Recommendations for the Use of Top Domain Lists in Security Research. In: Choffnes, D., Barcellos, M. (eds) Passive and Active Measurement. PAM 2019. Lecture Notes in Computer Science, vol 11419. Springer, Cham. https://doi.org/10.1007/978-3-030-15986-3_11

DOI: https://doi.org/10.1007/978-3-030-15986-3_11
Published: 13 March 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-15985-6
Online ISBN: 978-3-030-15986-3
eBook Packages: Computer Science, Computer Science (R0)
