ABSTRACT
Domain-based top lists such as the Alexa Top 1M strive to portray the popularity of web domains. Although their shortcomings (e.g., instability, no aggregation, lack of weights) have been pointed out, domain-based top lists remain an important element of Internet measurement studies.
In this paper, we present the concept of prefix top lists, which ameliorate some of these shortcomings while providing insights into the importance of addresses of domain-based top lists. With prefix top lists, we aggregate domain-based top lists into network prefixes and apply a Zipf distribution to assign weights to each prefix. In our analysis, we find that different domain-based top lists provide differentiated views on Internet prefixes. In addition, we observe very small weight changes over time. We leverage prefix top lists to conduct an evaluation of the DNS to classify the deployment quality of domains. We show that popular domains adhere to name server recommendations for IPv4, but IPv6 compliance is still lacking. Finally, we provide these enhanced and more stable prefix top lists to fellow researchers, who can use them to obtain more representative measurement results.
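The aggregation step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the Zipf exponent `s=1.0`, the normalization of weights to sum to 1, and the equal split of a domain's weight across its prefixes are all assumptions made for illustration. The resolved addresses and the prefix table are likewise hypothetical inputs that would in practice come from DNS resolution and BGP data.

```python
import ipaddress
from collections import defaultdict


def zipf_weights(n, s=1.0):
    """Return n weights proportional to 1/rank**s, normalized to sum to 1.

    The exponent s and the normalization are assumptions of this sketch.
    """
    raw = [1.0 / (rank ** s) for rank in range(1, n + 1)]
    total = sum(raw)
    return [w / total for w in raw]


def longest_prefix(addr, networks):
    """Return the most specific network containing addr, or None."""
    ip = ipaddress.ip_address(addr)
    best = None
    for net in networks:
        if ip.version == net.version and ip in net:
            if best is None or net.prefixlen > best.prefixlen:
                best = net
    return best


def prefix_top_list(ranked_domains, resolved, prefixes, s=1.0):
    """Aggregate a ranked domain list into weighted prefixes.

    ranked_domains: domains ordered by rank (most popular first).
    resolved: hypothetical mapping of domain -> list of IP address strings.
    prefixes: hypothetical list of announced prefixes as strings.
    """
    networks = [ipaddress.ip_network(p) for p in prefixes]
    weights = zipf_weights(len(ranked_domains), s)
    totals = defaultdict(float)
    for domain, weight in zip(ranked_domains, weights):
        matched = {longest_prefix(a, networks) for a in resolved.get(domain, [])}
        matched.discard(None)
        # Assumption: split a domain's weight equally across its prefixes.
        for net in matched:
            totals[str(net)] += weight / len(matched)
    # Sort prefixes by descending aggregated weight.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)
```

Calling `prefix_top_list` with a ranked domain list, a resolution mapping, and a prefix table returns `(prefix, weight)` pairs ordered by weight. Under these assumptions, the weights of all domains that resolve into a known prefix are preserved in the aggregate.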
Index Terms
- Prefix Top Lists: Gaining Insights with Prefixes from Domain-based Top Lists on DNS Deployment