Abstract
The exploration and analysis of Web graphs has flourished in the recent past, producing a large number of relevant and interesting research results. However, the unique characteristics of the Tor network limit the applicability of standard techniques and demand specific algorithms to explore and analyze it. The attention of the research community has focused on assessing the security of the Tor infrastructure (i.e., its ability to actually provide the intended level of anonymity) and on discussing what Tor is currently being used for. Since there are no foolproof techniques for automatically discovering Tor hidden services, little or no information is available about the topology of the Tor Web graph. Even less is known about the relationship between content similarity and topological structure. The present article aims to address this lack of information. Its contributions include: a study of approaches to automatic Tor Web exploration and data collection; the adoption of novel representative metrics for evaluating Tor data; a novel in-depth analysis of the hidden services graph; and a rich correlation analysis of hidden services’ semantics and topology. Finally, it provides a broad set of novel insights into the organization and content of the Tor Web.