
Online Tor Privacy Breach Through Website Fingerprinting Attack

Journal of Network and Systems Management

Abstract

Tor is one of the most widely used anonymization networks. Built on onion routing, it preserves users' privacy and secures data flows over the Internet. Given the growing use of Tor, identifying and fixing its weaknesses is crucial. This study focuses on the website fingerprinting attack and offers a new procedure based on the FFT to calculate the similarity distance between two instances and form a distance matrix. By applying the proposed method, we demonstrate that either accuracy grows significantly or the time complexity is reduced enough for the attack to be applied online. To evaluate the capability of the proposed method to defeat user privacy, we applied it in an open-world scenario with 100 target websites and achieved a TP rate of over 96% with an FP rate of 0%, compared to the best existing result of an 85% TP rate with an FP rate of 0.6%. In a closed-world scenario, we attained an accuracy of over 97%, a meaningful improvement over a similar study. In addition, a new model based on the combination of open- and closed-world scenarios is presented. In this model, the time complexity of the preprocessing and visited-website detection stages is reduced by factors of 60 and 465, respectively, compared to previous studies. With this model, the detection procedure can be managed as an online process, with an update mechanism for the distance matrix in case of immediate variations.
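The abstract does not spell out how the FFT-based distance is computed, so the following is only an illustrative sketch of how a spectral comparison can produce a pairwise distance matrix, not the authors' procedure. It assumes each instance is a numeric sequence (for example, signed packet sizes), compares magnitude spectra with a Euclidean distance, and uses a naive discrete Fourier transform for self-containment (an FFT computes the same transform faster). The function names and the `length` parameter are hypothetical.

```python
import cmath
import math


def dft_magnitudes(signal, length):
    """Magnitude spectrum of a signal padded or truncated to `length` samples.

    Naive O(length^2) DFT; an FFT yields the same coefficients more quickly.
    """
    x = (list(signal) + [0.0] * length)[:length]
    return [
        abs(sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / length)
                for t in range(length)))
        for k in range(length)
    ]


def distance_matrix(traces, length):
    """Symmetric matrix of Euclidean distances between magnitude spectra.

    Assumption: Euclidean distance on spectra stands in for the paper's
    similarity distance, which is not specified in the abstract.
    """
    spectra = [dft_magnitudes(trace, length) for trace in traces]
    n = len(spectra)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.sqrt(sum((p - q) ** 2
                              for p, q in zip(spectra[i], spectra[j])))
            matrix[i][j] = matrix[j][i] = d
    return matrix


# Hypothetical toy traces: the first two are identical, the third differs.
traces = [[1, -1, 1, -1], [1, -1, 1, -1], [1, 1, 1, 1]]
matrix = distance_matrix(traces, length=4)
```

Computing each spectrum once and reusing it for every pair keeps the matrix construction cheap, which is the kind of saving that matters when instances must be compared online.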

Notes

  1. In this paper, by "website" we mean a single page with a specified address (i.e. a webpage with a certain URL).

  2. Optimal string alignment distance.

  3. Damerau Levenshtein Distance.

  4. A measurement of the similarity between an instance and its shifted-versions, for different shift values.

  5. Fast Fourier Transform.

  6. SafeWeb was an encrypted web proxy that was discontinued in late 2001.

  7. Naïve Bayes.

  8. Multinomial Naïve Bayes.

  9. Support Vector Machine.

  10. Radial Basis Function.

  11. One Class SVM.

  12. The ability to determine when the traffic of a website starts and ends.

  13. Ability to distinguish between the traffic of different websites which may occur sequentially or even in parallel.
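Notes 2 and 3 name the optimal string alignment (OSA) distance and the Damerau-Levenshtein distance. As a minimal sketch of the first, the function below computes the OSA distance between two sequences with the standard dynamic-programming recurrence. The function name is hypothetical, and the inputs may be strings or lists (for example, sequences of packet sizes).

```python
def osa_distance(a, b):
    """Optimal string alignment distance between sequences a and b.

    Counts insertions, deletions, substitutions, and transpositions of
    adjacent items, where no substring may be edited more than once.
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # match or substitution
            )
            # Transposition of two adjacent items.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]
```

The two distances differ on inputs such as "CA" and "ABC": the OSA distance is 3, whereas the unrestricted Damerau-Levenshtein distance is 2, because the latter may transpose "CA" to "AC" and then insert "B" between the transposed characters.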

References

  1. Dingledine, R., Mathewson, N., Syverson, P.: Tor: the second-generation onion router. In: DTIC Document (2004)

  2. Dingledine, R., Mathewson, N., Syverson, P.: Tor: anonymity online, ed (2008)

  3. Levine, B.N., Reiter, M.K., Wang, C., Wright, M.: Timing attacks in low-latency mix systems. In: International Conference on Financial Cryptography, pp. 251–265 (2004)

  4. Bauer, K., McCoy, D., Grunwald, D., Kohno, T., Sicker, D.: Low-resource routing attacks against tor. In: Proceedings of the 2007 ACM Workshop on Privacy in Electronic Society, pp. 11–20 (2007)

  5. Murdoch, S.J., Zieliński, P.: Sampled traffic analysis by internet-exchange-level adversaries. In: International Workshop on Privacy Enhancing Technologies, pp. 167–183 (2007)

  6. Dyer, K.P., Coull, S.E., Ristenpart, T., Shrimpton, T.: Peek-a-boo, i still see you: Why efficient traffic analysis countermeasures fail. In: 2012 IEEE Symposium on Security and Privacy, pp. 332–346 (2012)

  7. Wang, T., Goldberg, I.: Improved website fingerprinting on tor. In: Proceedings of the 12th ACM Workshop on Workshop on Privacy in the Electronic Society, pp. 201–212 (2013)

  8. Sun, Q., Simon, D.R., Wang, Y.-M., Russell, W., Padmanabhan, V.N., Qiu, L.: Statistical identification of encrypted web browsing traffic. In: Security and Privacy on IEEE Symposium, pp. 19–30 (2002)

  9. Hintz, A.: Fingerprinting websites using traffic analysis. In: International Workshop on Privacy Enhancing Technologies, pp. 171–178 (2002)

  10. Fielding, R., Gettys, J., Mogul, J., Frystyk, H., Masinter, L., Leach, P., et al.: Hypertext transfer protocol – HTTP/1.1. RFC 2616 (1999)

  11. Liberatore, M., Levine, B.N.: Inferring the source of encrypted HTTP connections. In: Proceedings of the 13th ACM Conference on Computer and Communications Security, pp. 255–263 (2006)

  12. Herrmann, D., Wendolsky, R., Federrath, H.: Website fingerprinting: attacking popular privacy enhancing technologies with the multinomial naïve-bayes classifier. In: Proceedings of the 2009 ACM Workshop on Cloud Computing Security, pp. 31–42 (2009)

  13. Shi, Y., Matsuura, K.: Fingerprinting attack on the tor anonymity system. In: International Conference on Information and Communications Security, pp. 425–438 (2009)

  14. Panchenko, A., Niessen, L., Zinnen, A., Engel, T.: Website fingerprinting in onion routing based anonymization networks. In: Proceedings of the 10th Annual ACM Workshop on Privacy in the Electronic Society, pp. 103–114 (2011)

  15. Cai, X., Zhang, X.C., Joshi, B., Johnson, R.: Touching from a distance: Website fingerprinting attacks and defenses. In: Proceedings of the 2012 ACM Conference on Computer and Communications Security, pp. 605–616 (2012)

  16. Wang, T., Cai, X., Nithyanand, R., Johnson, R., Goldberg, I.: Effective attacks and provable defenses for website fingerprinting. In: 23rd USENIX Security Symposium (USENIX Security 14), pp. 143–157 (2014)

  17. Al-Naami, K., Chandra, S., Mustafa, A., Khan, L., Lin, Z., Hamlen, K., et al.: Adaptive encrypted traffic fingerprinting with bi-directional dependence. In: Proceedings of the 32nd Annual Conference on Computer Security Applications, pp. 177–188 (2016)

  18. He, G., Yang, M., Gu, X., Luo, J., Ma, Y.: A novel active website fingerprinting attack against Tor anonymous system. In: Computer Supported Cooperative Work in Design (CSCWD), Proceedings of the 2014 IEEE 18th International Conference on, pp. 112–117 (2014)

  19. Gu, X., Yang, M., Luo, J.: A novel Website Fingerprinting attack against multi-tab browsing behavior. In: IEEE 19th International Conference on Computer Supported Cooperative Work in Design (CSCWD), pp. 234–239 (2015)

  20. Jahani, H., Jalili, S.: A novel passive website fingerprinting attack on tor using fast fourier transform. Comput. Commun. 96, 43–51 (2016)

  21. Vapnik, V.N., Chervonenkis, A.J.: Theory of pattern recognition (in Russian). Nauka (1974)

  22. Wang, T., Goldberg, I.: On realistically attacking Tor with website fingerprinting. Proc. Priv. Enhanc. Technol. 2016, 21–36 (2016)

  23. Chang, C.-C., Lin, C.-J.: LIBSVM: a library for support vector machines. ACM Trans. Intell. Syst. Technol. (TIST) 2, 27 (2011)

  24. Tsang, I.W., Kwok, J.T., Cheung, P.-M.: Core vector machines: fast SVM training on very large data sets. J. Mach. Learn. Res. 6, 363–392 (2005)

Author information

Corresponding author

Correspondence to Saeed Jalili.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix 1

figure a
figure b

Appendix 2

To assess the WF attack in the open-world scenario, we use the dataset of [15, 16]. The required datasets are created for a limited number of monitored websites (2, 4, 8, 16, 32, 64, 100) with known instances, placed among unknown instances (a minimum of 1600 and a maximum of 32,000 unknown instances). In Series A and B, each website is assumed to have 40 instances.

2.1 Series A (the number of non-target websites is 400)

  • DSA-4: The dataset with 160 known instances (of 4 websites) against 2454 unknown instances (from 60 websites).

  • DSA-8: The dataset with 320 known instances (of 8 websites) against 16108 unknown instances.

  • DSA-16: The dataset with 640 known instances (of 16 websites) against 16217 unknown instances.

  • DSA-32: The dataset with 1280 known instances (of 32 websites) against 16435 unknown instances.

  • DSA-64: The dataset with 2560 known instances (of 64 websites) against 16871 unknown instances.

2.2 Series B (the number of non-target websites is 800)

  • DSB-4: The dataset with 160 known instances (of 4 websites) against 3454 unknown instances (from 85 websites).

  • DSB-8: The dataset with 320 known instances (of 8 websites) against 32109 unknown instances.

  • DSB-16: The dataset with 640 known instances (of 16 websites) against 32218 unknown instances.

  • DSB-32: The dataset with 1280 known instances (of 32 websites) against 32435 unknown instances.

  • DSB-64: The dataset with 2560 known instances (of 64 websites) against 32871 unknown instances.

2.3 Series C (our approach to creating datasets with a limited number of unknown instances)

To assess the WF attack in an open-world scenario in which the number of unknown instances per class is controlled and limited, we introduce two datasets in accordance with the parameters of Table 8, where \({\text{S}}_{\text{T}} = \# \left( {N \times n_{q} + \left( {0.33 \times m \times n_{p} } \right)} \right)\) indicates the number of test instances.

Table 8 Parameter values and the number of required unknown instances

2.4 Series D (our approach to creating datasets with different numbers of non-target websites)

Using relation (7), the number of known instances required for different values of \(m\) is given in Table 9. In this table, \({\text{n}}\) is equal to 38, \({\text{X}}\) is the total number of unknown instances, and \({\text{S}}_{\text{T}} = \# \left( {X + \left( {0.33 \times m \times n} \right)} \right)\) indicates the number of test instances.

Table 9 Parameter values and the number of known instances needed for different N

  • DSD-2: The dataset with 76 known instances (of 2 websites) against 939 unknown instances.

  • DSD-4: The dataset with 104 known instances (of 4 websites) against 2176 unknown instances.

  • DSD-8: The dataset with 217 known instances (of 8 websites) against 15287 unknown instances.

  • DSD-16: The dataset with 440 known instances (of 16 websites) against 22968 unknown instances.

  • DSD-32: The dataset with 815 known instances (of 32 websites) against 31104 unknown instances.

  • DSD-64: The dataset with 1571 known instances (of 64 websites) against 31104 unknown instances.

  • DSD-100: The dataset with 2503 known instances (of 100 websites) against 31104 unknown instances.

To build the above datasets, the class label of all known instances is “1” and the class label of all unknown instances is “?”.
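The TP and FP rates used to evaluate an open-world attack can be computed from such labels as in the following minimal sketch. It assumes that "1" marks instances of monitored websites, that any other label (such as "?") marks unknown instances, and that a classifier has produced a predicted label for each test instance. The function name and the toy label lists are hypothetical.

```python
def open_world_rates(true_labels, predicted_labels, target="1"):
    """Return (TP rate, FP rate) for a monitored-versus-unknown decision.

    TP rate: fraction of monitored instances predicted as monitored.
    FP rate: fraction of unknown instances predicted as monitored.
    """
    tp = fn = fp = tn = 0
    for truth, predicted in zip(true_labels, predicted_labels):
        if truth == target:
            if predicted == target:
                tp += 1
            else:
                fn += 1
        else:
            if predicted == target:
                fp += 1
            else:
                tn += 1
    tp_rate = tp / (tp + fn) if (tp + fn) else 0.0
    fp_rate = fp / (fp + tn) if (fp + tn) else 0.0
    return tp_rate, fp_rate


# Hypothetical toy example: 4 monitored and 5 unknown test instances.
truth = ["1", "1", "1", "1", "?", "?", "?", "?", "?"]
predicted = ["1", "1", "1", "?", "?", "?", "?", "?", "1"]
tp_rate, fp_rate = open_world_rates(truth, predicted)
```

Because unknown instances far outnumber known ones in the datasets above, even a small FP rate corresponds to many false alarms in absolute terms, which is why both rates are reported together.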

Appendix 3

3.1 Series E (the number of target websites is 100)

To assess the WF attack in the closed-world scenario, Wang's dataset [7, 16] is used, denoted as follows.

  • DSE-1: The dataset constructed and used by Wang and Goldberg [7, 16].

  • DSE-2: The dataset based on the dataset of Cai et al. and improved by Wang and Goldberg [7, 16].

Each of the above datasets is processed into the following four types, yielding eight datasets in total.

  • Type I: The data collected from the traffic between the user and the first relay, with all ACK packets removed (i.e., without packets of size 52).

  • Type II: The Type I dataset with packet sizes rounded to 600.

  • Type III: Tor cell sequences extracted from the Type I dataset.

  • Type IV: The Type III dataset with all SENDME cells removed.
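The first two preprocessing steps above can be sketched as follows. This is a rough illustration under stated assumptions, not the original tooling: a trace is assumed to be a list of signed packet sizes (positive for outgoing, negative for incoming), and "rounded to 600" is assumed to mean rounding each size up to the nearest multiple of 600. The function names are hypothetical. Types III and IV depend on Tor cell extraction and are not covered here.

```python
ACK_SIZE = 52    # packet size treated as an ACK, per the Type I definition
ROUND_TO = 600   # rounding granularity, per the Type II definition


def to_type_i(trace):
    """Drop ACK packets (size 52 in either direction) from a trace."""
    return [size for size in trace if abs(size) != ACK_SIZE]


def to_type_ii(trace):
    """Apply Type I filtering, then round sizes up to a multiple of 600.

    Assumption: rounding is upward and preserves the direction sign.
    """
    def round_up(size):
        magnitude = -(-abs(size) // ROUND_TO) * ROUND_TO  # ceiling division
        return magnitude if size > 0 else -magnitude

    return [round_up(size) for size in to_type_i(trace)]


# Hypothetical trace of signed packet sizes.
trace = [565, -52, -1500, 52, 638, -600]
type_i = to_type_i(trace)
type_ii = to_type_ii(trace)
```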

Cite this article

Jahani, H., Jalili, S. Online Tor Privacy Breach Through Website Fingerprinting Attack. J Netw Syst Manage 27, 289–326 (2019). https://doi.org/10.1007/s10922-018-9466-z
