SmartSwitch: Efficient Traffic Obfuscation Against Stream Fingerprinting

Li, Haipeng; Niu, Ben; Wang, Boyang

doi:10.1007/978-3-030-63086-7_15

Conference paper
First Online: 12 December 2020

Part of the book series: Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering (LNICST, volume 335)

Abstract

In stream fingerprinting, an attacker can compromise user privacy by leveraging side-channel information (e.g., packet size) of encrypted traffic in streaming services. By taking advantage of machine learning, especially neural networks, an adversary can reveal which YouTube video a victim watches with extremely high accuracy. While effective defenses have been proposed, they incur extremely high bandwidth overheads. In other words, how to build an effective defense with low overhead remains an open problem. In this paper, we propose a new defense mechanism, referred to as SmartSwitch, to address this open problem. Our defense intelligently switches the noise level across packets so that it remains effective while minimizing overhead. Specifically, our method adds more noise to obfuscate the sizes of more significant packets. To identify which packets are more significant, we formulate the task as a feature selection problem and investigate several feature selection methods over high-dimensional data. Our experimental results, derived from a large-scale dataset, demonstrate that the proposed defense is highly effective against stream fingerprinting built upon Convolutional Neural Networks. Specifically, an adversary can infer which YouTube video a user watches with only 1% accuracy (the same as random guessing), even if the adversary retrains neural networks with obfuscated traffic. Compared to the state-of-the-art defense, our mechanism saves nearly 40% of bandwidth overhead.

References

1. NNI: An open source AutoML toolkit for neural architecture search and hyper-parameter tuning. https://github.com/Microsoft/nni
2. Battiti, R.: Using mutual information for selecting features in supervised neural net learning. IEEE Trans. Neural Netw. 5, 537–550 (1994)
3. Bennasar, M., Hicks, Y., Setchi, R.: Feature selection using joint mutual information maximisation. Expert Syst. Appl. 42, 8520–8532 (2015)
4. Brown, G., Pocock, A., Zhao, M.J., Lujan, M.: Conditional likelihood maximisation: a unifying framework for information theoretic feature selection. J. Mach. Learn. Res. 13, 27–66 (2012)
5. Dubin, R., Dvir, A., Hadar, O., Pele, O.: I know what you saw last minute – the Chrome browser case. In: Black Hat Europe (2016)
6. Dyer, K.P., Coull, S.E., Ristenpart, T., Shrimpton, T.: Peek-a-Boo, I still see you: why efficient traffic analysis countermeasures fail. In: Proceedings of IEEE S&P’12 (2012)
7. Juarez, M., Imani, M., Perry, M., Diaz, C., Wright, M.: Toward an efficient website fingerprinting defense. In: Proceedings of ESORICS 2016 (2016)
8. Kennedy, S., Li, H., Wang, C., Liu, H., Wang, B., Sun, W.: I can hear your Alexa: voice command fingerprinting on smart home speakers. In: Proceedings of IEEE CNS 2019 (2019)
9. Kohls, K., Rupprecht, D., Holz, T., Pöpper, C.: Lost traffic encryption: fingerprinting LTE/4G traffic on layer two. In: Proceedings of ACM WiSec 2019 (2019)
10. Liberatore, M., Levine, B.N.: Inferring the source of encrypted HTTP connections. In: Proceedings of ACM CCS’06 (2006)
11. Liu, Y., Ou, C., Li, Z., Corbett, C., Mukherjee, B., Ghosal, D.: Wavelet-based traffic analysis for identifying video streams over broadband networks. In: Proceedings of IEEE GLOBECOM 2008 (2008)
12. Molnar, C.: Interpretable machine learning: a guide for making black box models explainable (2019). https://christophm.github.io/interpretable-ml-book/
13. Panchenko, A., et al.: Website fingerprinting at internet scale. In: Proceedings of NDSS 2016 (2016)
14. Panchenko, A., Niessen, L., Zinnen, A., Engel, T.: Website fingerprinting in onion routing based anonymization networks. In: Proceedings of Workshop on Privacy in the Electronic Society (2011)
15. Peng, H., Long, F., Ding, C.: Feature selection based on mutual information: criteria of max-dependency, max-relevance, and min-redundancy. IEEE Trans. Pattern Anal. Mach. Intell. 27(8), 1226–1238 (2005)
16. Peng, P., Yang, L., Song, L., Wang, G.: Opening the blackbox of VirusTotal: analyzing online phishing scan engines. In: Proceedings of ACM SIGCOMM Internet Measurement Conference (IMC 2019) (2019)
17. Rashid, T., Agrafiotis, I., Nurse, J.R.C.: A new take on detecting insider threats: exploring the use of hidden Markov models. In: Proceedings of the 8th ACM CCS International Workshop on Managing Insider Security Threats (2016)
18. Reed, A., Klimkowski, B.: Leaky streams: identifying variable bitrate DASH videos streamed over encrypted 802.11n connections. In: 13th IEEE Annual Consumer Communications & Networking Conference (CCNC) (2016)
19. Rimmer, V., Preuveneers, D., Juarez, M., Goethem, T.V., Joosen, W.: Automated website fingerprinting through deep learning. In: Proceedings of NDSS 2018 (2018)
20. Saponas, T.S., Lester, J., Hartung, C., Agarwal, S.: Devices that tell on you: privacy trends in consumer ubiquitous computing. In: Proceedings of USENIX Security 2007 (2007)
21. Schuster, R., Shmatikov, V., Tromer, E.: Beauty and the burst: remote identification of encrypted video streams. In: Proceedings of USENIX Security 2017 (2017)
22. Sirinam, P., Imani, M., Juarez, M., Wright, M.: Deep fingerprinting: understanding website fingerprinting defenses with deep learning. In: Proceedings of ACM CCS 2018 (2018)
23. Wang, C., et al.: Fingerprinting encrypted voice traffic on smart speakers with deep learning. In: Proceedings of ACM WiSec 2020 (2020)
24. Wang, T., Goldberg, I.: Walkie-Talkie: an efficient defense against passive website fingerprinting attacks. In: Proceedings of USENIX Security 2017 (2017)
25. Weinshel, B., et al.: Oh, the places you’ve been! User reactions to longitudinal transparency about third-party web tracking and inferencing. In: Proceedings of ACM CCS 2019 (2019)
26. Xiao, Q., Reiter, M.K., Zhang, Y.: Mitigating storage side channels using statistical privacy mechanisms. In: Proceedings of ACM CCS 2015 (2015)
27. Yang, H.H., Moody, J.: Feature selection based on joint mutual information. In: Proceedings of International ICSC Symposium on Advances in Intelligent Data Analysis (1999)
28. Zhang, X., Hamm, J., Reiter, M.K., Zhang, Y.: Statistical privacy for streaming traffic. In: Proceedings of NDSS 2019 (2019)

Acknowledgement

Our source code and datasets can be found on GitHub (https://github.com/SmartHomePrivacyProject/SmartSwitch). Authors from the University of Cincinnati were partially supported by the National Science Foundation (CNS-1947913), the UC Office of the Vice President for Research Pilot Program, and the Ohio Cyber Range at UC.

Author information

Authors and Affiliations

Department of EECS, University of Cincinnati, Cincinnati, USA
Haipeng Li & Boyang Wang
Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China
Ben Niu

Corresponding author

Correspondence to Boyang Wang.

Editor information

Editors and Affiliations

Yonsei University, Seoul, Korea (Republic of)
Noseong Park
George Mason University, Fairfax, VA, USA
Kun Sun
Dipartimento di Informatica, Universita degli Studi, Milan, Milano, Italy
Sara Foresti
University of Florida, Gainesville, FL, USA
Kevin Butler
Division of Nephrology, University of Alabama, Birmingham, AL, USA
Nitesh Saxena

Appendix

Hyperparameters of CNN. The tuned hyperparameters of our CNN are described in Table 4. For the search space of each hyperparameter, we represent it as a set. For the activation functions, dropout, filter size and pool size, we searched hyperparameters at each layer. The tuned parameters we report in the table are presented as a sequence of values by following the order of layers we presented in Fig. 2. For instance, the tuned activation functions are selu (1st Conv), elu (2nd Conv), relu (3rd Conv), elu (4th Conv), tanh (the second-to-last Dense layer).

Table 4. Tuned hyperparameters of the CNN when $w = 0.05$ s

$d^*$-privacy. Xiao et al. [26] proposed $d^*$-privacy, a variant of differential privacy for time-series data, to mitigate side-channel information leakage. They proved that their mechanism achieves $(d^*, 2\epsilon)$-privacy, where $d^*$ is a distance between two time series and $\epsilon$ is the privacy parameter in differential privacy.

Let $\mathbf {x}=(x_{1}, ..., x_{n})$ and $\mathbf {y}=(y_{1}, ..., y_{n})$ denote two time series with the same length. The $d^*$-distance between $\mathbf {x}$ and $\mathbf {y}$ is defined as:

$$\begin{aligned} d^*(\mathbf {x}, \mathbf {y}) = \sum _{i\ge 2} \left| (x_{i}-x_{i-1})-(y_{i}-y_{i-1}) \right| \end{aligned} \tag{6}$$
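As a concrete illustration of Eq. (6), the following minimal Python sketch computes the $d^*$-distance between two equal-length series. The function name `d_star_distance` is a hypothetical helper introduced here for illustration and is not taken from the authors' code.

```python
from typing import Sequence


def d_star_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """d*-distance of Eq. (6): sum over i >= 2 of
    |(x_i - x_{i-1}) - (y_i - y_{i-1})|.

    Python lists are 0-indexed, so the paper's i >= 2 becomes index >= 1.
    """
    if len(x) != len(y):
        raise ValueError("time series must have the same length")
    return sum(
        abs((x[i] - x[i - 1]) - (y[i] - y[i - 1]))
        for i in range(1, len(x))
    )
```

Because only consecutive differences enter the sum, two series that differ by a constant offset are at distance 0 under this definition.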

$d^{*}$-privacy adds noise to the data at a later timestamp by taking into account data from an earlier timestamp in the same time series. Specifically, let $D(i)$ denote the greatest power of 2 that divides timestamp $i$. $d^*$-privacy computes the noised data at timestamp $i$ as $\tilde{x}_{i} = \tilde{x}_{G(i)} + (x_{i} - x_{G(i)}) + r_{i}$, where $\tilde{x}_{0} = x_{0} = 0$, and the function $G(\cdot )$ and the noise $r_{i}$ are defined as follows:

$$\begin{aligned} G(i)= {\left\{ \begin{array}{ll} 0 &{} \text {if } i = 1 \\ i/2 &{} \text {if } i = D(i) \\ i-D(i) &{} \text {if } i > D(i) \end{array}\right. } \end{aligned} \tag{7}$$

$$\begin{aligned} r_i= {\left\{ \begin{array}{ll} \mathrm {Laplace}(\frac{1}{\epsilon })&{} \text {if } i = D(i) \\ \mathrm {Laplace}(\frac{\lfloor \log _2i\rfloor }{\epsilon })&{} \text {otherwise} \end{array}\right. } \end{aligned} \tag{8}$$
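The definitions in Eqs. (7) and (8) can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation. It assumes the update rule of Xiao et al. [26], $\tilde{x}_{i} = \tilde{x}_{G(i)} + (x_{i} - x_{G(i)}) + r_{i}$ with $\tilde{x}_{0} = x_{0} = 0$, and it assumes that the argument of $\mathrm{Laplace}(\cdot)$ is the scale parameter of a zero-mean Laplace distribution. The names `d_star_obfuscate` and `laplace` are hypothetical helpers introduced for this sketch.

```python
import random
from typing import List, Optional, Sequence


def D(i: int) -> int:
    """Greatest power of 2 that divides i (i >= 1)."""
    return i & -i


def G(i: int) -> int:
    """Index of the earlier timestamp used at timestamp i (Eq. 7)."""
    if i == 1:
        return 0
    d = D(i)
    return i // 2 if i == d else i - d


def laplace(scale: float, rng: random.Random) -> float:
    """Zero-mean Laplace sample, drawn as the difference of two
    i.i.d. exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def d_star_obfuscate(
    x: Sequence[float],
    epsilon: float,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Return the noised series for x = (x_1, ..., x_n)."""
    rng = rng or random.Random()
    xs = [0.0] + list(x)           # shift to 1-based indexing, x_0 = 0
    noisy = [0.0] * len(xs)        # noisy[0] = 0 (assumed base case)
    for i in range(1, len(xs)):
        g = G(i)
        # Eq. (8): scale 1/eps when i is a power of 2,
        # floor(log2 i)/eps otherwise.
        levels = 1 if i == D(i) else i.bit_length() - 1
        r_i = laplace(levels / epsilon, rng)
        noisy[i] = noisy[g] + (xs[i] - xs[g]) + r_i
    return noisy[1:]
```

Passing a seeded `random.Random` makes a run reproducible, for example `d_star_obfuscate(sizes, epsilon=0.5, rng=random.Random(0))`, where `sizes` would be a list of per-window traffic sizes.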

About this paper

Cite this paper

Li, H., Niu, B., Wang, B. (2020). SmartSwitch: Efficient Traffic Obfuscation Against Stream Fingerprinting. In: Park, N., Sun, K., Foresti, S., Butler, K., Saxena, N. (eds) Security and Privacy in Communication Networks. SecureComm 2020. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 335. Springer, Cham. https://doi.org/10.1007/978-3-030-63086-7_15

DOI: https://doi.org/10.1007/978-3-030-63086-7_15
Published: 12 December 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-63085-0
Online ISBN: 978-3-030-63086-7
eBook Packages: Computer Science, Computer Science (R0)
