Abstract
Website fingerprinting attacks can identify the websites a user visits by analyzing side-channel information in network traffic, even when that traffic is carried through an encrypted tunnel. The security of web browsing can therefore be evaluated by quantifying side-channel information leaks. However, most current leak quantification measures target web applications and may be impractical for web browsing because of their time complexity. Although revised models have been proposed to simplify the computation, their assumptions may not suit web browsing. In this paper, website fingerprinting is analyzed from the viewpoint of pattern classification. Data complexity measures, which quantify the difficulty of separating classes in a classification problem, are applied to quantify the leaks. The ability of these measures to represent information leaks is discussed and compared with existing approaches, both conceptually and through experiments with two website fingerprinting countermeasures: traffic morphing and BuFLO. Moreover, a parameter selection model based on leak quantification is proposed to estimate suitable parameters for a website fingerprinting countermeasure. The experimental results confirm that countermeasures with parameters selected according to the data complexity measures are more secure than those with parameters selected according to other leak quantification measures.
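To illustrate what a data complexity measure captures, the following minimal sketch computes Fisher's discriminant ratio, a commonly used class-separability measure from the data complexity literature. The abstract does not state which measures the paper uses, so this choice is an assumption made for illustration. The function name `fisher_ratio`, the two-feature trace representation (total bytes, packet count), and all sample values are hypothetical.

```python
from statistics import mean, pvariance


def fisher_ratio(class_a, class_b):
    """Return the largest per-feature Fisher discriminant ratio.

    Each argument is a list of equal-length feature vectors, one per
    traffic trace. A larger value means the two classes (websites) are
    easier to separate, which suggests a larger side-channel leak.
    """
    n_features = len(class_a[0])
    best = 0.0
    for j in range(n_features):
        a = [row[j] for row in class_a]
        b = [row[j] for row in class_b]
        gap = (mean(a) - mean(b)) ** 2        # squared distance between class means
        spread = pvariance(a) + pvariance(b)  # sum of within-class variances
        if spread == 0:
            # No within-class variation: perfectly separable if the means
            # differ, indistinguishable if they coincide.
            ratio = float("inf") if gap > 0 else 0.0
        else:
            ratio = gap / spread
        best = max(best, ratio)
    return best


# Hypothetical traces, each described by [total bytes, packet count].
site_a = [[1000, 10], [1100, 12], [900, 11]]
site_b = [[5000, 40], [5200, 42], [4800, 41]]

# Hypothetical traces after an idealized padding countermeasure that
# makes every trace look identical.
padded_a = [[6000, 50], [6000, 50], [6000, 50]]
padded_b = [[6000, 50], [6000, 50], [6000, 50]]

unprotected = fisher_ratio(site_a, site_b)
protected = fisher_ratio(padded_a, padded_b)
```

In this toy setting the unprotected traces give a large ratio, while the idealized padded traces give a ratio of zero, which matches the intuition that a countermeasure is effective when it makes the classes hard to separate.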
Acknowledgments
This work is supported by the National Natural Science Foundation of China (61003171, 61272201 and 61003172).
Cite this article
He, ZM., Chan, P.P.K., Yeung, D.S. et al. Quantification of side-channel information leaks based on data complexity measures for web browsing. Int. J. Mach. Learn. & Cyber. 6, 607–619 (2015). https://doi.org/10.1007/s13042-015-0348-3
DOI: https://doi.org/10.1007/s13042-015-0348-3