
Feature Selection Based Correlation Attack on HTTPS Secure Searching

Wireless Personal Communications

Abstract

Search engines play an irreplaceable role in organizing and accessing web information, and Internet users routinely query a search engine when retrieving web information. Sensitive data about a search engine user's intentions or behavior can be inferred from the query phrases, the returned results pages, and the webpages visited subsequently. To protect the contents of communications from eavesdropping, some search engines adopt HTTPS by default to provide bidirectional encryption. However, this only provides an encrypted channel between the user and the search engine: the majority of webpages indexed in search engines' results pages are still hosted on HTTP-enabled websites, and the contents of these webpages can be observed by attackers once the user clicks on these links. Imitating attackers, we propose a novel approach for attacking secure search through correlation analysis of encrypted search traffic with unencrypted webpages. We show that a simple weighted TF–DF mechanism is sufficient for selecting candidate guessing phrases. Imitating search engine users, by querying these candidates and enumerating the webpages indexed in the results pages, we can hit the exact query phrases and at the same time reconstruct the user's web-surfing trails through DNS-based URL comparison and network traffic analysis based on flow feature statistics. In an experiment with 28 search phrases, we achieved a 67.86% hit rate at the first guess and a 96.43% hit rate within three guesses. Our empirical research shows that HTTPS traffic can be correlated and de-anonymized through HTTP traffic, and that the secure search offered by search engines is not always secure unless HTTPS is enabled by default everywhere.
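The abstract does not give the weighting formula, so the following is only a minimal sketch of how a weighted TF–DF ranking of candidate terms might look. It assumes whitespace tokenization and an additive combination of normalized term frequency and document frequency. The function name `rank_candidates` and the parameters `weight_tf`, `weight_df`, and `top_k` are hypothetical and are not taken from the paper.

```python
from collections import Counter


def rank_candidates(pages, weight_tf=1.0, weight_df=1.0, top_k=3):
    """Rank terms seen in observed page texts by a weighted TF-DF score.

    Assumption: score = weight_tf * (term count / total tokens)
                      + weight_df * (pages containing term / number of pages).
    The paper's actual weighting may differ.
    """
    tf = Counter()  # total occurrences of each term over all pages
    df = Counter()  # number of pages in which each term appears
    for text in pages:
        tokens = text.lower().split()
        tf.update(tokens)
        df.update(set(tokens))

    total_tokens = sum(tf.values())
    n_pages = len(pages)
    if total_tokens == 0:
        return []

    scores = {
        term: weight_tf * tf[term] / total_tokens
        + weight_df * df[term] / n_pages
        for term in tf
    }
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_k]


# Hypothetical texts of unencrypted pages visited after one encrypted search.
pages = [
    "flu symptoms and treatment",
    "flu vaccine clinic hours",
    "common flu symptoms in adults",
]
top_terms = rank_candidates(pages)
```

In this toy input, terms that recur across several of the observed pages rise to the top, which is the intuition behind using them as guesses for the hidden query. As an arithmetic check on the reported figures, 19 of 28 phrases gives 67.86% and 27 of 28 gives 96.43%.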





Author information


Corresponding author

Correspondence to Ahmed Khan.


About this article


Cite this article

Sarfaraz, A., Khan, A. Feature Selection Based Correlation Attack on HTTPS Secure Searching. Wireless Pers Commun 103, 2995–3008 (2018). https://doi.org/10.1007/s11277-018-5989-6


  • DOI: https://doi.org/10.1007/s11277-018-5989-6
