ABSTRACT
Keystroke inference attacks are a form of side-channel attack in which an attacker leverages various techniques to recover a user's keystrokes as she inputs information into some display (e.g., while sending a text message or entering her PIN). Typically, these attacks leverage machine learning approaches, but assessing the realism of the threat space has lagged behind the pace of machine learning advancements, due in part to the challenges in curating large real-life datasets. We aim to overcome the challenge of having a limited amount of real data by introducing a video domain adaptation technique that is able to leverage synthetic data through supervised disentangled learning. Specifically, for a given domain, we decompose the observed data into two factors of variation: Style and Content. Doing so for both domains provides four learned representations: real-life style, synthetic style, real-life content, and synthetic content. We then combine them into feature representations from all combinations of style-content pairings across domains, and train a model on these combined representations to classify the content (i.e., labels) of a given datapoint in the style of another domain. We evaluate our method on real-life data using a variety of metrics to quantify the amount of information an attacker is able to recover. We show that our method prevents our model from overfitting to a small real-life training set, indicating that our method is an effective form of data augmentation, thereby making keystroke inference attacks more practical.
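The style-content pairing step described in the abstract can be illustrated with a minimal sketch. It assumes each domain's style and content are already available as fixed-length feature vectors, and that a pairing is formed by simple concatenation. The abstract does not specify the encoders or the combination operator, so the function name `pair_style_content`, the vector values, and the use of concatenation are all hypothetical choices made for illustration.

```python
from itertools import product


def pair_style_content(styles, contents):
    """Build one combined feature per (style domain, content domain) pair.

    styles / contents: dicts mapping a domain name to a feature vector
    (a list of floats). Concatenation is an assumed combination operator;
    the abstract does not say how the two factors are merged.
    """
    return {
        (style_domain, content_domain): styles[style_domain] + contents[content_domain]
        for style_domain, content_domain in product(styles, contents)
    }


# Hypothetical, hand-written stand-ins for learned encoder outputs.
styles = {"real": [0.9, 0.1], "synthetic": [0.2, 0.8]}
contents = {"real": [1.0, 0.0, 0.0], "synthetic": [0.0, 1.0, 0.0]}

combined = pair_style_content(styles, contents)

# Two styles x two contents give four pairings, two of them cross-domain.
for pairing, feature in combined.items():
    print(pairing, feature)
```

In this sketch, a classifier trained on `combined` would take its label from the content half of each pairing, so a small set of real-life content can be seen alongside both real and synthetic style.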
Index Terms
- Leveraging Disentangled Representations to Improve Vision-Based Keystroke Inference Attacks Under Low Data Constraints