VarSem: declarative expression and automated inference of variable usage semantics

Authors:
Yin Liu (Virginia Tech, USA)
Eli Tilevich (Virginia Tech, USA)

GPCE 2020: Proceedings of the 19th ACM SIGPLAN International Conference on Generative Programming: Concepts and Experiences, November 2020, Pages 84–97. https://doi.org/10.1145/3425898.3426962

Published: 16 November 2020

ABSTRACT

Programmers declare variables to serve specific implementation purposes that we refer to as variable usage semantics (VUS). Understanding VUS is required for various software engineering tasks, including program comprehension, code audits, and vulnerability detection. To help programmers understand VUS, we present a new program analysis that infers a variable's usage semantics from its textual and contextual information (e.g., symbolic name, type, scope, information flow). To support this analysis, we introduce VarSem, a domain-specific language in which a variable's semantic category is expressed as a set of declarative rules. VarSem's execution determines which program variables belong to a given semantic category. VarSem translates high-level declarative rules into low-level program analysis techniques, including natural language processing and data flow, and provides a highly extensible architecture for specifying new rules and analysis techniques. We evaluate VarSem on eight real-world systems to identify their personally identifiable information variables. The evaluation results show that VarSem infers variable semantics with satisfactory accuracy/precision and passable recall, thus potentially benefiting both software and security engineers.
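To make the idea of a semantic category expressed as declarative rules more concrete, the following is a minimal Python sketch, not VarSem's actual syntax or implementation. Every name in it (`Variable`, `name_resembles`, `has_type`, `all_of`, `infer`, the `PASSWORD` category) is hypothetical, and a simple string-similarity ratio from the standard library stands in for the natural language processing the abstract mentions. Data-flow rules are omitted.

```python
import re
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Callable, Iterable, List

@dataclass(frozen=True)
class Variable:
    """Hypothetical record of the facts a rule can inspect."""
    name: str
    type: str
    scope: str

# A rule is a predicate over one variable.
Rule = Callable[[Variable], bool]

def split_identifier(name: str) -> List[str]:
    """Split snake_case and camelCase identifiers into lowercase words."""
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", name).replace("_", " ")
    return [word.lower() for word in spaced.split()]

def name_resembles(keywords: Iterable[str], threshold: float = 0.8) -> Rule:
    """Textual rule: some word of the identifier is close to a keyword.
    String similarity is an assumed stand-in for real NLP techniques."""
    keywords = list(keywords)
    def rule(v: Variable) -> bool:
        return any(
            SequenceMatcher(None, word, key).ratio() >= threshold
            for word in split_identifier(v.name)
            for key in keywords
        )
    return rule

def has_type(*types: str) -> Rule:
    """Context rule: the declared type is one of the given types."""
    return lambda v: v.type in types

def all_of(*rules: Rule) -> Rule:
    """Conjunction of rules, forming a semantic category."""
    return lambda v: all(rule(v) for rule in rules)

def infer(category: Rule, variables: Iterable[Variable]) -> List[str]:
    """Return the names of the variables that belong to the category."""
    return [v.name for v in variables if category(v)]

# Hypothetical category: string-typed variables whose name suggests a password.
PASSWORD = all_of(
    name_resembles(["password", "passwd"]),
    has_type("char *", "char[]"),
)

variables = [
    Variable("user_passwd", "char *", "local"),
    Variable("passwordLength", "int", "local"),
    Variable("buffer", "char *", "global"),
]
matches = infer(PASSWORD, variables)
```

In this sketch only `user_passwd` satisfies both rules: `passwordLength` matches by name but not by type, and `buffer` matches by type but not by name. Composing small predicates this way illustrates why a declarative formulation is easy to extend with new rules.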

Supplemental Material

gpce20main-p36-p-video.mp4 (mp4, 114.3 MB)

3425898.3426962.mp4 (mp4, 28.4 MB)

Index Terms

VarSem: declarative expression and automated inference of variable usage semantics
1. Security and privacy
  1. Systems security
    1. Operating systems security
      1. Trusted computing
2. Software and its engineering
  1. Software notations and tools
    1. Context specific languages
      1. Domain specific languages

Published in
GPCE 2020: Proceedings of the 19th ACM SIGPLAN International Conference on Generative Programming: Concepts and Experiences
November 2020
136 pages
ISBN: 9781450381741
DOI: 10.1145/3425898
General Chair: Martin Erwig (Oregon State University, USA)
Program Chair: Jeff Gray (University of Alabama, USA)
Copyright © 2020 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
DSL
NLP
Program Analysis
Variable Semantics
Qualifiers
- research-article
Acceptance Rates
Overall Acceptance Rate: 56 of 180 submissions, 31%