DOI: 10.1145/2660267.2660339
research-article

ALETHEIA: Improving the Usability of Static Security Analysis

Published: 03 November 2014

Abstract

The scale and complexity of modern software systems complicate manual security auditing, and automated analysis tools are gradually becoming a necessity. Static security analyses in particular carry the promise of efficiently verifying large code bases. Yet a critical usability barrier hinders their adoption by developers: the excess of false reports. Current tools do not offer the user any direct means of customizing or cleansing the report. The user is thus left to review hundreds, if not thousands, of potential warnings and classify each as either actionable or spurious. This is both burdensome and error-prone, leaving developers disenchanted with static security checkers.
We address this challenge by introducing a general technique to refine the output of static security checkers. The key idea is to apply statistical learning to the warnings output by the analysis, based on user feedback on a small set of those warnings. This leads to an interactive solution, whereby the user classifies a small fraction of the issues reported by the analysis, and the learning algorithm then classifies the remaining warnings automatically. An important aspect of our solution is that it is user-centric. The user can express different classification policies, ranging from a strong bias toward elimination of false warnings to a strong bias toward preservation of true warnings, which our filtering system then enforces.
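The interactive workflow described above can be illustrated with a minimal sketch. This is not Aletheia's implementation: the naive Bayes learner, the feature names (`sink`, `sanitized`), the sample warnings, and the threshold values are all assumptions chosen for illustration. The sketch trains on a handful of user-labeled warnings, scores the remaining ones, and applies a policy as a probability threshold. A low threshold is biased toward preserving true warnings, while a high threshold is biased toward eliminating false ones.

```python
import math
from collections import Counter, defaultdict

def train(labeled):
    """Fit a categorical naive Bayes model on (features, is_true) pairs."""
    class_counts = Counter()
    feature_counts = defaultdict(Counter)  # (label, feature name) -> value counts
    values = defaultdict(set)              # feature name -> values seen in training
    for features, label in labeled:
        class_counts[label] += 1
        for name, value in features.items():
            feature_counts[(label, name)][value] += 1
            values[name].add(value)
    return class_counts, feature_counts, values

def prob_true(model, features):
    """Estimate the probability that a warning is a true positive."""
    class_counts, feature_counts, values = model
    total = sum(class_counts.values())
    log_scores = {}
    for label in (True, False):
        # Laplace smoothing keeps unseen values from zeroing out a class.
        score = math.log((class_counts[label] + 1) / (total + 2))
        for name, value in features.items():
            count = feature_counts[(label, name)][value]
            n_values = len(values[name] | {value})
            score += math.log((count + 1) / (class_counts[label] + n_values))
        log_scores[label] = score
    top = max(log_scores.values())
    e_true = math.exp(log_scores[True] - top)
    e_false = math.exp(log_scores[False] - top)
    return e_true / (e_true + e_false)

def filter_warnings(model, warnings, threshold):
    """Keep warnings whose estimated probability meets the policy threshold."""
    return [w for w in warnings if prob_true(model, w) >= threshold]

# Hypothetical user feedback on a small set of warnings.
labeled = [
    ({"sink": "innerHTML", "sanitized": "no"}, True),
    ({"sink": "document.write", "sanitized": "no"}, True),
    ({"sink": "innerHTML", "sanitized": "yes"}, False),
    ({"sink": "location.href", "sanitized": "yes"}, False),
    ({"sink": "location.href", "sanitized": "no"}, False),
]
# Hypothetical remaining warnings to classify automatically.
unlabeled = [
    {"sink": "innerHTML", "sanitized": "no"},
    {"sink": "location.href", "sanitized": "yes"},
    {"sink": "location.href", "sanitized": "no"},
]

model = train(labeled)
keep_true_bias = filter_warnings(model, unlabeled, threshold=0.2)
drop_false_bias = filter_warnings(model, unlabeled, threshold=0.5)
print(len(keep_true_bias), len(drop_false_bias))
```

Expressing the policy as a single threshold is only one possible design. It makes the trade-off explicit: lowering the threshold retains more borderline warnings, and raising it discards them.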
We have implemented our approach as the Aletheia tool. Our evaluation of Aletheia on a diversified set of nearly 4,000 client-side JavaScript benchmarks, extracted from 675 popular Web sites, is highly encouraging. As an example, based on only 200 classified warnings, and with a policy biased toward preservation of true warnings, Aletheia is able to boost precision nearly threefold (×2.868), while reducing recall by a negligible factor (×1.006). Other policies are enforced with a similarly high level of efficacy.
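To make the two factors concrete, the following sketch computes a precision gain and a recall reduction from confusion counts. The counts are hypothetical and are not taken from the paper's evaluation. Reading the recall factor as recall before filtering divided by recall after filtering is also an assumption about the notation.

```python
def precision(tp, fp):
    """Fraction of reported warnings that are true positives."""
    return tp / (tp + fp)

def recall(tp, fn):
    """Fraction of true issues that are still reported."""
    return tp / (tp + fn)

# Hypothetical counts: the raw analysis reports every warning,
# so nothing true is missed before filtering (fn = 0).
before = {"tp": 300, "fp": 700, "fn": 0}
# Hypothetical counts after filtering: most false warnings are dropped,
# at the cost of three true warnings.
after = {"tp": 297, "fp": 48, "fn": 3}

precision_gain = precision(after["tp"], after["fp"]) / precision(before["tp"], before["fp"])
recall_loss = recall(before["tp"], before["fn"]) / recall(after["tp"], after["fn"])

print(f"precision improved by a factor of {precision_gain:.3f}")
print(f"recall reduced by a factor of {recall_loss:.3f}")
```

Under this reading, a recall factor close to 1 means that almost no true warnings are lost to the filter.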




Published In

CCS '14: Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security
November 2014
1592 pages
ISBN: 9781450329576
DOI: 10.1145/2660267
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. classification
  2. false alarms
  3. information-flow security
  4. machine learning
  5. static analysis
  6. usable security

Qualifiers

  • Research-article

Conference

CCS'14

Acceptance Rates

CCS '14 Paper Acceptance Rate 114 of 585 submissions, 19%;
Overall Acceptance Rate 1,261 of 6,999 submissions, 18%

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 35
  • Downloads (last 6 weeks): 2
Reflects downloads up to 20 Jan 2025.

Cited By

  • (2024) A Method for Processing Static Analysis Alarms Based on Deep Learning. Applied Sciences 14:13 (5542). DOI: 10.3390/app14135542. Online publication date: 26-Jun-2024
  • (2024) Equivalent Mutants in the Wild: Identifying and Efficiently Suppressing Equivalent Mutants for Java Programs. Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis (654-665). DOI: 10.1145/3650212.3680310. Online publication date: 11-Sep-2024
  • (2024) Pattern Mining-Based Warning Prioritization by Refining Abstract Syntax Tree. International Journal of Software Engineering and Knowledge Engineering 34:10 (1593-1619). DOI: 10.1142/S0218194024500293. Online publication date: 23-Jul-2024
  • (2024) Analyzing source code vulnerabilities in the D2A dataset with ML ensembles and C-BERT. Empirical Software Engineering 29:2. DOI: 10.1007/s10664-023-10405-9. Online publication date: 22-Feb-2024
  • (2023) Why Johnny Can’t Use Secure Docker Images: Investigating the Usability Challenges in Using Docker Image Vulnerability Scanners through Heuristic Evaluation. Proceedings of the 26th International Symposium on Research in Attacks, Intrusions and Defenses (669-685). DOI: 10.1145/3607199.3607244. Online publication date: 16-Oct-2023
  • (2023) Incorporating Signal Awareness in Source Code Modeling: An Application to Vulnerability Detection. ACM Transactions on Software Engineering and Methodology 32:6 (1-40). DOI: 10.1145/3597202. Online publication date: 29-Sep-2023
  • (2023) Mitigating False Positive Static Analysis Warnings: Progress, Challenges, and Opportunities. IEEE Transactions on Software Engineering 49:12 (5154-5188). DOI: 10.1109/TSE.2023.3329667. Online publication date: Dec-2023
  • (2023) An Empirical Study of Class Rebalancing Methods for Actionable Warning Identification. IEEE Transactions on Reliability 72:4 (1648-1662). DOI: 10.1109/TR.2023.3234982. Online publication date: Dec-2023
  • (2023) Code Vulnerability Detection via Signal-Aware Learning. 2023 IEEE 8th European Symposium on Security and Privacy (EuroS&P) (506-523). DOI: 10.1109/EuroSP57164.2023.00037. Online publication date: Jul-2023
  • (2023) Valar: Streamlining Alarm Ranking in Static Analysis with Value-Flow Assisted Active Learning. Proceedings of the 38th IEEE/ACM International Conference on Automated Software Engineering (1940-1951). DOI: 10.1109/ASE56229.2023.00098. Online publication date: 11-Nov-2023
