DOI: 10.1145/3576914.3587523
research-article

Software Introspection for Signaling Social-Cyber Operations

Published: 09 May 2023

ABSTRACT

Open-source software (OSS) is a critical element in the design and operation of complex cyber-physical systems. Contributions to OSS projects are typically the result of voluntary work and time allocation by researchers, software developers, hackers, and even opportunistic programmers. These communities often operate on a “trust” basis, and although they strive to evaluate the technical correctness and merits of contributed code, the processes they use are usually loosely supervised. Social rules, trust, reputation, and even arcane processes often govern these communities. While these components have undoubtedly contributed to the growth and expansion of OSS, they could also lead to opportunities for subversion [3], hindering the reliability of an OSS project. This, in turn, could not only compromise the integrity of cyber-physical systems depending on OSS but also affect their performance.

The risks of new and emerging socio-technical attack vectors on cyber-physical systems that rely on OSS are real, broad, and growing [8]. It is therefore essential for the cyber-defense community to develop a comprehensive and deeper understanding of the socio-technical behavior and behavioral dynamics involved in these attacks. Mechanisms must also be in place to extract the latent information hidden in these operations. Much of the previous research on socio-technical behavior in OSS projects has taken a static view of the problem, paying close attention to individual, publicly available traces of information involving source code, commits, logs, or external packages (e.g., [7]). However, social-cyber operations are not static [6]. Instead, they can change over time to help potential contributors build a reputation and eventually become project committers (as seen in the case study of a “successful socialization” scenario in Ducheneaut [2]). Furthermore, some of these dynamics, particularly those related to vulnerability fixes, may occur behind closed doors and be black boxes of complexity [4]. Introspecting multiple streams of information resulting from both social and technical interactions across and between development channels (such as mailing lists, version control systems, and source code) can help us open this black box and build a high-fidelity model of socio-technical behavior. Such a model can be operationalized as an early warning mechanism to highlight emergent social-cyber operations that aim to undermine the integrity of OSS projects and their dependent cyber-physical systems. This paper summarizes SIGNAL, a single and coherent software introspection capability for signaling social-cyber operations against cyber-physical systems that depend on OSS projects.
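To make the idea of introspecting multiple streams concrete, the following minimal sketch interleaves two per-channel event streams into one time-ordered timeline. It is an illustration only and not SIGNAL's implementation: the `Event` record, its fields, the aliases, and the sample data are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from heapq import merge
from typing import Iterable, List


@dataclass(frozen=True)
class Event:
    """One observed action; every field here is hypothetical."""
    when: datetime
    channel: str  # e.g. "mailing-list" or "vcs"
    actor: str    # contributor alias
    action: str   # e.g. "post-patch", "review-reply", "commit"


def interleave(*streams: Iterable[Event]) -> List[Event]:
    """Merge streams that are each already sorted by time into one timeline."""
    return list(merge(*streams, key=lambda e: e.when))


# Synthetic data, invented for this sketch.
mailing_list = [
    Event(datetime(2020, 8, 4, 9, 0), "mailing-list", "alias-a", "post-patch"),
    Event(datetime(2020, 8, 6, 15, 30), "mailing-list", "maintainer-x", "review-reply"),
]
version_control = [
    Event(datetime(2020, 8, 5, 11, 0), "vcs", "alias-a", "revise-patch"),
    Event(datetime(2020, 8, 9, 8, 45), "vcs", "maintainer-x", "commit"),
]

timeline = interleave(mailing_list, version_control)
for event in timeline:
    print(event.when.isoformat(), event.channel, event.actor, event.action)
```

A single timeline of this kind places social actions (a review reply) next to the technical actions they precede or follow (a revised patch, a commit), which is the cross-channel view the paragraph above argues for.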

As shown in Figure 1, SIGNAL views an OSS project as a changing artifact that grows and evolves over time through socially vetted modifications submitted by programmers. SIGNAL is grounded in three key, interconnected components: (1) Explainable persuasive behavior extraction (Yellow Patch), (2) Graph-based revision history analysis (Sensor), and (3) Self-supervised mechanisms for dynamic trace analysis (Antenna). In the first component, SIGNAL combines white-box transfer learning for Random Forest with exploratory factor analysis to compute an accurate model of persuasive developer action flows emerging within a project’s social and technical channels. This effectively links key traces of developers' social and technical interactions to their associated traces of code modifications. The computed model achieves accuracy comparable to the state of the art [9] (~68%) with a 16x faster training time. In the second component, SIGNAL introduces a novel graph-based pattern mining approach for detecting API misuses that originate from persuasive developer activities. This component examines chains of code changes in OSS projects to evaluate structural and semantic patterns, and uses this information to identify API misuses. In the third component, SIGNAL combines the output of the first two components and performs self-supervision on their temporal ordering to learn dynamic developer activity embeddings. These embeddings can be used to track the semantic evolution of developer contribution ploys. An advantage of using an embedding approach to track the semantic evolution of socio-technical behavior is that it produces a natural “backtrace” of contributors’ modus operandi. This backtrace details how their actions exploit seams within a project to influence technical change.
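As a rough illustration of how temporal ordering alone can supply a training signal, the sketch below derives (anchor, context) pairs from a time-ordered activity sequence, in the style of skip-gram training data. This is an assumption made for illustration, not a description of how Antenna is built: the function name, the window size, and the activity labels are all hypothetical.

```python
from typing import List, Sequence, Tuple


def context_pairs(actions: Sequence[str], window: int = 1) -> List[Tuple[str, str]]:
    """Return (anchor, context) pairs for actions at most `window` steps apart."""
    pairs = []
    for i, anchor in enumerate(actions):
        lo = max(0, i - window)
        hi = min(len(actions), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # an action is never its own context
                pairs.append((anchor, actions[j]))
    return pairs


# Synthetic, time-ordered activity for one hypothetical contributor.
activity = ["post-patch", "review-reply", "revise-patch", "commit"]

pairs = context_pairs(activity, window=1)
for anchor, context in pairs:
    print(anchor, "->", context)
```

No manual labels are needed here: the neighbors of each action in time act as its prediction targets. An embedding model trained on such pairs would place actions that occur in similar temporal contexts close together.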

Case Study: The Evolution of Hypocrite Commits in the Linux Kernel. In recent work [5], we assessed the effectiveness of SIGNAL in introspecting a well-documented social engineering attack against the Linux Kernel, specifically the “hypocrite commits” [11]. “Hypocrite commits” refer to scenarios where an attacker exploits the social landscape of an OSS project, such as the Linux Kernel in this case, to earn the trust of maintainers before introducing malicious code or malware that can lead to critical vulnerabilities in the project or its subsystems. Our SIGNAL analysis of the 2020 social engineering attack against the Linux Kernel revealed new and distinct social-cyber operation traces, as depicted in Figure 1 of our recent study. In [5], we sought to capture the dynamics of influence-seeking and trust-building operations carried out by adversaries seeking to acquire write permissions to an OSS project. Additionally, we drew similarities between the OSS development life cycle and online social networks [10] and introduced the concept of trust ascendancy. This concept describes any influence-seeking and trust-building operation that seeks to change a project’s technical direction.

In our SIGNAL analysis of the “hypocrite commits” attack, we collected mailing-list, patch, and commit data from August to November 2020, the period when the attack took place [1]. Our approach was hybrid: it framed the analysis as an unsupervised learning task with a self-supervised component. Through our experiments, we successfully captured the modus operandi trajectories followed by the aliases involved in the attack and identified a series of potentially influenced maintainers and core contributors. In the process, we also identified a series of trust ascendancy classes, such as opportunistic or awry trust ascendancy.
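The data preparation described above can be pictured as restricting records to the study window and grouping them into one ordered trajectory per alias. The sketch below is only an illustration under stated assumptions: the inclusive window bounds (1 August to 30 November 2020) are assumed because the text gives only months, and the record layout, aliases, and sample records are hypothetical.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, List, Tuple

# Assumed inclusive bounds; the source states only "August to November 2020".
WINDOW_START = date(2020, 8, 1)
WINDOW_END = date(2020, 11, 30)

Record = Tuple[date, str, str]  # (day, alias, action), a hypothetical layout


def trajectories(records: Iterable[Record]) -> Dict[str, List[str]]:
    """Group in-window actions by alias, each list ordered by day."""
    grouped: Dict[str, List[Tuple[date, str]]] = defaultdict(list)
    for day, alias, action in records:
        if WINDOW_START <= day <= WINDOW_END:
            grouped[alias].append((day, action))
    return {
        alias: [action for _, action in sorted(items, key=lambda item: item[0])]
        for alias, items in grouped.items()
    }


# Synthetic records, invented for this sketch.
records = [
    (date(2020, 7, 20), "alias-a", "post-patch"),    # before the window
    (date(2020, 9, 2), "alias-a", "revise-patch"),
    (date(2020, 8, 10), "alias-a", "post-patch"),
    (date(2020, 10, 1), "alias-b", "post-patch"),
    (date(2020, 12, 3), "alias-b", "commit"),        # after the window
]

result = trajectories(records)
print(result)
```

Per-alias sequences of this shape are the kind of input over which trajectory analysis could then be run.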

Remarks. SIGNAL makes the following technical contributions:

Moving forward, we aim to scale SIGNAL to new case studies and to larger volumes of diverse socio-technical activity data. Our goal is to chart the strategic landscape of influence-seeking and trust-building operations in OSS development while avoiding information overload and unnecessary CPU-intensive data operations. We anticipate these efforts will facilitate new research in secure and continuous software development, benefiting the advancement of complex cyber-physical system design and development.

References

  1. Kees Cook. 2021. Report on University of Minnesota Breach-of-Trust Incident. https://lkml.org/lkml/2021/5/5/1244 Accessed: 2023-02-20.
  2. Nicolas Ducheneaut. 2005. Socialization in an open source software community: A socio-technical analysis. Computer Supported Cooperative Work (CSCW) 14, 4 (2005), 323–368.
  3. Luiz Giovanini, Daniela Oliveira, Huascar Sanchez, and Deborah Shands. 2021. Leveraging Team Dynamics to Predict Open-source Software Projects’ Susceptibility to Social Engineering Attacks. arXiv preprint arXiv:2106.16067 (2021).
  4. Ralf Ramsauer, Lukas Bulwahn, Daniel Lohmann, and Wolfgang Mauerer. 2020. The sound of silence: Mining security vulnerabilities from secret integration channels in open-source projects. In Proceedings of the 2020 ACM SIGSAC Conference on Cloud Computing Security Workshop. 147–157.
  5. Huascar Sanchez and Briland Hitaj. 2022. Trust in Motion: Capturing Trust Ascendancy in Open-Source Projects using Hybrid AI. arXiv preprint arXiv:2210.02656 (2022).
  6. Yun Shen and Gianluca Stringhini. 2019. ATTACK2VEC: Leveraging Temporal Word Embeddings to Understand the Evolution of Cyberattacks. In 28th USENIX Security Symposium (USENIX Security 19). USENIX Association, Santa Clara, CA, 905–921.
  7. Nikolai Sviridov, Mikhail Evtikhiev, and Vladimir Kovalenko. 2021. TNM: A Tool for Mining of Socio-Technical Data from Git Repositories. In 2021 IEEE/ACM 18th International Conference on Mining Software Repositories (MSR). IEEE, 295–299.
  8. Synopsys. 2023. Open Source Security and Risk Analysis Report. https://tinyurl.com/4j8zp82y Accessed: 2023-02-20.
  9. Xuewei Wang, Weiyan Shi, Richard Kim, Yoojung Oh, Sijia Yang, Jingwen Zhang, and Zhou Yu. 2019. Persuasion for good: Towards a personalized persuasive dialogue system for social good. arXiv preprint arXiv:1906.06725 (2019).
  10. Yi Wang and David Redmiles. 2016. The diffusion of trust and cooperation in teams with individuals’ variations on baseline trust. In Proceedings of the 19th ACM Conference on Computer-Supported Cooperative Work & Social Computing. 303–318.
  11. Qiushi Wu and Kangjie Lu. 2021. On the feasibility of stealthily introducing vulnerabilities in open-source software via hypocrite commits. http://www.coding-guidelines.com/code-data/OpenSourceInsecurity.pdf. (2021).

Published in

CPS-IoT Week '23: Proceedings of Cyber-Physical Systems and Internet of Things Week 2023
May 2023, 419 pages
ISBN: 9798400700491
DOI: 10.1145/3576914

Copyright © 2023 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery, New York, NY, United States

Qualifiers

research-article, Research, Refereed limited