A modular package manager architecture

https://doi.org/10.1016/j.infsof.2012.09.002

Abstract

Context

The success of modern software distributions in the Free and Open Source world can be explained, among other factors, by the availability of a large collection of software packages and the possibility to easily install and remove those components using state-of-the-art package managers. However, package managers are often built as monolithic systems, with hard-wired, ad hoc dependency solvers implementing customized heuristics.

Objective

We aim to lay the foundation for improving existing package managers. Package managers should be complete, that is, find a solution whenever one exists, and should allow the user to specify complex criteria that define how to pick the best solution according to the user’s preferences.

Method

In this paper we propose a modular architecture relying on precise interface formalisms that allows the system administrator to choose from a variety of dependency solvers and backends.

Results

We have built a working prototype, called MPM, following the design advocated in this paper, and we show that it largely outperforms a variety of current package managers in terms of the quality of the solutions it proposes.

Conclusion

We argue that a modular architecture, allowing for delegating the task of constraint solving to external solvers, is the path that leads to the next generation of package managers that will deliver better results, offer more expressive preference languages, and be easily adaptable to new platforms.

Introduction

Free and Open Source Software (FOSS) distributions, as well as other complex software platforms, strive to provide modular software components, called packages, that can be assembled to provide the user with the desired functionalities. Packages are equipped with a rich set of metadata providing information on their content and on their relationships to other packages, which describe the requirements for a package to run properly on a target system.

Packages, as found in FOSS distributions, share important features with software component models [13], but also exhibit some important differences. On the one hand, packages, like components, are reusable software units which can be combined freely by a system administrator; they are also independent units that follow their own development timeline and versioning scheme.

On the other hand, packages, unlike what happens in many software component models, cannot be composed to build a larger component, and it is not possible to install more than one copy of a given package on a given system. Furthermore, installation of packages, and execution of software contained in packages, acts on shared resources that are provided by the operating system, like creating files on the file system, or interacting through the system's input/output devices. As a consequence, packages may be in conflict with each other, a phenomenon which is not yet commonplace for software components.

Software components come with an interface describing their required and provided services. In the case of packages, requirements and provided features are given by symbolic names (either names of packages, or names of abstract features) whose semantics is defined separately from the package model (for instance, a policy document may describe how an executable must behave in order to provide a feature mail-transport-agent, or an external table will tell us which symbols have been provided in version 1.2.3 of library libfoo).

The tools used to install, upgrade and remove packages on the target machines are key components for maintaining and deploying software systems based on packages. These tools, called package managers, incorporate numerous functionalities: they retrieve components from remote repositories and, possibly, check their integrity; they compute upgrade paths that respect inter-component constraints (a functionality known as dependency solving); they handle the interaction with the user to allow fine-tuning of the choice of components; and finally, they perform the actual deployment of upgrades by removing and adding components in the right order, aborting the operation if problems are encountered.
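The last of these functionalities, deploying components in the right order, can be illustrated with a topological sort. The following is a minimal Python sketch, assuming the dependency solver has already fixed the set of packages to install and resolved every disjunction, so that each package maps to a plain set of the packages it requires. `install_order` is a hypothetical helper, not code from MPM or apt, and real deployment tools need extra policy (for instance for dependency cycles) that is not modeled here.

```python
from graphlib import CycleError, TopologicalSorter


def install_order(to_install, depends_on):
    """Order packages so that each one is deployed after the packages it requires.

    to_install: set of names selected by the (hypothetical) solver.
    depends_on: name -> set of required names, disjunctions already resolved.
    Requirements outside to_install are assumed to be installed already.
    """
    graph = {
        name: {dep for dep in depends_on.get(name, ()) if dep in to_install}
        for name in to_install
    }
    try:
        # static_order() yields every predecessor before the nodes that need it.
        return list(TopologicalSorter(graph).static_order())
    except CycleError as err:
        # graphlib stores the detected cycle as the second element of err.args;
        # this sketch simply reports it instead of trying to break it.
        raise ValueError(f"dependency cycle: {err.args[1]}") from err


order = install_order(
    {"app", "libfoo", "libbar", "libc"},
    {"app": {"libfoo", "libbar"}, "libfoo": {"libc"}, "libbar": {"libc"}},
)
print(order)  # libc comes first and app last; libfoo/libbar may appear in either order
```

One simple policy for removals, under the same assumptions, would be to walk such an order in reverse.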

Package managers take a very abstract view by considering only constraints between packages identified by names. Even though the package model is quite simple and abstract, package managers face two major challenges:

  • Logical complexity

    Packages are defined in terms of positive (dependencies) and negative constraints (conflicts), and dependencies may be composed by using logical conjunctions and disjunctions.

  • Scale

    Package repositories include tens of thousands of packages. The challenge grows further when package managers may pick packages from several repositories.

Until recently, package managers in FOSS distributions followed a monolithic architecture, (re-)implementing all functionalities to fit specific formats of metadata and user requests. In particular, dependency solving was often implemented by ad hoc algorithms instead of well-known solver technologies. Surprisingly little was known about the intrinsic complexity of dependency solving. It was only in [8] that some of the authors showed that, for packages in FOSS distributions, determining whether a component can be installed is an NP-complete problem. This result was established by showing the equivalence of package installation with Boolean satisfiability, which opened the door to showing that installation in other component models is NP-complete as well. These results and the straightforward encoding into Boolean satisfiability [17] have pushed various communities to incorporate SAT solvers directly into package managers, instead of writing ad hoc solvers as was previously the case [15], [20], [25], [26].
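To make the encoding into Boolean satisfiability concrete, here is a minimal Python sketch under simplifying assumptions: versions are ignored, each package becomes one Boolean variable, a dependency is a disjunction of alternative package names, and the repository format (`name -> (depends, conflicts)`) is hypothetical. The clauses are rendered in the DIMACS CNF format accepted by most SAT solvers; `brute_force_sat` merely stands in for a real solver and is exponential in the number of packages.

```python
from itertools import product


def encode(repo, target):
    """Encode 'is target installable in repo?' as CNF, one variable per package."""
    var = {name: i + 1 for i, name in enumerate(repo)}  # DIMACS variables start at 1
    clauses = [[var[target]]]  # the request: target must be installed
    for name, (depends, conflicts) in repo.items():
        for alternatives in depends:
            # p -> (a1 or ... or an)   becomes   (not p) or a1 or ... or an
            clauses.append([-var[name]] + [var[a] for a in alternatives])
        for other in conflicts:
            # not (p and q)   becomes   (not p) or (not q)
            clauses.append([-var[name], -var[other]])
    return var, clauses


def to_dimacs(num_vars, clauses):
    """Render clauses in DIMACS CNF: a header line, then one 0-terminated clause per line."""
    header = f"p cnf {num_vars} {len(clauses)}"
    return "\n".join([header] + [" ".join(map(str, c)) + " 0" for c in clauses])


def brute_force_sat(num_vars, clauses):
    """Stand-in for a SAT solver: try all 2**num_vars truth assignments."""
    for bits in product([False, True], repeat=num_vars):
        if all(any(bits[abs(lit) - 1] == (lit > 0) for lit in clause) for clause in clauses):
            return bits
    return None


# a needs (b or c) and d; b conflicts with d, so only {a, c, d} works.
repo = {"a": ([("b", "c"), ("d",)], []), "b": ([], ["d"]), "c": ([], []), "d": ([], [])}
var, clauses = encode(repo, "a")
print(to_dimacs(len(var), clauses))
print(brute_force_sat(len(var), clauses))  # (True, False, True, True): a, c and d installed
```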

In this paper, which extends and formalizes the preliminary results of [2], we argue that decoupling dependency solving from other functionalities will yield better package managers:

  • that succeed in finding an upgrade path where existing package managers fail,

  • that are more powerful by accepting an input language that is more expressive than the ones currently supported,

  • and that are more flexible by being easily adaptable to new platforms.

We propose a modular architecture to build component managers that decouples the front-end, which is in charge of interacting with the user and installing and removing individual components, from a generic back-end, which is in charge of finding the best upgrade path according to some user-specified criteria. As a uniform interface between the front-end and the back-end, our architecture relies on two domain specific languages: the Common Upgradeability Description Format (CUDF), which captures all the relevant information about component dependencies, and the user preferences language, which describes the criteria used to determine the best solution.
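The role of the user preferences language can be illustrated by ranking candidate solutions. The following Python sketch assumes, as one common option, that criteria are combined lexicographically, so that later criteria only break ties among earlier ones. The criterion names (`removed`, `changed`), the `-name,+name` syntax, and the helpers are hypothetical illustrations, not the preference language defined in the paper; candidates are also listed explicitly here, whereas a real back-end would optimize over all solutions implicitly.

```python
# A state is a set of (name, version) pairs that are installed.
def removed(initial, solution):
    """Number of package names installed initially but absent from the solution."""
    return len({n for n, _ in initial} - {n for n, _ in solution})


def changed(initial, solution):
    """Number of package names whose installed versions differ between the two states."""
    names = {n for n, _ in initial | solution}

    def versions(state, name):
        return {v for n, v in state if n == name}

    return sum(1 for name in names if versions(initial, name) != versions(solution, name))


CRITERIA = {"removed": removed, "changed": changed}  # hypothetical criterion names


def parse(spec):
    """Parse e.g. '-removed,-changed': '-' minimizes, '+' maximizes, in priority order."""
    parsed = []
    for item in spec.split(","):
        sign, name = item[0], item[1:]
        if sign not in "+-":
            raise ValueError(f"criterion must start with + or -: {item!r}")
        parsed.append((1 if sign == "-" else -1, CRITERIA[name]))
    return parsed


def best(initial, candidates, spec):
    """Pick the candidate that is optimal for the lexicographic order given by spec."""
    criteria = parse(spec)
    # Python compares tuples lexicographically, so later criteria only break ties.
    return min(candidates, key=lambda sol: tuple(s * f(initial, sol) for s, f in criteria))


initial = {("a", 1), ("b", 1)}
s1 = {("a", 1)}                          # drops b: 1 removal, 1 change
s2 = {("a", 1), ("b", 2), ("c", 1)}      # upgrades b, adds c: 0 removals, 2 changes
print(best(initial, [s1, s2], "-removed,-changed") == s2)  # True
print(best(initial, [s1, s2], "-changed,-removed") == s1)  # True
```

Swapping the order of the two criteria flips the preferred solution, which is the kind of user-level control the preference language is meant to expose.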

In particular, we describe MPM, the Mancoosi Package Manager, a proof-of-concept implementation of this modular package manager architecture for Debian-based systems. MPM largely outperforms the mainstream package managers available in the Debian FOSS distribution in terms of the quality of the proposed solutions.

This article is organized as follows: Section 2 introduces the package installation problem, describes the state of the art in the area of package managers, and provides a paradigmatic example of the limitations of current tools. Section 3 presents the modular architecture that we advocate for building package managers, and formally defines the two interface languages used to interconnect their components, CUDF and the user preferences language. Section 4 introduces MPM, our new modular package manager, which is able to cope efficiently with different installation scenarios. Section 5 gives an overview of the performance of MPM in comparison with other package managers. Before concluding, we discuss related and future work in Section 6.

The appendix contains the precise syntax (Appendix A) and semantics (Appendix B) of the CUDF format, as well as our translation from Debian metadata and RPM meta-data to CUDF (Appendix C Translating Debian package metadata to CUDF, Appendix D Translating RPM package metadata to CUDF).
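As a rough illustration of what a front-end hands to a solver, the following Python sketch serializes a small package universe and an install request into text that approximates the CUDF stanza syntax: one `package:` stanza per package, with `,` separating conjuncts and `|` separating alternatives in `depends`, followed by a `request:` stanza. It is an assumption-laden sketch: it omits the preamble stanza and most CUDF properties, passes version constraints through as raw strings, and the `Package`/`document` names and the request identifier are hypothetical. The authoritative grammar is the one given in Appendix A.

```python
from dataclasses import dataclass, field


@dataclass
class Package:
    name: str
    version: int  # CUDF versions are positive integers
    depends: list[list[str]] = field(default_factory=list)  # conjunction of disjunctions
    conflicts: list[str] = field(default_factory=list)
    installed: bool = False


def stanza(pkg: Package) -> str:
    """Render one package stanza; empty properties are simply left out."""
    lines = [f"package: {pkg.name}", f"version: {pkg.version}"]
    if pkg.depends:
        # ',' separates conjuncts and '|' separates alternatives
        lines.append("depends: " + ", ".join(" | ".join(alts) for alts in pkg.depends))
    if pkg.conflicts:
        lines.append("conflicts: " + ", ".join(pkg.conflicts))
    if pkg.installed:
        lines.append("installed: true")
    return "\n".join(lines)


def document(universe: list[Package], install: list[str], request_id: str = "example") -> str:
    """Serialize package stanzas plus a request stanza, separated by blank lines."""
    stanzas = [stanza(p) for p in universe]
    stanzas.append(f"request: {request_id}\ninstall: " + ", ".join(install))
    return "\n\n".join(stanzas) + "\n"


universe = [
    Package("a", 1, depends=[["b", "c"], ["d >= 2"]]),
    Package("b", 1, conflicts=["d"]),
    Package("c", 3),
    Package("d", 2, installed=True),
]
print(document(universe, ["a"]))
```

The translations described in Appendices C and D map native Debian and RPM metadata (where versions are strings rather than integers) onto this kind of representation.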


The upgrade problem

Mainstream FOSS distributions undergo a quality assurance process which aims, among other goals, at ensuring a high degree of coherence among the packages contained in the distribution. In particular, a stable distribution avoids shipping packages that refer to packages not included in the same distribution, and excludes packages that are impossible to install because of some unsatisfiable relation to other packages in the same distribution [22]. Furthermore, a released FOSS distribution

Modular package management

Among all functionalities of a package manager, dependency solving is the most difficult, recurrent, and apparently underestimated one. Re-developing dependency solvers from scratch whenever dependencies and conflicts are introduced in yet another component model does not seem to have served FOSS users well thus far. We argue that an alternative, more modular approach is possible, treating dependency solving as a concern separate from other component management issues. The goal is to decouple

The Mancoosi package manager

The Mancoosi package manager (MPM) is a proof-of-concept implementation which integrates solver technology and optimization criteria to solve real world installation problems. The back-end of MPM leverages the infrastructure of the apt package manager both to parse command line arguments and to perform package installation, but is modular with respect to the dependency solver component.
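Modularity with respect to the dependency solver can be pictured as a front-end that depends only on a narrow solver interface. The Python sketch below is a hypothetical illustration, not MPM's code: MPM exchanges CUDF documents with external solvers such as aspcud, whereas here the "solvers" are in-process classes over a toy universe (`name -> (depends, conflicts)`), and the preference criteria are omitted for brevity. `GreedySolver` mimics an ad hoc heuristic that always takes the first alternative and never backtracks; `CompleteSolver` is a complete but exponential search. Swapping one for the other requires no change to `FrontEnd`.

```python
from itertools import combinations
from typing import Optional, Protocol

# Hypothetical universe format: name -> (depends, conflicts); each dependency is a
# tuple of alternatives (a disjunction), and all dependencies must hold (a conjunction).
Universe = dict[str, tuple[list[tuple[str, ...]], list[str]]]


def healthy(selection: set[str], universe: Universe) -> bool:
    """True if every selected package has its dependencies met and no conflicts."""
    for name in selection:
        depends, conflicts = universe[name]
        if any(not any(a in selection for a in alts) for alts in depends):
            return False
        if any(c in selection for c in conflicts):
            return False
    return True


class Solver(Protocol):
    def solve(self, universe: Universe, install: set[str]) -> Optional[set[str]]: ...


class GreedySolver:
    """Ad hoc heuristic: always takes the first alternative and never backtracks."""

    def solve(self, universe: Universe, install: set[str]) -> Optional[set[str]]:
        selected, todo = set(), list(install)
        while todo:
            name = todo.pop()
            if name in selected:
                continue
            selected.add(name)
            for alts in universe[name][0]:
                if not any(a in selected for a in alts):
                    todo.append(alts[0])
        return selected if healthy(selected, universe) else None


class CompleteSolver:
    """Complete (but exponential) search: smallest healthy superset of the request."""

    def solve(self, universe: Universe, install: set[str]) -> Optional[set[str]]:
        others = [n for n in universe if n not in install]
        for k in range(len(others) + 1):
            for extra in combinations(others, k):
                candidate = set(install) | set(extra)
                if healthy(candidate, universe):
                    return candidate
        return None


class FrontEnd:
    """Front-end that delegates dependency solving to whichever solver it is given."""

    def __init__(self, solver: Solver) -> None:
        self.solver = solver

    def install(self, universe: Universe, names: list[str]) -> list[str]:
        solution = self.solver.solve(universe, set(names))
        if solution is None:
            raise RuntimeError(f"cannot satisfy request to install {names}")
        # A real front-end would now hand the solution to the deployment back-end.
        return sorted(solution)


universe: Universe = {"a": ([("b", "c"), ("d",)], []), "b": ([], ["d"]),
                      "c": ([], []), "d": ([], [])}
print(FrontEnd(CompleteSolver()).install(universe, ["a"]))  # ['a', 'c', 'd']
```

With this universe, `FrontEnd(GreedySolver())` raises `RuntimeError`: the heuristic commits to `b`, which conflicts with the required `d`, while the complete solver finds `{a, c, d}`. This mirrors, in miniature, the paper's point that complete solvers succeed where heuristic ones fail.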

To ease the adoption of MPM, we decided to maintain strict compatibility with existing tools. In

Experimental validation

We compared MPM to the latest versions, available in Debian, of four different state-of-the-art package managers. Our goal was to assess the improvements in solution quality that are attainable with our modular architecture by reusing solvers which participated in the latest Mancoosi International Solver Competition (MISC 2011) [1], [3].

For this particular experiment, MPM has been configured to use the aspcud solver, one of the winners of the MISC 2011 competition: based on Answer Set

State of the art package managers

The world of FOSS distributions has grown very complex over time: the dedicated page on Linux Weekly News lists more than 600 of them. Despite this large variety, most distributions use one of two mainstream package formats, RPM and DEB, originally designed for the Red Hat and Debian distributions respectively, but now adopted by most of the others.

Each of these two different package formats comes with standard tools for the low

Conclusions

We have presented in this work a modular package manager architecture that makes it possible to rely on external state-of-the-art solvers for dependency handling, thanks to the formally defined CUDF format coming from the MISC solver competition. Our architecture also provides the user with a flexible, high-level preference language for tailoring the solution to one’s needs.

We have built a proof-of-concept package manager, called MPM, for Debian-based FOSS distributions. MPM is based on the

Data availability

All tools and raw data used for the validation of the results shown in this paper are available online at http://data.mancoosi.org/papers/ist2012/. A synthetic presentation of the results can be found at http://www.mancoosi.org/measures/packagemanagers/2012.

Acknowledgments

The authors are grateful to the members of the Mancoosi project for many stimulating discussions on the CUDF format, optimization criteria, and the practical use cases. A special acknowledgment goes to the optimization and solving community who took part in the MISC competition, adapting their solvers to the CUDF format: this allows the direct reuse of their solvers in the MPM package manager, paving the way to a general reuse in other package managers based on the modular architecture proposed

References (26)

  • P. Abate, R. Di Cosmo, R. Treinen, S. Zacchiroli, MISC competition 2011 <http://mancoosi.org/misc>. Results announced...
  • P. Abate, R. Di Cosmo, R. Treinen, S. Zacchiroli, MPM: a modular package manager, in: Proceedings of the 14th...
  • P. Abate, R. Di Cosmo, R. Treinen, S. Zacchiroli, Dependency solving: A separate concern in component evolution...
  • P. Abate, A. Guerreiro, S. Laurière, R. Treinen, S. Zacchiroli, Extension of an existing package manager to produce...
  • J. Argelich, D. Le Berre, I. Lynce, J. Marques-Silva, P. Rapicault, Solving Linux upgradeability problems using boolean...
  • C. Bozman, Converting Eclipse Metadata into CUDF. Technical report 5. Mancoosi project, 2010...
  • R. Di Cosmo, S. Zacchiroli, Feature diagrams as package dependencies, in: SPLC: Software Product Lines Conference,...
  • EDOS Project Workpackage 2 Team, Report on Formal Management of Software Dependencies, EDOS Project Deliverable Work...
  • EDOS Project Workpackage 2 Team, Report on Formal Management of Software Dependencies. EDOS Project Deliverable Work...
  • P.C. Fishburn, Lexicographic orders, utilities and decision rules: a survey, Management Science (1974)
  • M. Gebser, R. Kaminski, T. Schaub, Aspcud: a linux package configuration tool based on answer set programming, in:...
  • G. Jenson et al., An empirical study of the component dependency resolution search space
  • K.K. Lau et al., Software component models, IEEE Transactions on Software Engineering (2007)