Linux package management

Updated: 2005-03-02
Created: 2003

A less schematic discussion of many of the same concepts is available here.

How to map packages onto filesystems

  1. A package contains a collection of names and files.
  2. So does a filesystem; indeed a package contains a mini filesystem.
  3. Traditional method: just copy the package into the filesystem.
  4. Problems:
    • Cannot uninstall easily.
    • Overlaps among packages undetected.
    • Overwrites because of overlaps cannot be undone.
  5. Why is this bad?
    • Security.
    • Lack of reproducibility for mass installs.
    • Lack of reinstallability, especially configs.
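A minimal sketch of why plain copying loses information: before anything is copied, the overlaps among packages can be found by comparing their file lists. The package names and paths below are hypothetical, and a real tool would read manifests from the package files rather than from a dict.

```python
from collections import defaultdict


def find_overlaps(packages):
    """Map each path claimed by more than one package to its claimants.

    `packages` maps a package name to an iterable of relative paths.
    """
    owners = defaultdict(set)
    for name, paths in packages.items():
        for path in paths:
            owners[path].add(name)
    return {path: sorted(names) for path, names in owners.items() if len(names) > 1}


# Hypothetical manifests, for illustration only.
packages = {
    "editor": ["usr/bin/edit", "etc/editor.conf"],
    "editor-site-config": ["etc/editor.conf"],
    "viewer": ["usr/bin/view"],
}
overlaps = find_overlaps(packages)
print(overlaps)
```

With the traditional method nothing records this result, so after the copy there is no way to tell which package the surviving etc/editor.conf came from.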


Bear in mind the concept of unavoidable functionality: it must be implemented somewhere, and the only choice is whether that is in the computer or in the user's head (e.g. spooling, compilers, ...).

  1. Cascading package merging, undoable. Reasons:
    • Patches/upgrades.
    • Configuration: default config, site config, host config, each in a separate package.
  2. Very simple requirements, implementation hard:
    • Add package to filesystem.
    • Given a package, list all files in it (by filesystem, which includes listing files that are not installed because they are overridden).
    • Given a filesystem (which can be just a simple file), list all packages in it (this includes listing all files that are not in any package).
    • Remove package from filesystem, restoring state before adding (undoing overrides).
  3. Additional requirements:
    • List set of package installation prerequisite capabilities.
    • List set of package provided capabilities.
    • Handle very large numbers of packages and files.

The problem with overlapping packages

  1. Overlaps must be detected.
  2. Overlapped files must be saved on install.
  3. Saved overlaps must be restored on uninstall.
  4. This can happen to several levels.
  5. Sophisticated state tracking.
  6. What about partially overlapping files? Not a problem to be solved at this level; thus the trend towards splitting files into directories.
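The state tracking described above can be sketched as one stack of layers per path: installing pushes a layer, uninstalling removes that package's layer wherever it sits, and whatever is left on top becomes visible again. This is an in-memory model under simplifying assumptions (whole-file overlaps only, contents held as strings); the class and method names are made up for illustration.

```python
class LayeredFS:
    """In-memory model: each path holds a stack of (package, content) layers."""

    def __init__(self):
        self.layers = {}  # path -> list of (package, content); last entry is visible

    def add(self, package, files):
        """Install `files` (path -> content), keeping whatever they overlap."""
        for path, content in files.items():
            self.layers.setdefault(path, []).append((package, content))

    def remove(self, package):
        """Uninstall `package`; layers underneath become visible again."""
        for path in list(self.layers):
            kept = [layer for layer in self.layers[path] if layer[0] != package]
            if kept:
                self.layers[path] = kept
            else:
                del self.layers[path]

    def read(self, path):
        """Content currently visible at `path`."""
        return self.layers[path][-1][1]

    def owner(self, path):
        """Package whose copy of `path` is currently visible."""
        return self.layers[path][-1][0]

    def files_of(self, package):
        """Paths in `package`; the value is True when another package overrides it."""
        return {
            path: stack[-1][0] != package
            for path, stack in self.layers.items()
            if any(pkg == package for pkg, _ in stack)
        }


fs = LayeredFS()
fs.add("app", {"etc/app.conf": "default", "usr/bin/app": "binary"})
fs.add("app-site", {"etc/app.conf": "site"})
fs.add("app-host", {"etc/app.conf": "host"})
print(fs.owner("etc/app.conf"), fs.files_of("app"))
fs.remove("app-site")  # removing a middle layer leaves the host config visible
fs.remove("app-host")  # now the default config is restored
print(fs.read("etc/app.conf"))
```

The three layers mirror the default, site and host configuration packages mentioned earlier; the overlap goes two levels deep and each level can be undone independently.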

Filesystem structure

  1. A filesystem is a classification system.
  2. It is implemented as a set of names that map (many-to-one) to a set of files.
  3. Directories need not actually be implemented, can be entirely virtual (but beware search permissions). Names don't necessarily have any given structure.

Multidimensional view

  1. Each name can consist of keywords, listed in any order.
  2. usr/lib/emacs/site-init.el same as lib/site-init.el/emacs/usr.
  3. Any set of keywords defines a directory.
  4. Any unique subset of a file's keywords identifies it.
  5. cd changes the set of default keywords.
  6. This (almost completely) solves the package problem.
  7. This does not exist, we need kludges.
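Although no such filesystem exists, the idea is easy to model. The sketch below treats a name as an unordered set of keywords; `KeywordFS`, `parse` and the method names are hypothetical and exist only to illustrate the points in the list above.

```python
class KeywordFS:
    """Files are identified by unordered keyword sets instead of ordered paths."""

    def __init__(self):
        self.files = {}  # frozenset of keywords -> content
        self.defaults = frozenset()  # the "current directory"

    def create(self, keywords, content):
        self.files[frozenset(keywords) | self.defaults] = content

    def cd(self, keywords):
        """Change the set of default keywords."""
        self.defaults = frozenset(keywords)

    def ls(self, keywords=()):
        """All names that contain the default keywords plus `keywords`."""
        wanted = self.defaults | frozenset(keywords)
        return [name for name in self.files if wanted <= name]

    def lookup(self, keywords):
        """Return the content of the single file matching `keywords`."""
        matches = self.ls(keywords)
        if len(matches) != 1:
            raise LookupError(f"{len(matches)} files match {sorted(keywords)}")
        return self.files[matches[0]]


def parse(name):
    """Turn a slash-separated name into its keyword set; order is discarded."""
    return frozenset(name.split("/"))


fs = KeywordFS()
fs.create(parse("usr/lib/emacs/site-init.el"), "site init")
fs.create(parse("usr/lib/emacs/default.el"), "defaults")
print(parse("usr/lib/emacs/site-init.el") == parse("lib/site-init.el/emacs/usr"))
print(fs.lookup({"site-init.el"}))  # a unique subset is enough
fs.cd({"emacs"})
print(len(fs.ls()))  # every file carrying the keyword "emacs"
```

If the package name is simply one more keyword, as "emacs" is here, then listing or removing a package is a query on that keyword, which is one reading of why this view nearly solves the package problem.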

Hierarchical view

  1. The set of keywords must be listed in an order given at file construction.
  2. Each different ordering defines a different name.
  3. Which ordering for package installation? In practice two:
    • Package name tree prefix:
    • Merged trees, no leading package name, only categories:
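The two orderings can be contrasted with a small sketch. The root names `depot` and `usr`, and the package names, are assumptions chosen for illustration; they are not taken from any particular system.

```python
from pathlib import PurePosixPath


def package_leading(package, category, filename, root="depot"):
    """Package name comes first: each package keeps its own subtree."""
    return PurePosixPath(root, package, category, filename)


def merged(package, category, filename, root="usr"):
    """Only categories appear in the name: the package name is dropped."""
    return PurePosixPath(root, category, filename)


# Two hypothetical packages that both ship bin/edit.
a = ("editor-a", "bin", "edit")
b = ("editor-b", "bin", "edit")
print(package_leading(*a), package_leading(*b))  # two distinct names
print(merged(*a), merged(*b))  # the same name twice
```

In the merged ordering the two files collide because the distinguishing keyword has been discarded, which is exactly the ownership information a package manager then has to keep somewhere else.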

Which view?

  1. Package name leading solves the package problem as such. Can be used to containerize packages.
  2. However, because of search paths, UNIX/Linux uses the merged tree structure (except for /opt/), with a few trees and subtrees.
  3. How to preserve package ownerships in merged trees?
    • In-band multiple views, via link (hard or symbolic) farms: one canonical (because of overlap restore) package leading view (the depot), one merged view.
    • Out-of-band databases: one canonical merged view, database tracks package ownership.
  4. In-band solves to a large extent the package problem, but has other problems:
    • Hard links don't span partitions, don't apply to directories, cpio not suitable.
    • Symbolic links are ugly, inefficient, fragile.
  5. Out-of-band requires a lot of extra work. It does not easily solve the overlap restore problem.

Package structure

  • The single most important part is the list of files that are part of the package. In theory this is all that should be necessary and desirable; any other information might render the package installation stateful.
  • The other issue is whether the paths are absolute or relative, or both. Ideally they are relative, but this is rare.
  • Often packages contain pre/post install/remove scripts. These are bad news, because the package state is carried inside more or less invisible or incomprehensible code.
  • They are usually used to edit configuration files or to start/stop daemons, automagically, something that the user should do themselves.
  • Metadata is not bad as long as it does not affect the semantics of the package.
  • It usually includes both package and packager information.
  • Particularly important is version information: for the original sources and for the particular package instance.


Dependencies are not so much package specific as package system specific.

Some packages contain only dependencies, usually called virtual packages.

Build requisites
  • If the package manager provides a particular build logic, a package might be tagged with the list of packages that must be installed in order to build it.
  • Absolutely essential for distribution builders.
  • It creates a number of very tricky situations.
Runtime requisites
  • List of packages or capabilities.
  • Sometimes list of shared libraries (bad idea, consider APIs instead).
Runtime provides
  • These are most useful if generic: most packages require functionality from another package, not a specific package.
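A minimal sketch of matching requisites against generic provides, assuming each package simply declares two sets of capability names. The package and capability names are hypothetical, and real package managers add versions, conflicts and ordering on top of this.

```python
def missing_capabilities(to_install, installed, packages):
    """Return the capabilities `to_install` requires that nothing provides.

    `packages` maps name -> {"requires": set, "provides": set};
    `installed` is the set of package names already present.
    """
    provided = set()
    for name in installed | {to_install}:
        provided |= packages[name]["provides"]
    return packages[to_install]["requires"] - provided


# Hypothetical metadata: two interchangeable providers of one capability.
packages = {
    "mailreader": {"requires": {"mta"}, "provides": {"mailreader"}},
    "mta-a": {"requires": set(), "provides": {"mta", "mta-a"}},
    "mta-b": {"requires": set(), "provides": {"mta", "mta-b"}},
}
print(missing_capabilities("mailreader", set(), packages))
print(missing_capabilities("mailreader", {"mta-b"}, packages))
```

Because "mailreader" asks for the capability rather than for a named package, either provider satisfies it.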

Some solutions

  1. There are lots of package formats; there is a converter, Alien, and a related comparison of the package formats it can convert.
  2. Workspace applications try to sidestep the issue by being isolated, each with its own mechanism.
  3. Entire hardware systems have been dedicated to single applications.
  4. Ignore the maintenance problem and buy a new system instead. Shifts the problem to the developers in effect.


Depot

  1. The major root of all link farm systems, developed at CMU.
  2. Each volume has a depot subdirectory in which packages are installed with package name leading.
  3. Installation merges into the filesystem by creating symbolic links.
  4. Optimisations: if a filesystem directory contains files from a single package, that directory is itself made a symbolic link to the package directory. This can change if files from other packages have to be installed in that directory.
  5. Easy to list bits of filesystem that are not in any package, and which package any bit of filesystem comes from (both encoded in the symbolic link).
  6. Restoring overlaps requires extra state.
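The link farm approach can be sketched in a few lines, assuming a POSIX system where symbolic links can be created freely. The function names and the directory layout are illustrative, not those of any real tool, and the single-package directory optimisation is left out.

```python
import os
import tempfile


def install(depot, package, target):
    """Symlink every file under depot/package into the merged tree at target."""
    pkg_root = os.path.join(depot, package)
    for dirpath, _dirs, files in os.walk(pkg_root):
        rel = os.path.relpath(dirpath, pkg_root)
        dest_dir = os.path.normpath(os.path.join(target, rel))
        os.makedirs(dest_dir, exist_ok=True)
        for name in files:
            dest = os.path.join(dest_dir, name)
            if os.path.lexists(dest):
                # Overlap: going further would need extra saved state.
                raise FileExistsError(f"overlap at {dest}")
            os.symlink(os.path.abspath(os.path.join(dirpath, name)), dest)


def owner(depot, path):
    """Package owning `path`, read back from the link target; None if unowned."""
    if not os.path.islink(path):
        return None
    rel = os.path.relpath(os.path.realpath(path), os.path.realpath(depot))
    return None if rel.startswith(os.pardir) else rel.split(os.sep)[0]


def demo():
    with tempfile.TemporaryDirectory() as root:
        depot, target = os.path.join(root, "depot"), os.path.join(root, "usr")
        os.makedirs(os.path.join(depot, "hello", "bin"))
        with open(os.path.join(depot, "hello", "bin", "hello"), "w") as f:
            f.write("#!/bin/sh\necho hello\n")
        install(depot, "hello", target)
        stray = os.path.join(target, "bin", "local-script")
        with open(stray, "w") as f:
            f.write("not from any package\n")
        return owner(depot, os.path.join(target, "bin", "hello")), owner(depot, stray)


print(demo())
```

Ownership needs no database here: the link target names the package, and anything in the merged tree that is not a link into the depot belongs to no package.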

Lots of others ('stow' etc.)

Symbolic link farm.
Another symbolic link farm.
Symbolic link farm, but quite mad (a single per-package wrapper is used to invoke every command in the package).
Slackware package tools
tar archives with scripts and manifest.
Used by the Gentoo distribution (and some BSD ones).
Used by the Stampede distribution.

Major Linux ones

  1. All popular package managers are out-of-band; perhaps this is not that good.
  2. RPM is the LSB package manager, so the others matter less; perhaps this is not that good.


RPM

  1. Very badly documented.
  2. Out of band; state is kept in binary format in a Berkeley DB database. Various versions of DB have been used.
  3. Package files are in-band, with a binary header followed by a cpio archive.
  4. Overrides are sort of handled: checksums are compared, and some files may be renamed to .rpmsave (overridden) or .rpmnew (not overriding). Probably its best feature.
  5. Only popular package manager that uses cpio; not a bad choice overall, as tar is a bigger mess, especially with long filenames. Historical reasons too.
  6. All package metadata contained in a .spec file. Poorly designed format, in particular for relocatable packages.
  7. Each distribution defines different .spec file extensions. Because of this and different distribution filesystem layouts, RPM packages are not portable.
  8. LSB supposedly standardises RPM and filesystem layouts.
  9. Conectiva Linux had modified Debian's APT to work with RPM instead of DPKG.
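The checksum-based override handling can be sketched as a three-way comparison between the old packaged file, the file on disk and the new packaged file. This is a simplified model of the behaviour as it is commonly described, not a transcription of RPM's code; the function names and the returned strings are made up, and the `noreplace` flag is assumed to play the role of `%config(noreplace)` in a .spec file.

```python
import hashlib


def digest(data):
    """Checksum used to compare file contents."""
    return hashlib.sha256(data).hexdigest()


def config_action(old_default, on_disk, new_default, noreplace=False):
    """Decide what to do with one config file on upgrade, given three checksums."""
    if on_disk == new_default:
        return "keep"  # already identical to the new default
    if on_disk == old_default:
        return "replace"  # never edited locally: take the new default
    if new_default == old_default:
        return "keep"  # edited locally, packaged default unchanged
    # Edited locally and the packaged default changed as well.
    if noreplace:
        return "keep, write new as .rpmnew"
    return "replace, save old as .rpmsave"


old = digest(b"port = 25\n")
local = digest(b"port = 2525\n")
new = digest(b"port = 25\ntls = on\n")
print(config_action(old, old, new))
print(config_action(old, local, new))
print(config_action(old, local, new, noreplace=True))
```

Only the last case needs a renamed file, because it is the only one where both sides hold information that would otherwise be lost.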


DPKG

  1. Out of band installation; state is kept as a set of text files, about five files for each package.
  2. Package files are in-band, all belong to an ar archive that contains a couple of tar archives and a tag file.
  3. Very poor implementation choices; in particular the state directory can contain tens of thousands of files.
  4. More complete dependency management than others.
  5. Clever, but grossly inefficient, frontends.
  6. Can't install multiple versions of a package (and only recently has it been able to install multiple architectures).
  7. Not in the LSB, fortunately.

POSIX/Solaris pkgadd

  1. Used by most proprietary UNIXes, also used at one time by Caldera, which has now switched to RPM.
  2. Package is a tar archive with in-band scripts.
  3. Package is first unpacked in a temporary directory, and then copied to its definitive resting place.

Other issues

Package naming

General principle: left to right increasing specificity.

Filesystem layout

It should have major hierarchies (e.g. /, usr) and frameworks/subhierarchies (e.g. TeX, X11R6, grass) in which multiple related packages (related by use of the same libraries or data formats) get merged.

Package building