Linux package management
A less schematic discussion of many of the same concepts is
- A package contains a collection of names and files.
- So does a filesystem; indeed a package contains a mini
- Traditional method: just copy the package into the
- Cannot uninstall easily.
- Overlaps among packages undetected.
- Overwrites because of overlaps cannot be undone.
- Why is this bad?
- Lack of reproducibility for mass installs.
- Lack of reinstallability, especially configs.
Bear in mind the concept of unavoidable functionality: it
must be implemented somewhere; the only choice is
whether in the computer or the user's head (e.g. spooling,
- Cascading package merging, undoable. Reasons:
- Configuration: default config, site config, host
config, each in a separate package.
- Very simple requirements, implementation hard:
- Add package to filesystem.
- Given a package, list all files in it (by
filesystem, which includes listing bits not installed
because overridden).
- Given a filesystem (which can be just a simple
file), list all packages in it (this includes
listing all files that are not in any package).
- Remove package from filesystem, restoring state
before adding (undoing overrides).
- Additional requirements:
- List set of package installation prerequisite
- List set of package provided capabilities.
- Handle very large numbers of packages and files.
- Overlaps must be detected.
- Overlapped files must be saved on install.
- Saved overlaps must be restored on uninstall.
- This can happen to several levels.
- Sophisticated state tracking.
- What about partially overlapping files?
Not a problem to be solved at this level; thus the trend
towards splitting files into directories.
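The add, list and remove requirements above, with overlaps saved to several levels, can be sketched with a toy in-memory model. This is only an illustration under stated assumptions: the class `Filesystem`, its method names and the package names in the usage note are hypothetical, and a real implementation would have to keep this state on disk.

```python
from collections import defaultdict


class Filesystem:
    """Toy model: every path holds a stack of (package, content) entries.

    The top of each stack is the visible file; the entries below it are
    the saved overlaps. Unpackaged files can be added with package=None.
    """

    def __init__(self):
        self.stacks = defaultdict(list)  # path -> [(package, content), ...]

    def add(self, package, files):
        """Add a package; any overlapped file is kept underneath the new one."""
        for path, content in files.items():
            self.stacks[path].append((package, content))

    def remove(self, package):
        """Remove a package at any level; overridden files reappear."""
        for path in list(self.stacks):
            self.stacks[path] = [e for e in self.stacks[path] if e[0] != package]
            if not self.stacks[path]:
                del self.stacks[path]

    def files_of(self, package):
        """List a package's files, flagging those hidden by a later package."""
        return {
            path: ("visible" if stack[-1][0] == package else "overridden")
            for path, stack in self.stacks.items()
            if any(owner == package for owner, _ in stack)
        }

    def owners(self):
        """Map every visible path to the package that currently provides it."""
        return {path: stack[-1][0] for path, stack in self.stacks.items()}
```

For example, adding a hypothetical `default-config` package and then a `site-config` package that both ship `etc/app.conf` leaves `site-config` as the owner; removing `site-config` makes the `default-config` copy visible again without reinstalling anything.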
- A filesystem is a classification system.
- It is implemented as a set of names that map (many-to-one) to
a set of files.
- Directories need not actually be implemented, can be
entirely virtual (but beware search permissions). Names don't
necessarily have any given structure.
- Each name can consist of keywords, listed in any
usr/lib/emacs/site-init.el same as
- Any set of keywords defines a directory.
- Any unique subset of a file's keywords identifies it.
cd changes the set of default keywords.
- This (almost completely) solves the package problem.
- This does not exist, we need kludges.
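Since such a filesystem does not exist, the idea can only be illustrated with a toy model. The sketch below treats a name as an unordered set of keywords; `KeywordFS` and its methods are hypothetical names, and every file name other than `usr/lib/emacs/site-init.el` is invented for the example.

```python
def keywords(name):
    """Split a name into its set of keywords; order is irrelevant."""
    return frozenset(k for k in name.split("/") if k)


class KeywordFS:
    def __init__(self):
        self.files = {}              # frozenset of keywords -> content
        self.defaults = frozenset()  # the keywords set by cd

    def add(self, name, content):
        self.files[keywords(name)] = content

    def cd(self, name):
        """Change the set of default keywords."""
        self.defaults = keywords(name)

    def ls(self, name=""):
        """Any set of keywords defines a directory: all files that carry them."""
        wanted = self.defaults | keywords(name)
        return [keys for keys in self.files if wanted <= keys]

    def read(self, name):
        """Any subset of keywords that matches exactly one file identifies it."""
        matches = self.ls(name)
        if len(matches) != 1:
            raise LookupError(f"{name!r} matches {len(matches)} files")
        return self.files[matches[0]]
```

With `usr/lib/emacs/site-init.el` stored, `read("emacs/site-init.el")` and `read("site-init.el/lib/usr/emacs")` return the same file, and `ls("emacs")` is the "directory" of everything tagged `emacs`, which is what makes a package-leading view and a merged view coexist.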
- The set of keywords must be listed in an order given at
- Each different ordering defines a different name.
- Which ordering for package installation? In practice two:
- Package name leading solves the package problem as
such. Can be used to containerize packages.
- However, thanks to paths, UNIX/Linux uses the merged tree
structure (except for
/opt/), with a few trees
- How to preserve package ownerships in merged trees?
- In-band multiple views, via link (hard or
symbolic) farms: one canonical (because of overlap
restore) package leading view (the depot), one
- Out-of-band databases: one canonical merged view,
database tracks package ownership.
- In-band solves to a large extent the package problem, but
has other problems:
- Hard links don't span partitions, don't apply to
cpio not suitable.
- Symbolic links are ugly, inefficient, fragile.
- Out-of-band requires a lot of extra work. Does not
easily solve the overlap restore problem.
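A minimal sketch of the out-of-band approach, assuming the database is nothing more than a map from path to owning package. `OwnershipDB` and its methods are hypothetical names, not the interface of any real package manager.

```python
class OwnershipDB:
    """Out-of-band ownership record kept beside a single merged tree."""

    def __init__(self):
        self.owner = {}  # path -> package

    def install(self, package, paths):
        """Record ownership; refuse overlaps, since one owner per path
        leaves nowhere to save an overridden file."""
        clashes = {p: self.owner[p] for p in paths if p in self.owner}
        if clashes:
            raise ValueError(f"{package} overlaps: {clashes}")
        for p in paths:
            self.owner[p] = package

    def remove(self, package):
        """Forget a package and return the paths that should be deleted."""
        gone = [p for p, o in self.owner.items() if o == package]
        for p in gone:
            del self.owner[p]
        return gone

    def unowned(self, tree_paths):
        """Paths present in the merged tree that no package claims."""
        return sorted(set(tree_paths) - set(self.owner))
```

The single-owner map is the design choice that makes overlap restore awkward here: to undo an override the database would also have to store the overridden file somewhere, which is extra state outside the tree.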
- The single most important part is the list of files that
are part of the package. In theory this is all that should
be necessary and desirable; any other information might
render the package installation stateful.
- The other issue is whether the paths are absolute or
relative, or both. Ideally relative, but that is rare.
- Often packages contain pre/post install/remove scripts.
These are bad news, because the package state is
carried inside more or less invisible or incomprehensible
- They are usually used to edit configuration files or to
start/stop daemons, automagically, something that the user
should do themselves.
- Metadata is not bad as long as it does not affect the
semantics of the package.
- It usually includes both package and packager
- Particularly important is version information: for the
original sources and for the particular package
Not so much package specific, but package system specific.
Some packages contain only dependencies, usually
called virtual packages.
- Build requisites
If the package manager provides a particular build logic, a
package might be tagged with the list of packages that must
be installed in order to build it.
- Absolutely essential for distribution builders.
- It creates a number of very tricky situations.
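As an illustration of why build requisites matter to distribution builders, the sketch below derives a build order from them with the standard library `graphlib` module (Python 3.9 or later). The package names are hypothetical, and treating circular build requisites as one of the tricky situations is an assumption, not something the notes spell out.

```python
from graphlib import CycleError, TopologicalSorter


def build_order(build_requires):
    """Return packages in an order where build requisites come first.

    build_requires maps each package to the set of packages that must
    be installed in order to build it. Raises CycleError when the
    requisites are circular.
    """
    return list(TopologicalSorter(build_requires).static_order())


if __name__ == "__main__":
    print(build_order({"app": {"libfoo", "cc"}, "libfoo": {"cc"}, "cc": set()}))
    try:
        build_order({"cc": {"libc"}, "libc": {"cc"}})
    except CycleError as err:
        print("circular build requisites:", err.args[1])
```

A circular result means some package has to be bootstrapped by other means before the rest of the distribution can be rebuilt from source.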
- Runtime requisites
- List of packages or capabilities.
- Sometimes list of shared libraries (bad
idea, consider APIs instead).
- Runtime provides
- These are most useful if generic: most packages require
functionality from another package, not a specific
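A small sketch of checking runtime requisites against provided capabilities. The function name, the metadata layout and the example capability are hypothetical, and it assumes that every package implicitly provides its own name.

```python
def unmet_requirements(installed):
    """Return {package: [missing capabilities]} for an installed set.

    installed maps a package name to a dict with optional "provides"
    and "requires" sets of capability names.
    """
    available = set()
    for name, meta in installed.items():
        available.add(name)  # assumption: a package provides its own name
        available |= meta.get("provides", set())

    missing = {}
    for name, meta in installed.items():
        lacking = meta.get("requires", set()) - available
        if lacking:
            missing[name] = sorted(lacking)
    return missing
```

If a hypothetical mail reader requires the generic capability `mail-transport-agent`, any installed package that lists it under `provides` satisfies the requirement, which is the point of keeping provides generic.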
- Lots of package formats, there is a converter,
and a related
of the package formats it can convert.
- Workspace applications try to sidestep the issue by being
isolated and each with its own mechanism.
- Entire hardware systems have been dedicated to single
- Ignore the maintenance problem and buy a new system
instead. Shifts the problem to the developers in effect.
- The major root of all link farm systems, developed at CMU.
- Each volume has a
depot subdirectory in which
packages are installed with package name leading.
- Installation merges into the filesystem by creating
- Optimisations: if a filesystem directory contains files
from a single package, that is done as a symbolic link to
the package directory. This can change if files from other
packages have to be installed in that directory.
- Easy to list bits of filesystem that are not in any
package, and which package any bit of filesystem comes from
(both encoded in the symbolic link).
- Restoring overlaps requires extra state.
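The depot scheme described above can be sketched in a few lines. This is a simplified illustration, not the original tool: `install` and `owner` are hypothetical names, the single-package directory optimisation is left out, and an overlap is simply reported as an error.

```python
import os


def install(depot, package, root):
    """Merge depot/package into root by creating one symbolic link per file."""
    pkg_dir = os.path.abspath(os.path.join(depot, package))
    for dirpath, _dirs, files in os.walk(pkg_dir):
        rel = os.path.relpath(dirpath, pkg_dir)
        target_dir = os.path.normpath(os.path.join(root, rel))
        os.makedirs(target_dir, exist_ok=True)
        for name in files:
            link = os.path.join(target_dir, name)
            if os.path.lexists(link):
                # A real tool would check every path before linking any.
                raise FileExistsError(f"overlap at {link}")
            os.symlink(os.path.join(dirpath, name), link)


def owner(depot, path):
    """Return the package a path comes from, or None if it is not a depot link."""
    if not os.path.islink(path):
        return None
    rel = os.path.relpath(os.path.realpath(path), os.path.realpath(depot))
    if rel.startswith(os.pardir):
        return None
    return rel.split(os.sep)[0]
```

Because ownership is encoded in the link target, no separate database is needed to answer "which package does this file come from"; what the links cannot record is the file that was there before, hence the extra state for restoring overlaps.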
- Symbolic link farm.
- Another symbolic link farm.
- Symbolic link farm, but quite mad (a single per-package
wrapper is used to invoke every command in the package).
- Slackware package tools
tar archives with scripts and manifest.
- Used by the
distribution (and some BSD ones).
- Used by the
- All popular package managers are out-of-band;
perhaps this is not that good.
- RPM is the LSB package manager, so the others matter less;
perhaps this is not that good.
- Very very badly documented.
- Out of band; state is kept in binary format in a Berkeley
DB database. Various versions of DB have been used.
- Package files are in-band, with a binary header followed
- Overrides are sort of handled: checksums, and some files
may be renamed to
.rpmsave (overridden) or
.rpmnew (not overriding). Probably its best
- Only popular package manager that uses
not a bad choice overall, as
tar is a bigger
mess, especially with long filenames. Historical reasons
- All package metadata contained in a
file. Poorly designed format, in particular for relocatable
- Each distribution defines different
file extensions. Because of this and different distribution
filesystem layouts, RPM packages are not portable.
- LSB supposedly standardises RPM and filesystem
- Conectiva Linux had modified Debian's APT to work with
RPM instead of DPKG.
- Out of band installation; state is kept as a set of text
files, about five files for each package.
- Package files are in-band, all belong to an
ar archive that contains a couple of
tar archives and a tag file.
- Very poor implementation choices, in particular the state
directory can contain tens of thousands of files.
- More complete dependency management than others.
- Clever, but grossly inefficient, frontends.
- Can't install multiple versions of a package (and only
recently gained the ability to install multiple architectures).
- Not in the LSB, fortunately.
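To make the package file layout concrete, here is a minimal reader for the common `ar` archive format, enough to list the members of such a package file. It is a sketch under stated assumptions: `ar_members` is a hypothetical helper, it handles only short member names (no long-name table), and it does not unpack the `tar` archives it finds.

```python
AR_MAGIC = b"!<arch>\n"


def ar_members(stream):
    """Yield (name, data) for each member of an ar archive opened in binary mode.

    Each member has a 60-byte header: name (16), mtime (12), uid (6),
    gid (6), mode (8), size (10), all ASCII and space-padded, followed
    by the two-byte terminator b"`\\n".
    """
    if stream.read(8) != AR_MAGIC:
        raise ValueError("not an ar archive")
    while True:
        header = stream.read(60)
        if len(header) < 60:
            return
        if header[58:60] != b"`\n":
            raise ValueError("bad member header")
        # Some ar variants terminate the name with "/".
        name = header[0:16].decode("ascii").rstrip().rstrip("/")
        size = int(header[48:58].decode("ascii"))
        data = stream.read(size)
        if size % 2:
            stream.read(1)  # member data is padded to an even length
        yield name, data
```

Run against a package file, for example `[name for name, _ in ar_members(open(path, "rb"))]` with `path` supplied by the reader, this should list the tag file and the `tar` archives mentioned above.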
- Used by most proprietary UNIXes, also used at one time by
Caldera, which has now switched to RPM.
- Package is a
tar archive with in-band
- Package is first unpacked in a temporary directory, and
then copied to its definitive resting place.
General principle: left to right increasing specificity.
- The original RedHat convention was right:
- Package names, same as the original archive.
- Subpackage name, e.g.
- Original archive version number.
- Package version number.
- Bad practices like putting the edition number in the
package name and a
lib prefix have become
popular (Debian, Mandrake).
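The left-to-right convention above can be sketched as a pair of helpers. The function names and the example values are hypothetical, and the hyphen separator is an assumption borrowed from the usual name-version-release layout of RPM file names.

```python
def package_label(name, version, release, subpackage=None):
    """Compose a label with increasing specificity from left to right."""
    parts = [name]
    if subpackage:
        parts.append(subpackage)
    parts += [version, release]
    return "-".join(parts)


def split_label(label):
    """Split from the right: the last two fields are the original archive
    version and the package version; everything before is the name."""
    name, version, release = label.rsplit("-", 2)
    return name, version, release
```

For instance `package_label("emacs", "21.3", "2", subpackage="el")` gives `emacs-el-21.3-2`. Splitting it back yields `("emacs-el", "21.3", "2")`: name and subpackage cannot be told apart without extra knowledge, and an edition number embedded in the name makes that ambiguity worse.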
It should have major hierarchies (e.g.
usr) and frameworks/subhierarchies
grass) in which multiple related (by use of the
same libraries or data formats) packages get merged.
- A very sad issue. Driven by opportunism.
- It should be based on a careful balancing between path
length and number of files in a directory.
- SuSE and Debian least bad.
- Package building and installing should be different
- RPM does it a bit better: the
.spec file is
self-contained and can be used independently. But there can
be many patch files, with improper names.
- RPM still does it wrong: the original archive should not
be part of the RPM. But one can do, and should do,
- DPKG has the original archive, renamed, plus a
metadata file and a single patch file that contains the