This document contains only my personal opinions and calls of judgement, and where any comment is made as to the quality of anybody's work, the comment is an opinion, in my judgement.
Aptitude is probably the best APT front-end, in particular because it has a full screen user interface without being a GUI, so it is easy to use over a terminal connection, and because it has a remarkably powerful query language for the package collection it manages, which includes both installed and available packages. In distributions that use DPKG it has the usual limitations of the latter, but that's inevitable.
I have recently used it to clean up the list of packages on one of my systems a bit, so as to eliminate unnecessary packages, and to ensure that packages are appropriately tagged as being automatically or manually selected.
Note: APT and Aptitude are about dependency management, and keep track of which packages depend on which other packages, or are required by them, and of which packages have been installed intentionally by the system administrator and which have been installed automatically purely to satisfy the dependencies of other packages. The latter can be automatically removed if no installed package requires them anymore.
The Aptitude queries useful for this are based on using these predicates (and their negations):
essential: that is part of the minimal installation of the system, and cannot or should not be removed.
Packages containing run-time libraries (shared objects, but also source or bytecode modules) begin with lib, those containing compile and link time libraries usually end in -dev, those containing documentation usually end in -doc, those containing files used by more than one variant of another package usually end in -common, and some of the packages containing executable commands end in -bin, while most have no suffix.
The roots of the dependency graph are essential packages and manually installed packages. Since essential packages should always be present they should usually be marked as manually installed (even if it does not make much of a difference). To mark a package one can use the most common Aptitude commands on the line listing the package, and they are:
Some commands help with displaying package lists. Usually I prefer a rather shallow package list or the New Flat Package List instead of the deeply nested subsections Aptitude displays by default, and the commands that change the display are:
Therefore the first check is to use L with ~E~M to list all essential packages that are marked as automatically installed, and use m to mark them manually installed. There should be no packages of the system's architecture in the list selected by the ~E!~i query.
The other check is to use the ~g query to look at installed packages that seem unneeded according to APT, and either purge them with _ or mark them as manually installed if they are intended to be installed. In general unneeded run-time and compile or link-time libraries should be purged.
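The same checks can also be scripted from the command line, since aptitude accepts the same search patterns outside the full screen interface; a rough sketch:

  aptitude search '~E~M'       # essential packages wrongly marked as automatically installed
  aptitude unmarkauto '~E~M'   # mark them as manually installed

  aptitude search '~E!~i'      # essential packages not installed: should list nothing

  aptitude search '~g'         # packages APT considers unneeded
  # aptitude purge '~g'        # and, after review, purge them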
It may then be useful to review the manual/automatic installed mark for several categories of packages:
~i(!(~R~i|~Rrecommends:~i))(~Rrecommends:!~i): but this usually lists many packages, and they must be scanned manually.
metapackages that should be marked as manually installed, for example kde-plasma-desktop.
For example apache2 was marked automatically installed on one of my systems as it is a dependency of some web based applications, but I want it to be installed regardless of their presence, so I marked it as manually installed.
Here it is often useful to use command r on a package and d on packages requiring them to get an idea of the web of dependencies they are part of.
The overall goal is to ensure that the state of installed packages is well described, so that for example ~g will list packages that can be purged without perplexity and that all such packages are listed by it.
My introduction to DBMSes was based on the original Ingres on a PDP-11 and its Quel query language that was based on the relational calculus and was quite nice, so I have largely continued to think in those terms even when events decreed that most DBMSes had to be queried in SQL.
SQL allows queries that resemble relational calculus, for example this one against a system inventory database:
SELECT   n_d.dept AS "Dept.", l_u.user AS "User", l.name AS "List",
         n.snum AS "Serial", n.name AS "Node"
FROM     nodes AS n, lists AS l,
         nodes_depts AS n_d, nodes_lists AS n_l, lists_users AS l_u
WHERE    n.snum = n_l.snum AND n_l.list = l.list
     AND n_l.list = l_u.list AND n.snum = n_d.snum
     AND (n_d.dept = ANY ('sales','factory') OR l_u.user = 'acct042')
ORDER BY n_d.dept, l_u.user, n.snum, l.name
In the above the SELECT and ORDER BY clauses describe the output table, FROM lists the relations over which the calculus will operate (similar to the range of statement in Quel), and the WHERE defines which tuples will end up in the rows of the output table.
Note: I use table to indicate the result, as the result is not a relation: it has an ordering of its rows, while the tuples in a relation are not ordered.
The DBMS is required in the above to figure out the various constraints, and in particular that some equality predicates among different tables imply join rather than selection operations.
An important detail in the above query is that equality constraints for joins and equality selections look syntactically identical, while perhaps it would have been better to indicate equality constraints for joins with a different symbol, for example ==.
Anyhow SQL is based more on relational algebra, and while the above calculus-style query is possible, standard SQL-92 does not have a disjunctive equality predicate to indicate outer joins, like for example the Sybase *= that indicates a left outer join. Outer joins need to be written in a syntax that makes clear that SQL-92 is based on relational algebra, in this fashion (omitting the SELECT and ORDER BY clauses, which remain the same):
FROM     nodes AS n
    LEFT JOIN nodes_lists AS n_l
         ON n.snum = n_l.snum
        LEFT JOIN lists AS l
             ON n_l.list = l.list
        LEFT JOIN lists_users AS l_u
             ON n_l.list = l_u.list
    LEFT JOIN nodes_depts AS n_d
         ON n.snum = n_d.snum
WHERE    n_d.dept = ANY ('sales','factory')
      OR l_u.user = 'acct042'
In a query styled as above all joins must be indicated, almost in an operational way, in the FROM clause, and the WHERE clause describes only selection predicates.
Therefore the FROM clause no longer declares the relations over which the query is defined, but describes the algebra that builds a joined relation starting from a seed relation such as nodes, with WHERE filtering the resulting relation by tuple selection, and SELECT projecting it onto a subset of its fields.
As such I have come to think, regretfully, that this is the best way to write queries within the SQL-92 design. I would prefer a calculus based query language, but SQL-92 and its variants are not really that, and using the SQL calculus-like syntax may be a bit misleading, in particular as only some join types may be written in calculus-like form. Some variants of SQL like Sybase and Oracle have disjunctive outer-join predicates, and the Sybase one is fairly reasonable (the Oracle one is quite tasteless), but even those fundamentally go against the grain of SQL-92.
In this vein I have indented the FROM clause above to show the logical structure of the algebraic operations involved (as Edsger Dijkstra suggested such a long time ago), as the SQL-92 join algebra does not allow parentheses to imply grouping. So here each level of nesting describes a relation to be joined with the next level of nesting.
The indentation aims to suggest that the nodes relation (indentation level 1) is to be joined with nodes_depts and nodes_lists, which implicitly defines a new relation (indentation level 2), which is to be joined with the lists and lists_users relations.
SQL-92 allows going the whole way round and adding tuple selection predicates to the tuple joining predicates in the FROM clause, in a fashion like this:
FROM     nodes AS n
    LEFT JOIN nodes_lists AS n_l
         ON n.snum = n_l.snum
        LEFT JOIN lists AS l
             ON n_l.list = l.list
        LEFT JOIN lists_users AS l_u
             ON n_l.list = l_u.list
    LEFT JOIN nodes_depts AS n_d
         ON n.snum = n_d.snum
        AND (n_d.dept = ANY ('sales','factory') OR l_u.user = 'acct042')
But I regard this as bad taste, and I am not even sure that the example above is indeed valid, because of the position of the OR l_u.user = 'acct042' condition: where it is in the example is probably the wrong context, and moving it elsewhere in the FROM clause seems to give a rather different meaning to me:
FROM     nodes AS n
    LEFT JOIN nodes_lists AS n_l
         ON n.snum = n_l.snum
        LEFT JOIN lists AS l
             ON n_l.list = l.list
        LEFT JOIN lists_users AS l_u
             ON n_l.list = l_u.list
             OR l_u.user = 'acct042'
    LEFT JOIN nodes_depts AS n_d
         ON n.snum = n_d.snum
        AND n_d.dept = ANY ('sales','factory')
In any case I think that joining and selecting predicates are best separated for clarity, as they have rather different imports.
On trying similar queries on some relational DBMSes with an EXPLAIN operator, all three forms are analyzed in exactly the same way, as they are fundamentally equivalent, not least because of the equivalence of relational calculus and algebra.
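As a rough check, something like the following sketch (assuming a PostgreSQL instance holding these tables in a database called inventory, which is my assumption rather than anything from the setup above) shows whether the planner produces the same plan for the calculus-style and the algebra-style way of writing the same join:

  # compare the plans for the two ways of writing the same (inner) join
  psql -d inventory -c "EXPLAIN
    SELECT n.snum, l.name
    FROM   nodes AS n, nodes_lists AS n_l, lists AS l
    WHERE  n.snum = n_l.snum AND n_l.list = l.list"

  psql -d inventory -c "EXPLAIN
    SELECT n.snum, l.name
    FROM   nodes AS n
           JOIN nodes_lists AS n_l ON n.snum = n_l.snum
           JOIN lists AS l ON n_l.list = l.list"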
The lack of distinct effective and real owners for inodes in UNIX (and derivatives) is perhaps its major flaw, but time has revealed some other significant flaws.
To be clear I am talking here of flaws in the design of UNIX qua UNIX, that is within its design style, not of the flaws of the UNIX design style itself.
Among these the first is the lack of a generally used library providing structured files as a commonly used entity. By this I mean something like ar files. Relatively recently libarchive has appeared, but too late.
Unfortunately structured archive files and easy access to their member files are a quite common requirement, especially in all those cases where data accumulates in a log-like way, which are many.
In the UNIX design libraries of object (.o) files are archived into ar files (.a), but almost incomprehensibly that is nearly the only common live use of archives.
The only other common use of archives, mbox style mail collections, is both incomplete and shows both the consequences of not using archives and a reason, besides the lack of a ready-made library, why they are not commonly used.
Note: while ZIP archives also exist, they are not commonly used in the UNIX/Linux culture, and tar archives lack many of the conveniences of general archive formats, being mostly some kind of hibernation format for filetrees.
In effect an mbox file is an archive of mail messages, and is mostly handled with similar logic: mbox archives are mostly read-only, they are mostly appended to, deleting a member is usually handled by marking the member deleted and then periodically rewriting the archive minus the deleted members.
But there is no convenient command like ar to manipulate mbox files as archives, even if most mail clients allow manipulating them interactively in some limited fashion.
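For comparison, this is roughly what member access looks like with ar on a .a archive, and what one has to fall back on for an mbox file; the library path and the mbox path are just illustrative, and formail comes from the procmail package:

  # list and extract members of an ar archive
  ar t /usr/lib/libexample.a            # list the member object files
  ar x /usr/lib/libexample.a foo.o      # extract one member

  # there is no "mbox ar"; splitting an mbox into its member messages
  # needs an ad hoc tool such as formail (from procmail)
  formail -s sh -c 'cat > msg.$FILENO' < ~/mail/inbox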
The mbox archive anyhow has long been on the wane, being increasingly replaced by MH/Maildir style filetrees, and that shows one of the reasons why a standard archive library was not written early or widely adopted: it is always possible to replace an efficient archive with a directory linking to the individual members. This scales very badly, but most software projects, whether commercial or volunteer, live or die by a small scale initial demo, so abusing directories as containers or as simple databases is an appealing option, one however that is rarely revisited later.
Indeed the database case, where BDB and similar libraries have been available for a long time, and yet many applications use directories as keyed small record containers, can invite skepticism as to whether the early availability of an archive handling library would have helped.
But considering the case of the stdio library, which is widely used for buffered IO, I think that if something similar had existed for ar style archives they would have been far more widely used, in the numerous cases where there are mostly read-only collections of small files, for example .h and similar header files, or .man manual page files.
Various recent talks about monitoring systems, and in particular Nagios and Icinga, have reminded me of a topic that I should have posted about here a long time ago.
Nagios and Icinga have some very nice ways to design a configuration, which are the orthogonal inheritance among node types and dependencies among them, but also relationships between host objects and service objects and between each of these and their service groups.
To indicate such relationships there are two styles: one is rather brittle yet very commonly used (e.g. 1, 2, 3, etc.), the other is rather elegant and flexible, but amazingly rarely used.
The first way looks like this (I haven't actually tested any of these sketched configuration examples, and some parts are omitted):
define service {
    use                  MainStreet-templsvc
    check_command        check_DHCP_MainStreet
    host_name            SV01
    service_name         DHCP-service-MainStreet
    service_description  service for host configuration
}
define host {
    description          server for DHCP DNS in Main Street DC
    use                  generic-MainStreetServer
    host_name            SV01
    parents              GW01_gateway
    parents              routerGW02
    alias                server1.example.org
}
define host {
    use                  generic-MainStreetServer
    parents              GW01_gateway
    host_name            server-2
    alias                server02.example.org
    description          backup server for DNS in Main Street DC
    parents              routerGW02
}
define host {
    use                  generic-MainStreetServer
    host_name            Server03
    parents              GW01_gateway
    alias                S03.example.org
    description          server for IPP in Main Street DC
    parents              router-GW02
}
define hostgroup (
    hostgroup_name       mainst_servers
    members              SV01,server-2,Server03
    alias                Main St. Servers
}
define service {
    service_description  service for name resolution
    use                  MainStreet-templsvc
    service_name         DNS_Mainst-resolvSV01
    host_name            SV01
    check_command        check_DNSr_MainStreet
}
define service {
    service_name         resolver_server2_DNS_Mainst
    host_name            server-2
    use                  MainStreet-templsvc
    service_description  service for name resolution
    check_command        check_DNSr_MainStreet}
define service {
    host_name            server-SV03
    name                 service-CUPS-MainStreet
    service_description  service for printing
    check_command        check_IPP_MainStreet
    use                  service-MainStreet
}
The above is a typical messy configuration file which has several weaknesses.
The first and major issue is that the list of hosts running a service is explicit in the service object definition. It means that:
The alternative is not to list the services to be checked on a host in the host object definition, because that cannot be done; it is even better: to define one hostgroup object per service, and to specify its name in the service object definition with hostgroup_name.
The second and also fairly significant problem is similar: in the hostgroup object definition there is an explicit list of the hosts that belong to the group, with much the same issues. The alternative is to specify a skeleton hostgroup definition and add its name to the hostgroups list in the host object definitions for members of the hostgroup.
Another significant issue is that there are per-host service object definitions, which does not reflect that the same service may be running on many hosts.
The alternative is to create service definitions that are host-independent, and rely on parameterization to adapt each service check to the host it runs on, as far as possible.
Another significant issue is that the names and aliases of objects do not conform to a systematic naming convention, which means that:
The alternative is systematic naming conventions for objects, for example that all host object names for servers should begin with server-, so that the wildcard server-* matches them.
Finally, the order, indentation and layout of the object definitions is unsystematic too, and that makes them much harder to compare and maintain.
A different configuration would look like:
define hostgroup {
    hostgroup_name       datacenter-MainStreet
    alias                datacenter "MainStreet" hosts
}
define hostgroup {
    hostgroup_name       network-MainStreet-A
    alias                network "MainStreet" backbone A hosts
}
define hostgroup {
    hostgroup_name       network-MainStreet-B
    alias                network "MainStreet" backbone B hosts
}
define hostgroup {
    hostgroup_name       subnet-192-168-020_24
    alias                network subnet 192.168.20/24 hosts
}
define hostgroup {
    hostgroup_name       subnet-192-168-133_24
    alias                network subnet 192.168.133/24 hosts
}
define hostgroup {
    hostgroup_name       service-DNS
    alias                service DNS hosts
}
define hostgroup {
    hostgroup_name       service-DHCP
    alias                service DHCP hosts
}
define hostgroup {
    hostgroup_name       service-IPP
    alias                service IPP hosts
}
define service {
    use                  service-generic-IP
    hostgroups           servers-DHCP
    service_name         service-DHCP
    service_description  serve network configurations
    check_command        check_dhcp!-u -s $HOSTNAME$
}
define service {
    use                  service-generic-IP
    hostgroups           servers-DNSr
    service_name         service-DNSr
    service_description  serve DNS resolution
    check_command        check_dns!-H icinga.org -s $HOSTNAME$
}
define service {
    use                  service-generic-IP
    hostgroups           servers-IPP
    service_name         service-IPP
    service_description  serve printer spooling
    check_command        check_ipp
}
define host {
    use                  server-generic
    host_name            server-generic-MainStreet-AB
    description          server in Main Street DC on both backbones
    hostgroups           + datacenter-MainStreet
    hostgroups           + network-MainStreet-A,network-MainStreet-B
    hostgroups           + subnet-192-168-020_24,subnet-192-168-133_24
    parents              gateway-01
    parents              gateway-02
    register             0
}
define host {
    use                  server-generic-MainStreet-AB
    host_name            server-01
    alias                server1.example.org
    description          server for DHCP DNS in Main Street DC
    hostgroups           + service-DNS,service-IPP
}
define host {
    use                  server-generic-MainStreet-AB
    host_name            server-02
    alias                server02.example.org
    description          server for DNS in Main Street DC
    hostgroups           + service-DNS
}
define host {
    use                  server-generic-MainStreet-AB
    host_name            server-03
    alias                S03.example.org
    description          server for IPP in Main Street DC
    hostgroups           + service-IPP
}
Please note all the details in the above. One that I did not point out is to define various hostgroups not associated with services, as normal, usually for physical location, power circuit, cooling circuit, physical network connection, subnets, type of host (router, switch, server, desktop, ...), organization. The main purpose is to spot at a glance whether lack of availability of a service or server is shared with others in the same hostgroup, for example because of a failed power circuit. Such hostgroups may be associated with services if a check is possible, for example for power availability.
I typically write distinct configuration files per collection of hosts, where each collection shares hostgroups, some service or host templates, some services, some routers, some switches, some servers, some desktops, usually in that order.
A well written, legible Nagios or Icinga configuration in the style above is both fairly easy to maintain and very good documentation of an installation, and because of that it also makes finding the extent and causes of episodes of unavailability much easier.
Over many years, while chatting with various people about the failings of various aspects of the UNIX design decisions, it has emerged that some were failings not so much of the architecture but of how it was implemented, or incompletely implemented.
The biggest probably is the lack of distinction between effective and real ownership of files.
While UNIX processes have both an effective and a real owner, UNIX files only have one owner, which is a combination of the two.
This for example makes it very awkward to transfer ownership of a file from one owner to another. Obviously the target user cannot be allowed to take ownership on their own initiative.
But less obviously the source user cannot be allowed to give ownership of a file to a target user either, because that would allow them to retain all the advantages of file ownership without the disadvantages (such as space accounting): a file can be linked into a directory only accessible by the source user, and then given wide other permissions.
The obvious solution is to ensure that effective and real ownership be separate, with effective ownership being about access permissions, and real ownership about accounting aspects, with the rule that the real owner can change the effective owner and the permissions, and the effective owner can access the file and change the real owner to themselves.
In this way first the creator of a file can change the permissions to none and the effective owner to the target user to initiate the transfer of ownership, and the target user can then accept the transfer by changing the real owner to themselves and then the permissions to access the file themselves. If the target user does not accept the transfer the source user can always change the effective ownership of the file back to themselves.
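As a purely hypothetical command-line sketch of that protocol (no real chown or chmod has --effective or --real options; the names are only illustrative), a transfer from alice to bob could look like:

  # run by alice, the current real owner:
  chmod 000 report.dat               # revoke all access permissions
  chown --effective bob report.dat   # offer the file to bob (hypothetical option)

  # later, run by bob to accept the transfer:
  chown --real bob report.dat        # take over accountability, e.g. quota (hypothetical option)
  chmod 600 report.dat               # grant themselves access again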
There is a question whether there should be two sets of permissions for effective and real owners, but I think that only one is necessary and desirable.
If it were just for transfer of ownership the above would be an important but non-critical internal limitation of the UNIX architecture, but its importance is far greater than that, because such a distinction would make the set-uid feature of UNIX far more useful and easier to use.
Set-uid appears in two manifestations, one related to executable files, the other related to the ability of processes with an effective user of root to switch effective ownership to another user and back.
The essential aspect of set-uid for executable files is that it allows for programmatic transfer of data across different users: the target user sets up a set-uid executable file, which when invoked by a process belonging to the source user will change the effective ownership of that process to the target user. Thus transfer of data can be done as:
the source user's process opens the relevant files and then execs the set-uid executable; it thus acquires the ability to access files accessible to the target user, but only under the control of code written by the target user, thus in a way that is safe for that target user.
The above classic scenario is typical, for example where the target user is a system dæmon or other subsystem, and the source user any other user.
It has several drawbacks, the most important being that it depends utterly on very carefully writing the logic of the source and target programs, as it relies on keeping exactly the right file descriptors open in the same process when switching from the source user's executable to the target user's executable.
Having distinct effective and real owners for a file instead allows for a much simpler way to transfer data: put the data in a suitable file, and transfer the ownership of the file.
Consider for example a print spooler running as user lpd and the request by user example to print a file: the lpr command could simply, running set-uid with effective user lpd and real user example, change the effective ownership of the file to lpd (if licit), and then move or copy it to the print spool directory, with the added benefit that all the files to be printed would still have as real owner the source user, and thus be accounted to them.
But there is a much bigger scope for simplification and greater security in something related to set-uid, which relates to the ability of processes with effective owner root to switch temporarily or definitively effective ownership to another user, which because of various subtleties has involved complicated rules and security risks even worse than the executable-file related notion of set-uid.
The root system-call version of set-uid is typically used to do some operation between a source user and a target user mediated by a process owned by root, one that therefore needs to access resources first as the source user and then as the target user.
This could be solved by having a set-uid executable for every possible target user, but this is rather impractical in most cases; rather than doing so it is deemed more practical to have a single executable set-uid to root for getting data from the source user, and to let processes with a real owner of root switch effective owner to any target user, with the following sequence of operations for a transfer from source user to target user:
This involves crossing 3 boundaries, and even more riskily it can happen not merely with an executable set-uid to root but with a dæmon started and running as root. For a practical example imagine the sending of messages from one user to another via a spooled mail system, where the message file is created by the source user, is held in a spool file, and then is appended to a mailbox file owned by the target user.
With the ability to distinguish between effective and real ownership of files there is no need to have root itself as an implicitly trusted intermediary, all that needs to happen is to have an explicitly trusted one as in:
Note that at no point is there any need for root as an all-powerful intermediary, nor is there any need for the source or target users to trust the intermediary completely, because the intermediary user is only effective thanks to the set-uid executables, and the source and target files are the only ones belonging to the source and target users that are exposed to the intermediary, by having their effective owner set to the intermediary user.
In the UNIX tradition there has been a very partial realization of the above logic, where the role of intermediary owning user was simulated by using an intermediary owning group instead. That is by using the owning group as a simulated effective user, thus resulting in the common practice of creating a unique group for every user with the same name as the user, for that purpose. But in the UNIX architecture owning groups do not have enough power, and regardless using groups that way is a distortion of a different concept.
In the above I have not presented all the details, as they should be quite natural; for example where I have used user that implies user or a process owned by the user, and owner might be not just the owning user but also the owning group (but I am skeptical this is needed or useful), and where I have written file I have also implied file or directory or other resource.
Also some of the transition and other rules can be tweaked one way or another; for example, as mentioned, there could be separate permission masks for the effective and the real owner.
But the overall concept is to generalize the notions of effective and real owner for processes and of the set-uid mechanism to transition between them. In other words it is a powerful idea to model any crossing of protection boundaries as two half steps, one initiated by the source protection domain, the second initiated by the target protection domain, such that in the first permissions but not accountability are transferred by the source to the target, and in the second accountability is taken over by the target from the source, completing the crossing.
While in general I like Ganglia and other performance monitoring systems, a strong alternative to its gmond agent, at least for UNIX-style systems, is collectd.
I was looking at the list of front-ends for it, to display graphically the information it collects, and it was depressing to see that most are web UIs, and so many are based on PHP, which I dislike.
But fortunately there is a GUI program for that, kcollectd, which is quite good and compact: it does not rely on running a web server (even if only for local users) and a browser, it is very responsive graphically, and it is also well designed. This is a confirmation of my preference for traditional GUI interfaces instead of web UI ones. Also, since there is little in kcollectd which is specific to collectd, as it just displays RRD files, it can probably be used with most other performance monitoring systems that write to RRD files.
Anyhow if one wants to setup performance monitoring quickly on a single system locally, for example a desktop or a laptop, the combination of collectd and kcollectd is very convenient and informative and requires minimal work, in part because the default collectd configuration is useful.
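As a rough sketch of such a minimal local setup on a Debian-style system (the package names, the service manager and the RRD directory are assumptions that may differ elsewhere):

  apt-get install collectd kcollectd   # the agent plus the GUI front-end
  systemctl enable --now collectd      # start collecting with the default configuration
  kcollectd &                          # browse and graph the RRD files collectd writes,
                                       # by default under /var/lib/collectd/rrd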
At various times I have used the Ganglia monitoring system, most notably in relation to some WLCG cluster.
Ganglia is quite a good system and I haven't paid it much attention as for typical small to middling setups it sort of just works; its only regrettable feature is that it has a web UI and one written in PHP.
To have a nice printed reference I have bought Monitoring with Ganglia (O'Reilly, 2012-11) by Matt Massie, Bernard Li, Brad Nicholes, Vladimir Vuksan, Robert Alexander, Jeff Buchbinder, Frederiko Costa, Alex Dean, Dave Josephsen, Peter Phaal, Daniel Pocock.
This book documents, rather comprehensively and in a well written way, the much more that can be done with Ganglia, and it is an excellent resource for understanding how it is designed and how it can be configured and deployed.
The final section on deployment examples is particularly useful not just on how it can be deployed, but also on how best it can be deployed.
There are some niggles, one of which is that the book is really an anthology of distinct articles on various aspects of Ganglia, even if the authorship of the articles sometimes overlaps. The overall plan of the anthology however is cohesive, so the articles flow into each other somewhat logically.
The main niggle however relates to what Ganglia is and how it is described, and this is reflected in the anthology structure of the book.
Initially Ganglia is described as a very scalable monitoring system, good for system populations of tens of thousands to hundreds of thousands of systems, which is all agreeable, thanks to its highly parallel multicasting default mode of data collection.
The default mode of operation is to run the gmond dæmon on each system, which both collects data from that system and aggregates data from every other system in its cluster; this is achieved by multicasting the measurements collected by each system to all other systems in the cluster. The aggregated measurements are then condensed into an overall state by the gmetad dæmon, and that is fetched and viewed by the PHP-based web UI.
Obviously this is not very scalable, while being very resilient, as each system has a full record of the measurements for every other system in the cluster; but this very replication and the traffic that realizes it makes for less scalability.
Scalability is achieved by two completely different means, as subsequent articles illustrate very clearly:
Thus the main, and not very large, defect of the book is the confusion likely to happen to the reader when Ganglia is described as being scalable for some reasons at the beginning, and then after a while those reasons are discounted and replaced by much better ones.
Overall it is a very good, useful, well written book with excellent setup advice.
My notes from the second day of presentations at the FLOSS UK Spring 2013 event:
foreign tables, which are views on a foreign database. This includes query optimization.
NoSQL modes of operation, like location or key-value store (and it is faster at that than MongoDB), and full text indexing.
API, DHCP, C++ or Python 3, but very annoying configuration. It does not offer recursive service for now.
write-round, write-through or write-back.
My notes from the first day of presentations at the FLOSS UK Spring 2013 event which began with a commemoration of pod (Chris Cooper):
fireball mode, which uses 0MQ instead of SSH.
The FLOSS UK Spring 2013 event was preceded by a day of tutorials.
My notes from the Ansible tutorial:
delegate to another host, e.g. to provision VM disks.
My notes from the Juju tutorial:
Quite amused today to find that some of the build systems mentioned in an interesting survey, and notably tup (1, 2), take an interesting approach to figuring out the dependencies between parts of the configuration to be built: instead of using a static tool like makedepend, which examines the sources of the components to build, they trace dynamically which files are accessed the first time a full build is done, using something equivalent to strace -e open (using the system call ptrace) or ltrace -e open (using an LD_PRELOADed wrapper).
Which is something that I use by hand to reverse engineer software package builds and behaviour.
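A rough sketch of that by-hand technique: trace which files a build opens, then filter for headers; the -f option follows the compilers and linkers that make spawns:

  strace -f -e trace=open,openat -o build.trace make
  grep -E '\.(h|hpp)' build.trace | sort -u   # the headers the build actually read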
This is interesting for me in several ways, but mostly it is an admission of failure: the failure to design languages and programming frameworks that treat dependencies as explicit and important. The failure of the approach of designing purposefully, in other words, and the necessity to support programming styles based on slapping stuff together until it sort of works. Which brings to mind web based applications :-).
The performance of flash memory (EEPROM) based storage devices is rather anisotropic, and while the typical performance is quite good, in some applications the corner cases matter.
In a recent group test of some enterprise flash SSDs the testers commendably paid attention to maximum latencies, which matter because erasing flash memory can take time, and anyhow most flash SSDs need to do background reorganisation of the storage layout.
As the graphs make clear there is a huge gap between the average write latency of around 1 millisecond and the worst case latency of between 15 and 27 milliseconds.
It would have been interesting to look at the variability of interleaving reads and writes, but there is another interesting test related to write rate (rather than latency) variability, where the result is pretty good and the tester is impressed:
Zeroing in to the one-second averages, as we did with Intel's SSD DC S3700, the P400m performs admirably (although it cannot beat the consistency we saw from the Intel drive). The SSD DC S3700 gave us 90% of its one-second averages within 99% of the overall average. In contrast, only 65% of the P400m's one-second averages fall within 99% of the overall average.
Micron's P400m does much better if you compare the individual data points to the product's specification instead of overall average. In fact, 99.8% of all one-second averages are higher than this drive's write specification.
A few months ago, these results would have been phenomenal. The problem is that Intel's 200 GB contender also achieves its results at a higher throughput.
The group test also has the usual interesting graphs that show how transfer size matters to average transfer rates.
I have decided to have a look at the KDE SC 4 information management components, and I have written a brief summary as a new section in my KDE notes.
Having activated Akonadi and Nepomuk I was astonished to see that Nepomuk was scanning a large NFS filetree that I had mounted, to set within it a full coverage of inotify watchpoints, when I had clearly indicated in the relevant settings that I wanted it to index only my home directory.
So I asked on the relevant IRC channel, and I was told that by default Nepomuk sets inotify watchpoints on every directory in the system because it has to monitor every directory in case a file for which it keeps tags or ratings is moved to it, so it can update its location in the database, as documented in two KDE issue entries (1, 2).
That is because Nepomuk will build a database of file names only for specifically indicated directories, but will keep tags or ratings for any file anywhere: even if it restricted setting tags or ratings only to files in the indicated directories, such files could be moved or linked to any other directory.
That to me seems quite an extreme view, because Nepomuk could set inotify watchpoints on the indicated directories only, and on the disappearance of a file from them could just remove from its database the associated tags or ratings.
Alternatively it could store tags or ratings in extended attributes, which most UNIX/Linux filesystems support, and even some non-UNIX/Linux ones do. But I suspect that since KDE is supposed to be cross-platform, and indeed it is, the Nepomuk implementors preferred a method that relies on an external database to implement what are in effect custom extended attributes.
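For example, on a filesystem with extended attribute support something like the following would keep a tag with the file itself; the attribute name user.tags is just an illustration, not what Nepomuk actually uses:

  setfattr -n user.tags -v "holiday,2013" photo.jpg   # attach tags to the file
  getfattr -n user.tags photo.jpg                     # read them back
  mv photo.jpg /some/other/directory/                 # the tags move with the file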
But the idea that Nepomuk is designed to keep track of associated external attributes for files anywhere on a system (with the possible exclusion of removable filetrees) is extreme, as some systems can have hundreds of thousands or dozens of millions of directories, each of which would require an inotify watchpoint.
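The scale of that is easy to check on any given system, for example by counting the directories on the root filesystem and comparing with the kernel's default per-user inotify watch limit:

  find / -xdev -type d | wc -l                  # directories that would each need a watchpoint
  cat /proc/sys/fs/inotify/max_user_watches     # the default per-user limit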
Since I needed an extra monitor I have been rather uncertain whether to get a 27in LCD or a 24in LCD.
The point for a 27in LCD was to get one with a larger pixel size, like 2560×1440, with a 16:9 aspect ratio, such as a Digimate DGM IPS-2701WPH or a Dell U2713HM.
The point for a 24in LCD was to get a less expensive 16:10 aspect ratio display in a 1920×1200 pixel size. Eventually I decided to get one of the latter, in part because I don't like 16:9 aspect ratios, and the 1200 pixel vertical size is (just) sufficient for me; but also in part because 27in and larger displays are a bit too large for my home computer desk, and a bit too heavy for my monitor arm.
But mostly because I mostly use the monitor connected to a laptop and most laptops don't support the 2560×1440 pixel size, certainly not over VGA, and likely not over HDMI either.
I would have gone for another Philips but they no longer offer a 24in 1920×1200 model with a wide viewing angle, so I bought a Dell U2412M, which at around £220 inclusive of tax seems pretty good to me, the good things being:
The less good things are very few:
anti-aliased text, resulting in an excessively thin appearance of text that has not been anti-aliased.
Compared to my other 24in 1920×1200 IPS display, the Philips 240PW9, it is broadly equivalent:
On the usually excellent review and technology site X bit laboratories there is a recent review with intelligent, useful speed tests of an external disk with both USB3 and Thunderbolt interfaces.
Because the default rotating disk drive has a maximum transfer rate of around 115MB/s the reviewers replaced it with flash SSDs with both SATA2 and SATA3 interfaces capable of up to 400-500MB/s, and amazingly both USB3 and Thunderbolt support a few hundred MB/s transfer rates, with top read rates of 300-370MB/s and top write rates of 240-270MB/s.
This is quite remarkable and not far from the transfer rates possible with eSATA, which remains my favourite high speed interface as it is very likely to be less buggy and it is cheap. The same article makes clear that USB2 is still pretty slow, with top read and write speeds of 35MB/s and 28MB/s, but that is usually adequate for most peripherals, except storage and video ones.
But USB3 is certainly a good alternative, and increasingly popular, and lower cost than Thunderbolt. It is still USB, and therefore likely to be much buggier especially in corner cases than eSATA, but it will probably be useful to have.
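For a quick local comparison of sequential transfer rates over different interfaces something like the following sketch can be used; /dev/sdX stands for a scratch device, and the write test destroys its contents:

  dd if=/dev/sdX of=/dev/null bs=1M count=4096 iflag=direct   # sequential read, bypassing the page cache
  dd if=/dev/zero of=/dev/sdX bs=1M count=4096 oflag=direct   # sequential write, destructive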
In various discussions about networking I mentioned the XNS internet framework, which was one of the first large scale internetworking standards and was highly popular for decades as it was adopted by several large vendors (Novell, Banyan, ...).
It was in many ways better designed than the ARPANET/NCP framework, and as a result inspired many aspects of the TCP/IP network framework that succeeded the ARPANET one, and arguably it was in some ways better than TCP/IP.
The most interesting aspect of XNS is the addressing structure: XNS addresses are 80 bits long, where the bottom 48 bits uniquely identify each host, and the top 32 bits are a routing hint that identifies the network to which the host is connected.
Such an addressing structure is interesting for several reasons, the principal of which is that the availability of 32 bit long network prefixes and 48 bit long host addresses would have most probably avoided running out of addresses as in the current IPv4 situation.
Because 2^32 networks is a pretty large number, especially as each network can have an essentially arbitrary number of hosts.
Note that the number of hosts in a network is limited only by the size of the routing tables used in that network, as individual host routes can always be used, and thus a network could have several thousand hosts.
The other interesting aspect of the addressing structure is that the network number is not actually part of the address of a host, it is just a routing hint, and that therefore 48 bit host addresses must be globally unique. Indeed in XNS the prescription was that they should be Ethernet addresses, as there is already a mechanism to ensure their global unicity.
Having host addresses be identical to Ethernet addresses has the interesting advantage that it avoids the need for neighbor discovery protocols like ARP, at least on Ethernet, simplifying protocol implementation.
This is possible because appositely XNS requires that all the Ethernet interfaces a host has have the same Ethernet address, that is the XNS host address, a somewhat interesting situation.
One other interesting aspect of XNS was that it was based on 576 byte packets, even if the typical underlying medium was Ethernet and supported up to 1500 byte frames; the small maximum guaranteed packet size also meant that path MTU discovery was not necessary.
Apart from the addressing structure one of the more interesting differences between XNS and TCP/IP is the rather different starting points:
While in some ways I think that the TCP/IP way is more universal, the XNS architecture matches better how networking has actually been used for decades, as mesh networks are exceptionally rare nowadays, and the Internet is an internet of LANs.
During a mailing list discussion about 4 drive RAID sets someone argued that raid6 can lose any random 2 drives, while raid10 can't, which seems superficially a good argument.
It is instead a poor argument based on geometry, one of those I call a syntactic argument, where the pragmatics of the situation are not taken into account.
The major point in a comparison of RAID6 with RAID10 is that they have very different performance and resilience envelopes, which cannot be compared in a simplistic way.
As to speed (a component of performance) a RAID10 set, intact or with 1 drive missing, usually is much better than an equivalent 4 drive RAID6 set intact or with 1 drive missing, and similarly during rebuild, because, for example:
a RAID6 set with a missing drive must read the whole remaining stripe for reconstruction for all reads of blocks on the missing drive, which for a RAID6 set of 4 drives means on 25% of reads, while a RAID10 set only needs to read the surviving mirror of the missing drive.
Resilience is also not comparable, even in geometric terms, because a RAID10 set can continue operating despite the loss of more than 2 members of the set, as long as they are in different pairs.
Also, in purely geometric terms it is easy to have a RAID10 made of 3-way RAID1s, and that would also have the property of continuing to operate regardless of the loss of any 2 drives.
But it may be objected at this point that this would cost a lot more than a RAID6, and if that objection is relevant it means the comparison cannot rest on purely geometric considerations, or purely speed based ones: one must then also consider opportunity costs, including purchase cost and other environmental factors.
Purely geometric considerations as to resilience in particular are rather misleading, for example because they must be based on the incorrect assumptions that RAID set failure rates are uncorrelated, and independent of the RAID set size and environmental conditions.
But the probability of a further failure in a RAID set depends on its size, and that probability is correlated with and depends on common modes of failure, such as all drives being similar and in a similar environment, which apply during all three modes of operation, intact, incomplete and rebuilding:
The extra stresses, especially during incomplete and rebuilding operation, can significantly raise the probability of a further drive failure with respect to a similar RAID10 set, and these are inevitably due to the P and Q blocks correlating work across the whole stripe.
The single great advantage of a RAID6 set is that it performs much like a slightly smaller RAID0 set during pure read workloads, and that it offers some degree of redundancy, at the heavy cost of making writes expensive, and incomplete or rebuilding operation very expensive.
Therefore in the narrow number of cases and setups where that performance and resilience envelope fits the workload it is useful. But it is not comparable to RAID10, whether with 2 or 3-way mirroring, and the RAID10 is generally preferable.
I was asked for an opinion on a case of an internal leaf mail server having mail refused by an internal destination server and this is the relevant transcript:
$ telnet smtp.example.com 25
Trying 192.0.2.10...
Connected to smtp.example.com.
Escape character is '^]'.
220 mail.example.com ESMTP Postfix
HELO server1.example.net.com
250 mail.example.com
MAIL FROM: <test@server1.example.net.com>
250 Ok
RCPT TO: <a.n.user@example.com>
450 <test@server1.example.net.com>: Sender address rejected: Domain not found
quit
221 Bye
Connection closed by foreign host.
Here the obvious minor issue is that the domain suffix of the name of the sending internal node was mistyped as .example.net.com instead of the intended .net.example.com.
The slightly less obvious minor issue is that on server1 the MTA configuration defaults to appending the full name of the node, instead of the name of the email domain example.com, which can be achieved in Postfix with append_at_myorigin or append_dot_mydomain.
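A minimal sketch of the relevant Postfix settings in main.cf, using the domain names from the transcript above:

  # main.cf on server1: make bare sender addresses get the mail domain,
  # not the node's full host name
  myorigin            = example.com
  append_at_myorigin  = yes    # append @$myorigin to addresses without a domain
  append_dot_mydomain = no     # do not append .$mydomain to partial host names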
But the really bad detail is that the destination MTA, for which example.com is local, checks the validity of the domain of the MAIL FROM: address at all, because the Postfix parameter reject_unknown_sender_domain is set; that such a parameter defaults to enabled, and exists at all, is rather improper for several reasons:
even if rejecting an invalid envelope source address made sense, the rejection should be in the response to MAIL FROM:, which instead gets a 250 Ok response; the invalidity of the envelope source address is declared in the response to the RCPT TO: request instead.
such a check makes sense at most for messages not destined to local addresses, to prevent open relaying, and not in general; and as a form of authentication for messaging to non-local addresses it is rather weak, and there are specific authentication facilities in SMTP.
the check is presumably meant to reduce spam traffic, but this is a very improper and futile way to do it. A proper evaluation of whether a message is spam should not be done by the MTA.
While discussing my impression that LVM2 is mostly useless except for snapshots (also 1, 2) a smart colleague pointed out that it has one other useful feature, the ability to move a LV from one storage device to another transparently, that is while it is being actively accessed.
That was an interesting point, because the overall reason why I do not value DM/LVM2 is that I prefer to manage storage always at the filetree level, that is by dealing with collections of directories and files, because that is what is portable and well understood and clear.
In general I find very little reason to have either more than one partition per storage device, or even worse to have a partition spanning multiple storage devices, unless some level of RAID is involved, and RAID usually has significant advantages over mere volume management.
However admittedly the ability to transparently move a block device is sometimes interesting, just like the ability to snapshot a block device. These are I think two occasionally worthwhile reasons for virtualizing a block device via DM/LVM2.
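For reference, the transparent move is the pvmove operation; a minimal sketch, assuming a volume group vg0 whose extents currently sit on /dev/sdb1 and should end up on /dev/sdc1:

  pvcreate /dev/sdc1             # prepare the new physical volume
  vgextend vg0 /dev/sdc1         # add it to the volume group
  pvmove   /dev/sdb1 /dev/sdc1   # migrate extents while the LVs stay mounted and in use
  vgreduce vg0 /dev/sdb1         # finally drop the old device from the group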
However the ability to transparently move filetrees is also available from the underrated AFS, which can be deployed in pretty large (800TB, 200M files) configurations with the ability to manage over a hundred thousand independent filetrees, and it is also part of the abilities of ZFS and BTRFS, as they in effect contain something like DM/LVM2, but in a slightly less futile design.
Other than AFS I personally think that moving the contents of block devices is best done with a bit of planning ahead, using standardized partition sizes and by using RAID1 or something similar like DRBD, because it is rather simpler and more robust.