Computing notes 2018 part one

This document contains only my personal opinions and calls of judgement, and where any comment is made as to the quality of anybody's work, the comment is an opinion, in my judgement.


2018 February

180213 Wed: Skype containerization for Linux

A blog post by Canonical reports the opinions of the developers of Skype on distributing their binary package in Snap (1, 2), the containerized format sponsored by Canonical (descended in part from Click) and similar to the Docker format (though I reckon Snap is in some ways preferable), with arguments that look familiar:

The benefit of snaps means we don’t need to test on so many platforms – we can maintain one package and make sure that works across most of the distros. It was that one factor which was the driver for us to invest in building a snap.

In addition, automatic updates are a plus. Skype has update repos for .deb and rpm which means previously we were reliant on our users to update manually. Skype now ships updates almost every week and nearly instant updates are important for us. We are keen to ensure our users are running the latest and greatest Skype.

Many people adopt snaps for the security benefits too. We need to access files and devices to allow Skype to work as intended so the sandboxing in confinement mode would restrict this.

These are just the well known app store properties for systems like Android: each application in effect comes with its own mini-distribution, often built using something equivalent to make install, and that mini-distribution is fully defined, managed and updated by the developer, as long as the developer cares to do so.

For users, especially individual users rather than organizations, it looks like a great deal: they hand over the bother of system administration to the developers of the applications that they use (very similarly to virtualized infrastructures).

Handing over sysadmin control to developers often has inconvenient consequences, as developers tend to underestimate sysadm efforts, or simply don't budget for them, even though the additional burden of system administration is not free. This is summarized in an article referring to an ACM publication:

In ACMQueue magazine, Bridget Kromhout writes about containers and why they are not the solution to every problem.

Development teams love the idea of shipping their dependencies bundled with their apps, imagining limitless portability. Someone in security is weeping for the unpatched CVEs, but feature velocity is so desirable that security's pleas go unheard.

Platform operators are happy (well, less surly) knowing they can upgrade the underlying infrastructure without affecting the dependencies for any applications, until they realize the heavyweight app containers shipping a full operating system aren't being maintained at all.

Those issues affect organizations more than individual users: sooner or later many systems accrete black-box containers that are no longer maintained, just like legacy applications running on dedicated hardware, and eventually, if some sysadms are left, they are told to sort that out, retroactively, in the worst possible conditions.

2018 January

180127 Sat: Experiences of being hacker'ed

I was recently asked about having been hacker'ed, where some hacker affected some system I was associated with, and my answer was "perhaps, a long time ago". Indeed I had to think about it a bit more and try to remember.

Before giving some details, the usual premise: this is not necessarily because of superhuman security of the systems that I use, it is mostly because they don't have valuable data or other reasons to be valuable (for example I keep well clear of Bitcoin (Don't own cryptocurrencies, Kidnappers free blockchain expert Pavel Lerner after receiving $US1 million ransom in bitcoin) or online finance and trading), and therefore hackers won't invest big budgets to get them, so that diligent and careful practice and attention to detail mean that low budget attempts are foiled, because low budget attacks are targeted at low hanging fruit (the systems of people who are not diligent and careful, of which there are plenty).

So the only major case I can remember was around 15 years ago when a web server was compromised by taking advantage of a bug in a web framework used by a developer of an application running on the server. This allowed the hackers to host some malware on the server. While I was not that developer (but honestly I might well have been, I would not have done either a full security audit of every framework or library I would use), I was disappointed that it took a while to figure out this happened, because the hacker had taken care to keep the site looking the same, with only some links pointing to the malware.

There have been minor cases: for example even longer ago an MS-Windows 2000 system was compromised using a 0-day bug while I was installing it at someone else's home (just bad luck, and I guess in part because downloading the security updates after installation took so long), but this had no appreciable effect because I realized it was hacked (for use as a spam bot) pretty soon during that installation. There have also been a handful of cases where laptops or workstations of coworkers were hacked because they clicked on some malware link and then clicked Run, and either the antivirus had been disabled or the malware was too new.

In all cases the hacks seemed to originate from far away developing countries, which is expected as it is too risky to hack systems in the same jurisdiction (or an allied one) where the hacker resides, given that the Internet gives access to many resources across the world, which has provided many good business opportunities for developing countries, but also for less nice businesses.

As to targeted operations, that is high budget hackers from major domestic or allied security services or criminal organisations, I am not aware of any system I know having been targeted. Again this is because I think it has never been worth it, but also because high budgets allow use of techniques, like compromised hardware, microscopic spycams or microphones to collect passwords, or firmware compromised before delivery, that are pretty much undetectable by a regular person without expensive equipment. If there is a minuscule spy chip inside my keyboard or motherboard or a power socket or a router I would not be able to find out.

180113 Sat: Tidy cable storage and tagging

My preference is for cable and equipment labeling and tidy storage, also because when there are issues that need solving that makes both troubleshooting and fixing issues easier and quicker. Having clear labels and tidy spare cables helps a lot under pressure. Therefore I have been using some simple techniques that I wish were more common:

As to how to label I am aware of the standard practice in mechanical and electrical engineering to label both cables and sockets with an inventory number, and a central list of which cable to connect to which socket, but I don't like it for computer equipment because:

So my usual practice is to label both cables and sockets with matching labels, possibly with different labels on each end, and for the labels to be descriptive, not just inventory numbers. For example a network connection may have a label with host name and socket number on the host side for both socket and cable end, and host name and switch name and port number on the switch side. This means that verifying the right connections are in place is very easy, and so is figuring out which cable plugs into which socket, which minimizes mistakes under pressure. The goals are: make clear what happens when a cable is disconnected, make clear which cable to disconnect to get a given effect, make clear which cable needs to be reconnected.

For me it is also important to minimize connection specificity, so for example I usually want to ensure that nearly all downstream ports on a switch are equivalent (on the same VLAN and subnet) so that they don't need to be tagged with a system name, and the switch end of a downstream cable does not need to be tagged with a specific switch port. So for example I usually put on a switch a label such as ports 2-42 subnet 192.168.1.56 and hostname and subnet on the switch end of downstream cables.
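The labeling scheme described above can be sketched in a few lines of Python. This is only an illustration: the Link record, the function names, and the example host and switch names are hypothetical, and only the label contents (host name and socket number on the host side; host name, switch name and port number on the switch side; or host name and subnet when downstream ports are equivalent) come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """One cable between a host socket and a switch port (hypothetical model)."""
    host: str
    host_socket: str
    switch: str
    switch_port: int
    subnet: str


def host_side_label(link: Link) -> str:
    # Same text goes on the host socket and on the host end of the cable.
    return f"{link.host} {link.host_socket}"


def switch_side_label(link: Link, ports_equivalent: bool = False) -> str:
    # Text for the switch end of the cable.
    if ports_equivalent:
        # All downstream ports are on the same VLAN and subnet, so the
        # cable end does not need to name a specific switch port.
        return f"{link.host} {link.subnet}"
    return f"{link.host} {link.switch} port {link.switch_port}"


def switch_label(first_port: int, last_port: int, subnet: str) -> str:
    # One label on the switch itself covering a range of equivalent ports.
    return f"ports {first_port}-{last_port} subnet {subnet}"


if __name__ == "__main__":
    link = Link("nas1", "eth0", "sw1", 7, "192.168.1.56")
    print(host_side_label(link))
    print(switch_side_label(link))
    print(switch_side_label(link, ports_equivalent=True))
    print(switch_label(2, 42, "192.168.1.56"))
```

Generating both ends of a connection from one record keeps the two labels matching, which is the property that makes verification easy.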

In order to have something like a central list of cable and socket connections I photograph with a high resolution (and with large pixels and a wideangle lens) digital photocamera the socket areas, ensuring that most of the labels be legible in the photograph. This is not as complete as a cable and socket list, but it is usually a lot more accurate, because much easier to keep up-to-date.

Note: however I do use explicit inventory numbers and a central list in some more critical and static cases, usually in addition to meaningful matching labels.

180106 Sat: The popular version control systems and Breezy

I had long had the intention to complete my draft (from 2011...) of a blog post on the four most popular distributed version control systems (monotone, Mercurial, Bazaar, git) but over the years both monotone and Bazaar have lost popularity, and currently only git and Mercurial (among distributed version control systems, Subversion and even CVS are still popular) are still widely used (with git being more popular than Mercurial), at least according to Debian's popcon report:

That's sad as both monotone and Bazaar have some very good aspects, but there are a lot of contrasting opinions: for example there is some criticism of both monotone and Bazaar, mainly that they were much slower than git and Mercurial, especially for large projects, as their storage backends are somewhat slower than that of git on typical disks; secondarily that monotone is too similar to git, and that Bazaar has some privileged workflows similar to Subversion.

Note: Bazaar however has been very competitive in speed since the switch to a git-like storage format in version 2.0 and some improvements in handling large directory trees (1, 2) in 2009. Monotone, while slower, happened to be under the spotlight as to speed at an early and particularly unfortunate time, and has improved a lot since.

However it seems to me quite good that after a couple of years of stasis Bazaar development has resumed in its new Breezy variant, even if the authors say that Your open source projects should still be using Git, which is something I disagree with. There is also a pretty good video presentation about the Breezy project on YouTube, which makes the same point about using git, and in which Mercurial is ignored.

Note: the two years of stasis have demonstrated that Bazaar, whose development started over 10 years ago, is quite mature and usable and does not need much maintenance beyond occasional bug fixes, and conversion to Python 3 which is one of the main goals of the Breezy variant, and is nearly complete.

The video presentation makes some interesting points, that at least partially contradict that repeated advice, and in my view they are largely based on the difference between the command front-end of a version control system and its back-end:

While I like Mercurial otherwise, I think that having two storage files (revision log and revision index) for every file in the repository is a serious problem: most software packages have very many small files, in part because archives are not used as much as they should be in UNIX and successors like Linux, and most storage systems cannot handle lots of small files very well, never mind tripling their number. While I like git, its rather awkward command interface and its inflexibility make it a hard tool to use without a GUI, and GUIs considerably limit its power.
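To put a number on the "tripling" concern above for a given source tree, here is a rough Python sketch. It simply counts the files under a directory and applies the two-storage-files-per-file figure stated in the text as an assumption; the function names are hypothetical, and a real repository may well lay out its storage differently.

```python
import os


def count_files(root: str) -> int:
    """Count the files found under root, recursively."""
    return sum(len(files) for _, _, files in os.walk(root))


def estimated_total(tracked_files: int, store_files_per_file: int = 2) -> int:
    # Assumption from the text: each tracked file comes with two storage
    # files (revision log and revision index), so the total is 3x.
    return tracked_files + store_files_per_file * tracked_files


if __name__ == "__main__":
    tracked = count_files(".")
    print(f"{tracked} files in the tree, about {estimated_total(tracked)} with storage files")
```

Run from the top of a checkout, this gives a quick estimate of how many small files a backup or filesystem would have to handle under that assumption.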

Bazaar/Breezy can use either its native storage format or the git storage format; both grow by commit, and both support repacking of commit files into archives. For old style programmers it can be used at first in a fairly Subversion style workflow.

Note: while most developers seem superficially uninterested in the storage format of their version control system, it has a significant impact on speed and maintainability, as storage formats with lots of small files can be very slow on disks, for backups, for storage maintenance, for indexing. Other storage formats can have intrinsic issues: for example monotone uses SQLite2, which as a relational DBMS tries hard to implement atomic transactions, which unfortunately means that an initial cloning of a remote repository can generate either a lot of transactions or one huge transaction (the well known issue of initial population of a database), which can be very slow.
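The "lots of transactions versus one huge transaction" choice mentioned in the note can be illustrated with Python's standard sqlite3 module. This is only a sketch under loud assumptions: it uses the SQLite 3 library bundled with Python, and the revisions table and function names are made up for the example; it is not monotone's schema or code.

```python
import os
import sqlite3
import tempfile
import time


def populate(conn, rows, one_transaction):
    """Insert rows in one big transaction, or in one transaction per row."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS revisions (id INTEGER PRIMARY KEY, data TEXT)"
    )
    if one_transaction:
        with conn:  # a single commit when the block ends
            conn.executemany("INSERT INTO revisions (data) VALUES (?)", rows)
    else:
        for row in rows:
            with conn:  # one commit per inserted row
                conn.execute("INSERT INTO revisions (data) VALUES (?)", row)


def row_count(conn):
    return conn.execute("SELECT COUNT(*) FROM revisions").fetchone()[0]


def timed_populate(path, rows, one_transaction):
    """Populate a database file and return (elapsed seconds, row count)."""
    conn = sqlite3.connect(path)
    try:
        start = time.perf_counter()
        populate(conn, rows, one_transaction)
        return time.perf_counter() - start, row_count(conn)
    finally:
        conn.close()


if __name__ == "__main__":
    rows = [(f"revision {i}",) for i in range(200)]
    with tempfile.TemporaryDirectory() as tmp:
        for one_transaction in (True, False):
            path = os.path.join(tmp, f"demo-{one_transaction}.db")
            elapsed, count = timed_populate(path, rows, one_transaction)
            print(f"one_transaction={one_transaction}: {count} rows in {elapsed:.3f}s")
```

Both modes end with the same rows; what differs is how many commits the database has to make durable, which is where the time tends to go on an on-disk database.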

180106 Mon: Great 8-CPU parallelism during backup

While doing a backup using pigz (instead of pbzip2) and aespipe I have noticed that the expected excellent parallelism happened very nicely, and also mostly from a Btrfs filesystem on dm-crypt, that is, with full checksum checking during reads and decryption:

top - 14:46:56 up  2:05,  2 users,  load average: 9.23, 3.74, 1.88
Tasks: 493 total,   1 running, 492 sleeping,   0 stopped,   0 zombie
%Cpu0  : 80.5 us, 12.9 sy,  0.0 ni,  3.6 id,  2.3 wa,  0.0 hi,  0.7 si,  0.0 st
%Cpu1  : 75.4 us, 13.3 sy,  0.0 ni,  0.7 id,  7.0 wa,  0.0 hi,  3.7 si,  0.0 st
%Cpu2  : 82.9 us,  8.2 sy,  0.0 ni,  7.9 id,  1.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu3  : 81.7 us,  9.2 sy,  0.0 ni,  9.2 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu4  : 86.8 us,  7.8 sy,  0.0 ni,  5.4 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu5  : 87.6 us,  6.0 sy,  0.0 ni,  5.7 id,  0.7 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu6  : 85.5 us,  6.2 sy,  0.0 ni,  5.6 id,  2.6 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu7  : 89.1 us,  5.0 sy,  0.0 ni,  5.6 id,  0.3 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem:   8071600 total,  7650684 used,   420916 free,       36 buffers
KiB Swap:        0 total,        0 used,        0 free.  6988408 cached Mem

  PID  PPID USER      PR  NI    VIRT    RES    DATA  %CPU %MEM     TIME+ TTY      COMMAND
 7636  7194 root      20   0  615972   8420  604436 608.6  0.1  11:53.45 pts/7    pigz -4
 7637  7194 root      20   0    4424    856     344  81.8  0.0   1:16.04 pts/7    aespipe -e aes -T
 7635  7194 root      20   0   35828  11524    8892  11.9  0.1   0:24.68 pts/7    tar -c -b 64 -f - --one-file-system +
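
As a rough check on how busy the 8 CPUs were in the top output above, the per-CPU lines can be summed with a small Python sketch. The regular expression and function names are hypothetical helpers written for this example; the sample lines are copied from the output, shortened to the fields that are used.

```python
import re

# Matches lines such as "%Cpu0  : 80.5 us, 12.9 sy, ..." from top.
CPU_LINE = re.compile(r"^%Cpu(\d+)\s*:\s*([\d.]+)\s+us,\s*([\d.]+)\s+sy")

TOP_SAMPLE = """\
%Cpu0  : 80.5 us, 12.9 sy,  0.0 ni,  3.6 id,  2.3 wa
%Cpu1  : 75.4 us, 13.3 sy,  0.0 ni,  0.7 id,  7.0 wa
%Cpu2  : 82.9 us,  8.2 sy,  0.0 ni,  7.9 id,  1.0 wa
%Cpu3  : 81.7 us,  9.2 sy,  0.0 ni,  9.2 id,  0.0 wa
%Cpu4  : 86.8 us,  7.8 sy,  0.0 ni,  5.4 id,  0.0 wa
%Cpu5  : 87.6 us,  6.0 sy,  0.0 ni,  5.7 id,  0.7 wa
%Cpu6  : 85.5 us,  6.2 sy,  0.0 ni,  5.6 id,  2.6 wa
%Cpu7  : 89.1 us,  5.0 sy,  0.0 ni,  5.6 id,  0.3 wa
"""


def cpu_user_plus_system(top_text):
    """Return {cpu number: user% + system%} for each %CpuN line."""
    usage = {}
    for line in top_text.splitlines():
        match = CPU_LINE.match(line.strip())
        if match:
            usage[int(match.group(1))] = float(match.group(2)) + float(match.group(3))
    return usage


def mean_usage(usage):
    return sum(usage.values()) / len(usage)


if __name__ == "__main__":
    usage = cpu_user_plus_system(TOP_SAMPLE)
    print(f"{len(usage)} CPUs, mean user+system {mean_usage(usage):.1f}%")
```

The per-process column tells the same story: pigz alone is at 608.6% CPU, that is about six CPUs' worth, with aespipe and tar taking most of the rest.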