Computing notes 2019 part two

This document contains only my personal opinions and calls of judgement, and where any comment is made as to the quality of anybody's work, the comment is an opinion, in my judgement.

2019 December

2019 November

191109 Sat: Cloud VMs with long term leasing compared to colocation

AWS announced today a long-term leasing offer that considerably reduces the hourly cost of their virtual machines. With the new pricing, committing to something like an m4.4xlarge instance (16 cores, 64GiB RAM, 2Gb/s storage connection, no storage) costs $0.3008/h instead of $0.80/h with an Instance Family Savings Plan paid upfront for 3 years: that is just under $8,000 plus taxes, or about $16,000 plus taxes over 6 years if the server is on an extended depreciation period. A really nice server like that costs less than $2,000 plus taxes to buy outright. AWS is in effect charging over $2,000 per year plus taxes to host it and give an availability guarantee, while a colocation service may charge around $1,000 a year (without guaranteeing the availability of the server itself) to host a server that cost $2,000 plus taxes to buy.
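
The arithmetic behind those figures can be checked with a small sketch. The rates and prices are the ones quoted above; the variable names are mine, taxes are left out, and a year is assumed to be 365 days.

```python
# Figures quoted in the text; everything else is plain arithmetic.
HOURS_PER_YEAR = 24 * 365  # assumption: leap days ignored

on_demand_rate = 0.80        # $/h, on-demand m4.4xlarge
savings_plan_rate = 0.3008   # $/h, 3-year plan paid upfront
server_price = 2000          # $, upper bound quoted for buying the server
colocation_per_year = 1000   # $/year, rough colocation charge


def total(rate_per_hour, years):
    """Cost of running one instance continuously for the given years."""
    return rate_per_hour * HOURS_PER_YEAR * years


plan_3y = total(savings_plan_rate, 3)
on_demand_3y = total(on_demand_rate, 3)
colocation_3y = server_price + 3 * colocation_per_year

print(f"savings plan, 3 years: ${plan_3y:,.0f}")
print(f"on demand, 3 years:    ${on_demand_3y:,.0f}")
print(f"buy and colocate:      ${colocation_3y:,.0f}")
print(f"savings plan per year: ${plan_3y / 3:,.0f}")
```

The 3-year plan comes to about $7,905, which is where "just under $8,000" comes from. The colocation total is only a rough comparison, since it carries no availability guarantee for the server itself.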

What AWS have done with their new pricing option is to strip away most of the value of the "flexibility" argument for AWS, because buying 3 years of virtual machine time up front is pretty inflexible. They do offer some flexibility, in that the prepaid time applies to any instance in a family, but for any customer with a fleet of systems, small or large, that is pretty much irrelevant.

Therefore the only rationales for using a service like AWS seem to me to be:

2019 October

2019 September

190908 Sun A big issue with NFS Ganesha and the Ubuntu LTS 16 and 18 kernels

I have been using NFS Ganesha for a while, and recently I discovered a significant issue with it and contemporary Ubuntu kernels (4.4.0 for Ubuntu 16 and 4.15.0 for Ubuntu 18): it somehow formats its responses to the Linux kernel NFSv4 client in a way that, even if correct, triggers a bug in the client, which then ignores some entries in the responses to the READDIR and READDIRPLUS operations.

This means for example that rm -rf DIR usually does not remove DIR and all the files under it, but gives an error claiming that the directory is not empty: rm first lists all the entries in the directory, and since some entries are missing from the listing, they are not deleted.
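
The failure mode can be reproduced locally with a small sketch. It is not the NFS client code: lossy_listdir is a hypothetical stand-in that hides some entries the way the buggy client does, and the removal handles only a flat directory, unlike the real rm -rf.

```python
import errno
import os
import tempfile


def lossy_listdir(path, drop_every=5):
    """Hypothetical stand-in for the buggy client: hide every Nth entry."""
    entries = sorted(os.listdir(path))
    return [e for i, e in enumerate(entries) if (i + 1) % drop_every != 0]


def remove_dir_like_rm(path, listdir=os.listdir):
    """Delete the files the listing reports, then the directory itself.

    Returns None on success, or the errno raised by rmdir on failure.
    """
    for name in listdir(path):
        os.unlink(os.path.join(path, name))
    try:
        os.rmdir(path)
    except OSError as exc:
        return exc.errno
    return None


def demo():
    base = tempfile.mkdtemp()
    target = os.path.join(base, "test")
    os.mkdir(target)
    for i in range(90):
        open(os.path.join(target, f"f{i:02d}"), "w").close()
    # First attempt uses the incomplete listing, so rmdir is refused.
    first = remove_dir_like_rm(target, listdir=lossy_listdir)
    left_behind = len(os.listdir(target))
    # Second attempt sees every remaining entry and succeeds.
    second = remove_dir_like_rm(target)
    os.rmdir(base)
    return first, left_behind, second


if __name__ == "__main__":
    first, left_behind, second = demo()
    print("directory not empty:", first in (errno.ENOTEMPTY, errno.EEXIST))
    print("entries left behind:", left_behind)
    print("second attempt succeeded:", second is None)
```

The entries that were never listed are never unlinked, so the final rmdir is refused, which matches the error that rm reports.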

This seems to be an issue with those Ubuntu kernels: it does not happen under Fedora 30, which uses a 5.2 series kernel, or under Ubuntu LTS 18 with a 5.0 kernel. It also does not happen with the nfs-kernel-server NFS implementation, which I think is not as flexible and maintainable as NFS Ganesha.

Note: the client side bug seems to have been fixed in 4.17, and it also depends on the value of rsize: the larger it is, the less often it happens. A particular example with a directory with 90 entries:

$ uname -a; for N in 1024 2048 4096 6144 8192 12288 16384 32768 65536; do echo -n "$N => "; sudo mount -t nfs -o rw,vers=4,proto=tcp,timeo=10,intr,rsize=$N azara:/scratch /mnt/tmp && ls /mnt/tmp/test | wc -l; sudo umount /mnt/tmp; done
Linux noether 4.15.0-60-generic #67-Ubuntu SMP Thu Aug 22 16:55:30 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
1024 => 90
2048 => 81
4096 => 86
6144 => 86
8192 => 88
12288 => 88
16384 => 90
32768 => 90
65536 => 90

190904 Wed NFS v3, v4 and types of identity mapping

In UNIX systems users have both a name and a number: the number is used for authorization, even though the name is used for authentication (several user accounts can share a user number but have different passwords). Each UNIX system instance has a user name space and a user number space that are incommensurable with those of other UNIX system instances.

Note: here user means either user or group, as they are handled similarly in UNIX systems; user number means uid and group number means gid.

This means that a user name or number can have completely different meanings on different UNIX instances, for example correspond to different people.
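
A toy illustration of the point, with two made-up account tables (the host names, user names and numbers are all hypothetical):

```python
# Hypothetical account tables for two independently administered hosts.
HOST_A = {"alice": 1000, "bob": 1001}
HOST_B = {"carol": 1000, "alice": 1002}


def name_for(table, number):
    """Return the user name that a host assigns to a user number, if any."""
    for name, num in table.items():
        if num == number:
            return name
    return None


def reinterpret_owner(number, source, destination):
    """Show who owns a file on the source host and who appears to own it
    when the same user number is read on the destination host."""
    return name_for(source, number), name_for(destination, number)


if __name__ == "__main__":
    print(reinterpret_owner(1000, HOST_A, HOST_B))
    print(HOST_A["alice"], HOST_B["alice"])
```

A file owned by number 1000 belongs to alice on the first host but appears to belong to carol on the second, while the name alice itself maps to two different numbers.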

This poses a problem both for transferring filetrees on media between UNIX system instances, and sharing them over a network: the only sensible action by default is to pretend that the foreign filetree is owned by a special user name and user number that is presumed meaningless.

However it is possible, at a site with a central system administration, to assign the same meanings to the same user names, or the same user numbers, or to different ones, by convention.

In that case NFS v3 and v4 support those conventions, though the way in which they are supported is somewhat peculiar and not entirely well documented. For NFS v3 it is quite simple:

For NFS v4 there are four different options:

In all these options commonly available implementations don't allow any specific mapping, for example from remote tagged username V@B to local tagged username U@A.

In practice the only useful options are to share across a site all usernames, tagged with the same domain, or to share across a site all usernumbers, with a central repository of account names or numbers; usually that makes it easy to have the same user names and user numbers everywhere.
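
The "same domain everywhere" convention can be sketched roughly as follows. This is only a simplified model, not how the nfsidmap library is implemented: the domain, the account table and the number used for the meaningless fallback user are all assumptions made for illustration.

```python
LOCAL_DOMAIN = "example.org"                 # hypothetical site-wide domain
LOCAL_USERS = {"alice": 1000, "bob": 1001}   # hypothetical account table
NOBODY = 65534  # assumption: number conventionally given to "nobody"


def map_tagged_name(tagged, domain=LOCAL_DOMAIN, users=LOCAL_USERS):
    """Map a tagged username like 'alice@example.org' to a local user number.

    Names tagged with a different domain, untagged names and unknown names
    all fall back to the meaningless user.
    """
    name, sep, tag = tagged.rpartition("@")
    if not sep or tag != domain:
        return NOBODY
    return users.get(name, NOBODY)


if __name__ == "__main__":
    for tagged in ("alice@example.org", "alice@other.example", "mallory@example.org"):
        print(tagged, "=>", map_tagged_name(tagged))
```

Only names tagged with the shared domain and present in the shared account repository get a meaningful number; there is no way in this model to say that V@B should become U@A, which is the limitation described above.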

Note: in order to support use of both NFS v3 and v4 it might seem more important to share the same meaning for user numbers, but if Kerberos is used (as it should be) to authenticate NFS users, names should also be the same, since Kerberos uses only names. Many NFS v4 Kerberos subsystems allow arbitrary mappings between Kerberos user names and UNIX user names, but that should ideally be avoided.

190901 Sun NFS user/group name or id mapping options

The Linux NFS kernel client uses as the default mapping method the nfsidmap executable, the NFS kernel server uses the rpc.idmapd daemon, and both use the nfsidmap library; the NFS Ganesha server uses that library directly.

There are some interesting non-obvious details about the nfsidmap executable and library:

The NFS server Ganesha has some relevant but somewhat underdocumented settings:

2019 August

190827 Tue A weird change in the semantics of Linux linking

In Linux the specifications of link(2) and rename(2) turn out to be different from the traditional UNIX ones in that they fail not only if the destination is in a different filetree, but also if it is in the same filetree but under a different mountpoint.

This causes trouble with software like GNOME: to move a file to the trash the relevant GNOME module only checks whether the file and the user's default trash folder are in the same filesystem, and the move fails if they are, but under different mount points.
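
A more defensive approach is to attempt the rename and treat the EXDEV error as a signal to copy and delete instead. The sketch below is my own illustration, not GNOME's code; the injectable rename argument and the refuse_like_other_mountpoint helper exist only so that the fallback can be exercised without setting up real mounts.

```python
import errno
import os
import shutil
import tempfile


def move_file(src, dst, rename=os.rename):
    """Move a file, falling back to copy and delete if rename reports EXDEV."""
    try:
        rename(src, dst)
        return "renamed"
    except OSError as exc:
        if exc.errno != errno.EXDEV:
            raise
    # rename(2) refused: different filetree, or same filetree under
    # a different mount point. Copy the data and remove the original.
    shutil.copy2(src, dst)
    os.unlink(src)
    return "copied"


def refuse_like_other_mountpoint(src, dst):
    """Hypothetical test double: behave like rename(2) across mount points."""
    raise OSError(errno.EXDEV, os.strerror(errno.EXDEV))


def demo():
    with tempfile.TemporaryDirectory() as base:
        src = os.path.join(base, "file")
        dst = os.path.join(base, "trash")
        with open(src, "w") as handle:
            handle.write("data")
        how = move_file(src, dst, rename=refuse_like_other_mountpoint)
        return how, os.path.exists(src), os.path.exists(dst)


if __name__ == "__main__":
    print(demo())
```

Python's own shutil.move takes a similar approach, trying a rename first and copying when that fails. The cost of the fallback is that the move is no longer atomic and the data is copied even though it sits in the same filetree.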

Note: as the link above shows this change dates back at least to 2005; not only did I miss it, so did the GNOME guys, for well over a decade.

The same filetree can appear under different mount points in Linux if it is mounted multiple times (which cannot happen in traditional UNIX) or it is --bind mounted (which is not available in traditional UNIX).

This limitation is regrettable because it is very convenient to use --bind mounts to decouple physical from logical filetrees.

In this specific case the naming convention was to have home directories under /home/ and a scratch area under /scratch/, but because of local storage limitations both were actually in the same filetree, as /data/local/home/ and /data/local/scratch/, each mounted with --bind at its conventional location.
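
Whether a filetree appears under several mount points can be checked from /proc/self/mountinfo, where the third field is the device number, the fourth the root of the mount within the filetree, and the fifth the mount point. The sketch below groups mounts by device number; the sample text is hypothetical (device numbers, filesystem type and the assumption that the filetree is mounted at /data are all made up to resemble the layout described above), and grouping by device number is only a heuristic.

```python
from collections import defaultdict

# Hypothetical mountinfo contents resembling the layout described above.
SAMPLE_MOUNTINFO = """\
25 1 8:1 / / rw,relatime shared:1 - ext4 /dev/sda1 rw
40 25 8:17 / /data rw,relatime shared:5 - ext4 /dev/sdb1 rw
41 25 8:17 /local/home /home rw,relatime shared:5 - ext4 /dev/sdb1 rw
42 25 8:17 /local/scratch /scratch rw,relatime shared:5 - ext4 /dev/sdb1 rw
"""


def multiply_mounted(mountinfo_text):
    """Return {device: [(mount_point, root), ...]} for devices that appear
    under more than one mount point."""
    by_device = defaultdict(list)
    for line in mountinfo_text.splitlines():
        fields = line.split()
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        device, root, mount_point = fields[2], fields[3], fields[4]
        by_device[device].append((mount_point, root))
    return {dev: mounts for dev, mounts in by_device.items() if len(mounts) > 1}


if __name__ == "__main__":
    for device, mounts in multiply_mounted(SAMPLE_MOUNTINFO).items():
        for mount_point, root in mounts:
            print(device, mount_point, "<-", root)
```

On a live system the same function can be fed the contents of /proc/self/mountinfo. Any pair of paths that falls under two different entries for the same device is a pair for which link(2) and rename(2) will fail, even though the data is in the same filetree.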

2019 July

2019 June