Computing notes 2018 part one

This document contains only my personal opinions and calls of judgement, and where any comment is made as to the quality of anybody's work, the comment is an opinion, in my judgement.

[file this blog page at: digg del.icio.us Technorati]

2018 June

180609 Sat: A small example of the UNIX way and other ways

The UNIX way of thinking has been largely lost in cacophony of misunderstandings and unreflecting shortcuts, but it can be still relevant when explained, and recently it occurred to me that part of it can be explained with a simple example, a command line to suspend (SIGSTP) the Firefox web browser (because it leaks memory and website scripts can loop) overnight, with some examples:

killall -STOP firefox

This seems convenient but it is the least UNIXy because:

It only stops some firefox processes because it matches process command lines with its string in some particular way.
That cannot be fixed because the command in monolithic.
It is a special purpose duplicate of kill.

kill -STOP $(pidof firefox)

This is better because it does not use a special case of kill but composes it with something that produces a list of process ids from a filter string.

pidof firefox | xargs kill -STOP

This is the better UNIXy solution: instead of using the shell as a programming language passing arguments as strings, it is uses it properly as a builder of process pipelines, where processing is organized but not not done by the shell. In this case the list of process ids is not expanded in line, which can incur in various issues, but is piped into xargs that is designed to handle safely precisely that case.

It is possible to argue that pidof should be replaced by a more elementary pipeline like:

ps ... | sed 's/^ \+//; s/ \+/\t/g' \
| egrep ':..  [^     ]*firefox' | cut -f 1

But I like better something like pidof as it is a simple single purpose command. However as the pipeline above suggests, pidof as it is is rather not-UNIXy because it embeds a version of ps that is rather limited in functionality, when it should instead be a filter for ps output.

But recent versions of ps already do some filtering, so pidof firefox should be replaced by ps -o pid= -C firefox (or some variant). Of course a ps that contains its own filtering logic is not-UNIXy itself, but rather less than pidof itself, so as a compromise the final version is:

ps -o pid= -C firefox | xargs kill -STOP

180602 Sat: Economies of scale are economies of inflexibility

Discussing the topic of economies of scale in computing infrastructure I was arguing that beyond a fairly low level of scale they become quite small, as the high prices of computing infrastructure-as-a-service from the big providers demonstrate.

The question is why that is so, and the answer is important in general: economies of scale as such as a rule don't exist, scale by itself does not bring any greater efficiency. What brings higher efficiency is specialization, that is economies of inflexibility.

Consider making car body panels or door frames: you can make them manually or by using a huge press (videos: 1, 2) that flattens a steel sheet in the desired shape in one go. This enables production at scale. It can be argued that this is also enabled by production at scale, because the cost of the gigantic press is only justified for long production runs. But this is begging the question: which is why would a high scale of production justify the cost of the press, when it would not if the press had the same efficiency as manually hammering a sheet of metal into shape. Having a huge press instead of many workers, or several small presses, would not justify the cost of a big press.

But a large press has better efficiency, not because of the larger scale of production, but because it is highly specialized: given a mould for a car door it can only flatten steel ingots into that car door frame, unlike a worker or a smaller press, and it can take weeks to make a news mould and fit the press with it. In other words it is not the scale of the production, or even the scale of the press in itself, that delivers the benefit, but the high specialization of the press that allows construction of a very efficient machine.

Note: a huge press is more efficient than a smaller one because only a huge pressure on a large surface can flatten an ingot into a car door frame in one go.

But high specialization usually results in high inflexiblity, thus a large production run enables ther cost of vcery efficient but inflexible machine, where many smaller runs don't make that investment worthwhile, even if the more expensive specialized system would be more efficient in technical terms.

Note: that it is specialization that results in greater efficiency, rather than scale itself, was recognized by Adam Smith who called it division of labour in his famous example of a pin factory. But assigning the various steps of manufacturing a pin to specialized worker is inflexible, and thus justified only by relatively large production runs of identical pins, and once every specialized worker is fully busy, there are no more economies of specialization to achieve, doubling the number of workers just doubles the output.

The same applies even more strongly to computing infrastructure, and its opposite too :-):

Computing infrastructure is already extremely specialized to begin with, because of the existence of a few hardware (AMD 64 bit/Intel 32 bit platform) and software (MS-Windows, Linux/Android, Mac OSX) platforms and related components, which get manufactured at stupendous scale enabling extreme specialization of their manufacturing processes, from which all customers benefit pretty much equally (the Shenzen price applies to any medium-large customer).
The extreme uniformity of a few platforms not only enables high production specialization, it also prevents high specialization of use, and thus economies of scale in their use, because the resulting systems are all pretty much the same: the computers (both hw and sw) in a data center with 1,000 systems are going to be pretty much the same as those in one with 10,000 systems or 100,000 systems, because the components they are built from are being manufactured in production runs of millions a year.
Some economies of specialization are going to be reaped by building slightly different computers from the same hw and sw components everybody else uses, but then they are not going to be huge. Some big internet companies have produced customized systems out of standard mass-market parts, but this has reduced their flexibility considerably, and probably generated fairly small savings.
In the special case of cloud computing for IaaS specialization of the systems is particularly hard to achieve as they must be as generic as possible to be compatible with any workload, so economies of scale don't apply very much to the systems. There are claims that economies of specialization are possible in the data center building itself, but beyond a fairly small scale electricity supply, cooling systems, and their maintenance cannot be specialized further.

2018 March

180331 Sat: With UTF 'grep -i' is very expensive

This has been noted elsewhere: when LC_CTYPE includes a UTF character set grep -i is very expensive as per these cases:

$ cat $FILES | time sh -c 'LC_CTYPE=en_GB.UTF-8 grep -i aturana'
Command exited with non-zero status 1
14.94user 0.03system 0:15.00elapsed 99%CPU (0avgtext+0avgdata 2492maxresident)k
0inputs+0outputs (0major+210minor)pagefaults 0swaps
$ cat $FILES | time sh -c 'LC_CTYPE=C.UTF-8 grep -i aturana'
Command exited with non-zero status 1
14.96user 0.15system 0:15.12elapsed 99%CPU (0avgtext+0avgdata 2292maxresident)k
416inputs+0outputs (10major+205minor)pagefaults 0swaps

$ cat $FILES | time sh -c 'LC_CTYPE=en_GB grep -i aturana'
Command exited with non-zero status 1
0.63user 0.05system 0:00.70elapsed 98%CPU (0avgtext+0avgdata 2288maxresident)k
0inputs+0outputs (0major+200minor)pagefaults 0swaps
$ cat $FILES | time sh -c 'LC_CTYPE=C grep -i aturana'
Command exited with non-zero status 1
0.61user 0.04system 0:00.66elapsed 98%CPU (0avgtext+0avgdata 1912maxresident)k
0inputs+0outputs (0major+181minor)pagefaults 0swaps

$ cat $FILES | time sh -c 'tr A-Z a-z | grep aturana'
Command exited with non-zero status 1
0.33user 0.37system 0:00.38elapsed 184%CPU (0avgtext+0avgdata 2424maxresident)k
0inputs+0outputs (0major+280minor)pagefaults 0swaps

The time does not depend on the language locale but on the character code. What is manifestly happening is that with UTF grep is lowercasing with a conversion table that includes all upper case characters in the enormous UTF code space, and that is quite expensive indeed, something a bit unexpected like the previous case where using the en_GB collating sequence resulted in non-traditional sorting order.

Given that I very rarely care about lowercasing other than as the traditional tr A-Z a-z equivalent I have decided for now to set LC_CTYPE to C, as well as LC_COLLATE and LC_NUMERIC.

180330 Fri: Chromium adds extended attributes to downloaded files

While duplicating somew directories with rsync -i -axAXHO (where the AX options request the duplication of attributes and extended attributes) to a filetree of a type that does not support extended attributes I was surprised to see that some files had extended attributes. It turns out that:

If you download and save a file with Chromium (even in incognito mode), it saves potentially sensitive metadata in a way that's completely unknown to almost all users, even highly technical ones:
user.xdg.referrer.url: ...
user.xdg.origin.url: ...

Which is entirely undocumented, and may even be useful. I also saw that the Baloo indexer for KDE may add the user.baloo.rating extended attribute, but that is at least documented.

180313 Tue: "The true cost of hosting in the cloud"

Entertaining article about cloud and colocation hosting costs which is briefly summarized as:

So how much does hosting cost these days? To answer that apparently trivial question, Dyachuk presented a detailed analysis made from a spreadsheet that compares the costs of "colocation" (running your own hardware in somebody else's data center) versus those of hosting in the cloud. For the latter, Dyachuk chose Amazon Web Services (AWS) as a standard, reminding the audience that "63% of Kubernetes deployments actually run off AWS". Dyachuk focused only on the cloud and colocation services, discarding the option of building your own data center as too complex and expensive. The question is whether it still makes sense to operate your own servers when, as Dyachuk explained, "CPU and memory have become a utility", a transition that Kubernetes is also helping push forward.

Just the premise that building your own datacenter is to say the least contentious: for a small set of a few dozen servers it is usually very easy to find space for 1-2 racks, and for larger datacenters there are plenty of places on industrial estates that are suitable. The costs can be quite low, because contrary to myth the big hosting companies don't have significant economies of scale or scope. But the mention of Kubernetes is a powerful hint: the piece promotes Kubernetes, and in effect does so by assuming workloads for which Kubernetes works well. Indeed the argument applies to workloads that work well with Kubernetes, which is not necessarily many:

Kubernetes helps with security upgrades, in that it provides a self-healing mechanism to automatically re-provision failed services or rotate nodes when rebooting. This changes hardware designs; instead of building custom, application-specific machines, system administrators now deploy large, general-purpose servers that use virtualization technologies to host arbitrary applications in high-density clusters.

That applies only in the simple case where systems are essentially stateless and interchangeable, which is the simple case of most content delivery networks. But in that case using a large server and partitioning it into many smaller servers each with essentially the same system and application seems to me rather futile, because Kubernetes is an effect a system to manage a number of instance of the same live CD type of system.

When presenting his calculations, Dyachuk explained that "pricing is complicated" and, indeed, his spreadsheet includes hundreds of parameters. However, after reviewing his numbers, I can say that the list is impressively exhaustive, covering server memory, disk, and bandwidth, but also backups, storage, staffing, and networking infrastructure.

Precise pricing is complicated, but the problem is much simplified by two realizations: usually a few aspects are going to dominate the pricing, and anyhow because of the great and different and changing anisotropy of many workloads and infrastructures precise pricing of an infrastructure is futile, and round up by a factor of 2 is goping to be an important design technique, to a point.

For servers, he picked a Supermicro chassis with 224 cores and 512GB of memory from the first result of a Google search. Once amortized over an aggressive three-year rotation plan, the $25,000 machine ends up costing about $8,300 yearly. To compare with Amazon, he picked the m4.10xlarge instance as a commonly used standard, which currently offers 40 cores, 160GB of RAM, and 4Gbps of dedicated storage bandwidth. At the time he did his estimates, the going rate for such a server was $2 per hour or $17,000 per year.

That three year assumed lifetime is absiurdly short for many organizations: they will amortize the systems over 3 years if tax laws allows it because it saves tax (if the organization is profitable), but they will keep it around for usually much longer. The cloud provider will do the same, but still charge $2 per hour as long as they can.

So, at first, the physical server looks like a much better deal: half the price and close to quadruple the capacity. But, of course, we also need to factor in networking, power usage, space rental, and staff costs.

And that is the big deal: for some business models the cost of electricity (both to power and cool) is going to be bigger than the price of the system. But at the same time within limits the incremental cost will be much lower than the average cost, and in some others much bigger, bigger costs can be highly quantized (for example incremental costs can be very low while an existing datacenter is being filled up and very high if a new datacenter has to be built). The advantage or disadvantage of cloud pricing is that incremental and average cost are always the same.

Which brings us to the sensitive question of staff costs; Dyachuk described those as "substantial". These costs are for the system and network administrators who are needed to buy, order, test, configure, and deploy everything. Evaluating those costs is subjective: for example, salaries will vary between different countries. He fixed the person yearly salary costs at $250,000 (counting overhead and an actual $150,000 salary) and accounted for three people on staff.

That $250,000 per person labour cost seems very high to me, and might apply only to San Francisco based staff even in the USA.

Dyachuk also observed that staff costs are the majority of the expenses in a colocation environment: "hardware is cheaper, but requires a lot more people". In the cloud, it's the opposite: most of the costs consist of computation, storage, and bandwidth.

The system and network administrators are needed whether using a cloud system or a colocated or in-house system, because the cloud vendors do system administration of the virtualization platform, which is invisible to users, not of the virtualized systems.

The difference is between spinning up or installing systems, so the additional labour costs of a colocated or in-house system are almost entirely in procurement and physical installation and maintenance of the network and systems infrastructure, and they are not large.

The software defined configuration of a set of virtual networks and systems is going to be much the same whether they are virtual or physical, lots and lots of program code.

That of course applies even more so to the system and network administration of a large and complex cloud platform, and the related costs are going to be part of the price quoted to customers, where for colocated or in-house systems they need not exist, unless the organization decide to create their own large and complex private cloud infrastructure, which is a self inflicted issue.

Staff also introduce a human factor of instability in the equation: in a small team, there can be a lot of variability in ability levels. This means there is more uncertainty in colocation cost estimates.

Only for very small infrastructures.

Once the extra costs described are factored in, colocation still would appear to be the cheaper option. But that doesn't take into account the question of capacity: a key feature of cloud providers is that they pool together large clusters of machines, which allow individual tenants to scale up their services quickly in response to demand spikes.

Here there are several cases: predictable and gentle long term changes of demand don't require the on-the-spot flexibility of large pools of cloud resources, they can be handled with trivial planning of hardware resources.

What cloud resources can handle within limits is unpredictable changes in resource usage, or predictable but quite large (but not large compared to the cloud ecosystem) ones. Unfortunately in the real world those are relatively rare cases except for small amounts of demand.

Note: variability of capacity and demand happens all the time in the case of shared infrastructures of all types, whether is it is the electricity grid or transport systems. For example if usage is predictably very variable for everybody in specific periods, for example supplying shops for the end-of-year shopping season, there is no way around it other than building massive overcapacity to handle peak load, or accept congestion. For electricity grids there is a big difference between sources of firm capacity providers and dispatchable generation to balance variable non-dispatchable generation, such as wind farms or to handle variable demand.

Self-hosted servers need extra capacity to cover for future demand. That means paying for hardware that stays idle waiting for usage spikes, while cloud providers are free to re-provision those resources elsewhere.

Only if aggregate demand is much larger than individual demand, an individual demands are uncorrelated. In practice virtual infrastructure providers need to have significant spare dispatchable capacity too.

Satisfying demand in the cloud is easy: allocate new instances automatically and pay the bill at the end of the month.

Easy does not mean cheap, and if the cloud provider needs to have significant dispatchable capacity it will not be cheap.

So the "spike to average" ratio is an important metric to evaluate when making the decision between the cloud and colocation.

It is not just the "spike to average" ratio it is also whether that ratio across could users are correlated or uncorrelated. For example the "spike to average" ratio of my house heating demand is huge between winter and summer for some reason, but I suspect that for the same reason most people withing several hundred kilometers of me will have the same "spike to average" ratio at the same time.

Note: electricity grids could be built that bridge regions on opposite longitudes to balance day and night demand or on opposite latitudes to balance winter and summer demand, but that might be really expensive. A worldwide computer grid can and has been built, at a cost, but latency issues prevent full capacity balancing.

Colocation prices finally overshoot cloud prices after adding extra capacity and staff costs. In closing, Dyachuk identified the crossover point where colocation becomes cheaper at around $100,000 per month, or 150 Amazon m4.2xlarge instances, ...

That is a pretty low point but the comparison with m4.2xlarge instances is ridiculously unfair to the cloud case: a big reason to create a virtual cloud infrastructure is that medium-size servers somewhat more cost efficient than small servers, so buying a few large physical servers to run many smaller virtual servers may be cost effective, if many small servers are what the application workload runs best on. But a cloud infrastructure is already virtualized, so it makes sense to rent small instances sizes, not the same size of the physical servers one would otherwise buy.

He also added that the figure will change based on the workload; Amazon is more attractive with more CPU and less I/O. Inversely, I/O-heavy deployments can be a problem on Amazon; disk and network bandwidth are much more expensive in the cloud. For example, bandwidth can sometimes be more than triple what you can easily find in a data center.

Storage on the cloud is usually very expensive indeed, and while networking can be expensive it is usually expensive only outside the cloud infrastructure, for example having web frontends cloud hosted and storage backends elsewhere. This is influenced by the cost of virtualization, and the anisotropy of storage capacity.

So in the end a cloud rented systems can be cost effective when these conditions apply:

The workload has wide variability and that is both small compared to the cloud infrastructure and uncorrelated with the variability of other workloads on the same cloud.
The workload is fairly isotropic so it can be easily mapped onto generic virtual resources.
Engineer labor costs are very high at Silicon Valley levels.
It is important to minimize the difference between incremental and average computing costs at all times.
The workload is mostly CPU and network bound, and it can be entirely hosted by the same cloud provider, or traffic with parts outside it are small.
It cannot possibly matter that a third-party virtual infrastructure is potentially a giant undetectable rootkit.
Quick setup times at low levels of scale matter very much.
Turning a capital expense into operating expenses is advantageous for some tax or financial reason.

As a rule I think that all those conditions are rarely met for any significant (100 and over) sets of systems, unless quick setup or turning capital into operating expenditure really matter. The major case in which the technical conditions are met is for a public-content distribution network, but then large ones use as a rule not third party virtualized infrastructures, but lots of physical hardware colocated around the world.

He also emphasized that the graph stops at 500 instances; beyond that lies another "wall" of investment due to networking constraints. At around the equivalent of 2000-3000 Amazon instances, networking becomes a significant bottleneck and demands larger investments in networking equipment to upgrade internal bandwidth, which may make Amazon affordable again.

High bandwidth networking is falling in price and large cloud providers don't have significant economies with respect the midsize or large organizations.

But more importantly a scale of the equivalent of 2000-3000 Amazon instances the infrastructure is so huge that having an in-house data center is pretty much always the most cost-effective solution, because of flexbility and being well beyond the scale at which make-versus-buy resolves favourably for buying. Even if the costs and complexities of maintaining a vast software defined infrastructure are likely to be bigger than that difference.

The only reason why an organization was reported as having:

2,000 servers from Amazon Web Services (AWS), Amazon.com’s cloud arm, and paid more than $10 million in 2016, double its 2015 outlay.

That $10 million per year can pay for a very nice set of data centres, quite a lot of hardware, quite a bit of colocated servers too, and use of a third-party public-content distribution network, and getting to 2,000 servers in the cloud instead could only be due to the short-term ease for each project in the organization to just add an extra line item to their operation expense budget that for each project was not particularly noticeable but across all project built up to a big number, and indeed the same source said that it was:

the largest ungoverned item on the company’s budget ... meaning no one had to approve the cloud expenses.

180312 Mon: Another view of overloading and infix overloading in Rust

So yesterday I summarized with some examples and analysis the weird way in which the Rust programming language allows function name overloading, and added some notes on infix notation.

I have in the meantime found that almost at the same time another blogger has written a post about overloading in Rust. The post is somewhat interesting even if the terminology and some of the concepts are quite confused, starting with that between functions, their names, and methods

However it is quite interesting in abusing the trait system: it is possible to create a dummmy type like struct O; (with no fields), and then write this (much simplified from the original blogger):

use std::fmt::Debug;

struct O;

trait FOverloading<T,U>
{
  fn f(t: T,u: U);
}

impl FOverloading<i32,f32> for O
{
  fn f(t: i32,u: f32)	{ println!("t {:?} u {:?}",t,u); }
}

impl FOverloading<bool,char> for O
{
  fn f(t: bool,u: char)	{ println!("t {:?} u {:?}",t,u); }
}

fn main()
{
  O::f(1i32,2.0f32);
  O::f(false,'c');
}

Update 180313: To be more explicit, the following is not accepted, the introduction of two traits or a pointlessly generic trait is essential, even if the two definition of f are identical to those above which are accepted. Another way to say this is that impl of functions is not allowed, but indirectly via a struct or a trait, and multiple impl of a function with the same name can happen only via impl of a trait.

impl O
{
  fn f(t: i32,u: f32)	{ println!("t {:?} u {:?}",t,u); }
  fn f(t: bool,u: char) { println!("t {:?} u {:?}",t,u); }
}

The obvious point made in my preceding blog is that if f can be overloaded using two templated traits for O, then O and the trait FOverloading are redundant palaver.

To explore infix function call notation, whatis called method invocation in Rust terminology, I have written some simples examples that are also fairly weird:

use std::ops::Add;

struct S1 { i: i32, }
struct S2 { f: f32, }

fn addto_a(v: &mut i32,n: i32) -> ()	{ *v = (*v)+n; }	
/* Fails because non-trait function names cannot be overloaded */
// fn addto_a(v: &mut f32,n: f32) -> ()	{ *v = (*v)+n; }	

/* Fails because 'self' can only be used in 'trait' functions */
// fn addto_b(self: &mut i32,n: i32) -> () { *self = (*self)+n; }
// fn addto_c(n: i32,self: &mut i32) -> () { *self = (*self)+n; }

fn addto_g<T>(v: &mut T,n: T) -> ()
  where T: Add<Output=T> + Copy		{ *v = (*v)+n; () }

trait Incr<T>
  where T: Add
{
  fn incrby_a(v: &mut Self,n: T) -> ();
  fn incrby_b(self: &mut Self,n: T) -> ();
  /* Fails because only the first argument *can* be named 'self' */
  // fn incrby_c(n: T,self: &mut Self) -> ();
}

impl Incr<i32> for S1
{
  fn incrby_a(v: &mut Self,n: i32)    { v.i += n; }
  fn incrby_b(self: &mut Self,n: i32) { self.i += n; }
  // fn incrby_c(n: i32,self: &mut Self) { self.i += n; }
}
impl Incr<f32> for S2
{
  fn incrby_a(v: &mut Self,n: f32)    { v.f += n; }
  fn incrby_b(self: &mut Self,n: f32) { self.f += n; }
  // fn incrby_c(n: f32,self: &mut Self) { self.f += n; }
}

fn main()
{
  let mut s1 = S1 { i: 1i32 };
  let mut s2 = S2 { f: 1.0f32 };

  addto_a(&mut s1.i,1i32);
  // addto_a(&mut s2.f,1.0f32);

  addto_g(&mut s1.i,1i32);
  addto_g(&mut s2.f,1.0f32);

  Incr::incrby_a(&mut s1,1i32);
  Incr::incrby_a(&mut s2,1.0f32);

  /* Fails for infix the first argument *must* be named 'self' */
  // s1.incrby_a(1i32);
  // s2.incrby_a(1.0f32);

  Incr::incrby_b(&mut s1,1i32);
  Incr::incrby_b(&mut s2,1.0f32);
  s1.incrby_b(1i32);
  s2.incrby_b(1.0f32);

  // s1.incrby_c(1i32);
  // s2.incrby_c(1.0f32);
}

It looks to me that the Rust designers created some needless distinctions and restrictions:

There are two distinctions between self-standing functions like addto_a and and functions associated with types via traits like incrby_a and incrby_b):
- Non-template self-standing function names cannot be overloaded; the specific error message is note: `addto_a` must be defined only once in the value namespace of this module which suggests that somehow the Rust designers regard function names associated with different traits to be in different namespaces.
- Templated self-standing function names can be overloaded, but the overloading relies on traits, because implementations can only be specified generically with bodies using trait-specific functions, as in addto_g.
There is a limitation that the special argument name self can only be used in functions associated with traits, even if they can be used in prefix syntax just like self-standing functions, as in both Incr::incrby_b(&mut s1,1i32); and s1.incrby_b(1i32);.
For functions associated with traits, the limitation is that only the first parameter may be called self, and if it is not called self the limitation is that the function name cannot be used in infix notation.

These restrictions and limitations are pretty much pointless and confusing. If there is a deliberate intent it seems to be that the recommended style is:

Virtually all function names should be associated with traits. Free-standing functions are not really a thing in Rust.
Traits document which sets of function names (and signatures) are overloaded with respect to which types.

A commenter points out that requiring overloading and genericity in templates to be via traits constrains the programmer to think ahead about the relationships between function names and types:

There is at least one downside: it cannot be an afterthought.

In C++, ad-hoc overloading allows you to overload a function over which you have no control (think 3rd party), whereas in Rust this is not actually doable.

That being said, ad-hoc overloading is mostly useful in C++ because of ad-hoc templates, which is the only place where you cannot know in advance which function the call will ultimately resolve to. In Rust, since templates are bound by traits, the fact that overloading cannot be an afterthought is not an issue since only the traits functions can be called anyway.

Conceivably that could be an upside, but the designers of Rust don't seem to have made an explicit case for this restriction.

180311 Sun: Overloading in Rust is possible but weirdly complicated

Genericity, whether static (what is called templates by some) or dynamic (very incorrectlly called dynamic typing by many) arises from overloading, where the same function name (sometimes called appallingly the message) can resolve on context to many different function implementations (sometimes called appallingly the method).

After a brief online discussion about overloading in the Rust programming language I have prepared a small example of how overloading and genericity are handled in it, which is a bit weird:

struct S1 { i: i32, }
struct S2 { f: f32, }

fn incr_s1(v: &mut S1)	{ v.i += 1i32; }
fn incr_s2(v: &mut S2)	{ v.f += 1.0f32; }

trait Incr		{ fn incr(self: &mut Self) -> (); }

impl Incr for S1	{ fn incr(self: &mut Self) { self.i += 1i32; } }
impl Incr for S2	{ fn incr(self: &mut Self) { self.f += 1.0f32; } }

fn incr<T>(v: &mut T)
  where T: Incr
{ Incr::incr(v); }

fn main()
{
  let mut s1 = S1 { i: 1i32 };
  let mut s2 = S2 { f: 1.0f32 };

  /* Cannot overload 'incr' directly */
  incr_s1(&mut s1);
  incr_s2(&mut s2);

  /* But 'Incr::incr' can be overloaded */
  Incr::incr(&mut s1);
  Incr::incr(&mut s2);

  /* And so is consequently '.incr' */
  s1.incr();
  s2.incr();

  /* But can overload 'incr' via explicit genericity */
  incr(&mut s1);
  incr(&mut s2);
}

In the above functions incr_s1 and incr_s2 cannot have the same name, so direct overloading is not possible.

However by defining a suitable trait Incr with function signature incr in it the overloading becomes possible if the function name is qualified by the trait name as Incr::incr, fairly needlessly, unless the function name is used in an infix position as .incr in which case the trait name is not (always) necessary.

Note: in the case of the infix notation .incr that is possible only if the first parameter is explicitly called self; also only the first parameter can be called self. Which means that really the rule should be that infix call notation is equivalent to prefix call notation, regardless of the name of the first parameter.

But it gets better: by defining a template function that expands to calls of Incr::incr it is possible to overload the plain name incr itself.

This palaver is entirely unncessary in the general case. However there is a weak case for relating overloaded function names to the function bodies that implement them:

Giving a name to the overloading (the trait name) may help in clarifying what is being overloaded.
Listing in a trait a set of overloaded function names may help document that they are supposedly related.
In the case of Rust the trait relates function signatures as well as names to implementations, which may be a useful restriction.

The latter point about signatures perplexes me most: without thinking too hard about it, that seems quite pointless, because it can be undone as in the example above by creating suitable template functions, and it just leads to verbosity, which seems unnecessary.

But the general impression is that the designers of Rust just assumed that overloading is derived from genericity, when in the general case it is the viceversa.

180308 Thu: The disposable computing situation

It just occurred to me that there is a way to summarize many of my previous arguments about app stores and virtualization and containerization: disposable computing.

In effect the model adopted for consumer level computing by Apple, Google/Android, and MS-Windows is that computing products, whether hardware or software, are disposable "black boxes": no user serviceable parts inside, whether it is replaceable battery packs or updates to the Android system on a tablet, or abandonware applications under MS-Windows.

When the product becomes too unreliable or stops working, instead of fixing it, the user throws it away and buys a news one. That was the model pioneered by the computer games industry both for PC games and console games, and gamification is now prevalent: in effect when a consumer buys a cellphone or a tablet or even a PC they are buying a "use-and-throw-away" game, and iOS or Android or MS-Windows 10 and their core applications are currently games that consumers play.

Disposable products have conquered consumer computing, and have been taking over enterprise computing too: the appeal is based on doing away with maintenance costs, and the ease of disposing of old applications and systems with a click on the web page of a cloud provider account.

The difficulty is that for both consumers and enterprises the disposable computing model only really works for a while, because actually the requirements of both people and businesses are much longer than disposable product lifetimes:

If a tablet does not get software updates for a few years and some bits of it start failing it is usually feasible to buy a new one, but what about the data on it?
If an application has too many dependencies that make it hard to keep it updated and secure, it is often feasible to get a new similar one, but what about the migration project involving retraining its users and moving the data?

Sure the data can go to the cloud, or one can just move to a cloud application but that just shifts the long term maintenance burden, does not eliminate it, and there is considerable churn also among cloud storage and application. Also moving data and retraining from one cloud provider to another can be very, very hard and expensive, as most providers try to discourage that to achieve lock-in.

If it were easy to just throw away old messy stuff and buy new pristine stuff then major well managed IT organizations would not be buying very expensive Windows 2003 extended service packs still...

180304 Sun: Replacing a flash SSD because it has become old

Six years ago I bought a 256GB Crucial M4 flash SSD for my desktop that has ended up in my laptop as my main storage unit for that period. Six years is a long time, and just-in-case I replaced it with a new Toshiba Q300 pro of the same capacity. It has pretty much the same speed (limited by the SATA2 interface) and characteristics. I put the M4 in my older laptop that I keep as a spare, and it is still working perfectly, and has had no problems, no bad sectors, and has only used up 6% of its endurance:

leaf# smartctl -A /dev/sda
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-4.4.0-116-generic] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   100   100   050    Pre-fail  Always       -       0
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   100   100   001    Old_age   Always       -       48774
 12 Power_Cycle_Count       0x0032   100   100   001    Old_age   Always       -       1212
170 Grown_Failing_Block_Ct  0x0033   100   100   010    Pre-fail  Always       -       0
171 Program_Fail_Count      0x0032   100   100   001    Old_age   Always       -       0
172 Erase_Fail_Count        0x0032   100   100   001    Old_age   Always       -       130
173 Wear_Leveling_Count     0x0033   094   094   010    Pre-fail  Always       -       180
174 Unexpect_Power_Loss_Ct  0x0032   100   100   001    Old_age   Always       -       167
181 Non4k_Aligned_Access    0x0022   100   100   001    Old_age   Always       -       109 26 82
183 SATA_Iface_Downshift    0x0032   100   100   001    Old_age   Always       -       0
184 End-to-End_Error        0x0033   100   100   050    Pre-fail  Always       -       0
187 Reported_Uncorrect      0x0032   100   100   001    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   001    Old_age   Always       -       0
189 Factory_Bad_Block_Ct    0x000e   100   100   001    Old_age   Always       -       81
194 Temperature_Celsius     0x0022   100   100   000    Old_age   Always       -       0
195 Hardware_ECC_Recovered  0x003a   100   100   001    Old_age   Always       -       0
196 Reallocated_Event_Count 0x0032   100   100   001    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   100   100   001    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   100   001    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   100   100   001    Old_age   Always       -       0
202 Perc_Rated_Life_Used    0x0018   094   094   001    Old_age   Offline      -       6
206 Write_Error_Rate        0x000e   100   100   001    Old_age   Always       -       0

I find it quite impressive still. Actually the Q300 Pro has a couple of small advantages: the FSTRIM operation seems to have much lower latency, and it has a temperature sensor. It also costs 1/4 of what the M4 cost 6 years ago, but even for flash SSDs the cost decrease curve has flattened in recent years.

Part of the reason I switched is not only precaution about a very critical (even if I do a lot of backups) component getting old, but because my spare old laptop still had a (pretty good) rotating disk, and doing package upgrades on that it very slow, as they require (because of not very good reasons)) a lot of random small IOPS. With a flash SSD my occasional upgrades of its root filetree are way faster. There is also the great advantage that flash SSDs are very resistant to mechanical issues like falling inside a dropped lptaop, so I really want to use disk drives only for backup and inside deskptops for mostly-cold data like my media and software collections.

2018 February

180228 Wed: Optane enterprise SSD from intel

After releasing an 32GB Optane cache module Intel have finally released an Optane SSD line as the NVMe card and 2.5in drive model DC P4800X in capacities of 375GB, 750GB, 1500GB, with prices around £3 per GB (VAT included) compared to around £0.5 per GB for similar NVMe flash SSDs.

Intel have published some tests as a Ceph storage unit but they are a bit opaque as the Optane devices are only used as DB/WAL caches/buffers together with 4TB flash SSDs.

On pages 11 and 12 however it is interesting to see that faster CPU chips deliver significantly better IO rates, which means that these solid state devices are in part limited by memory-bus capacity.

On pages 20 and 21 also interesting to see that Intel are preparing to deliver Optane RAM modules, as long anticipated, and confirmation that latencies for Optane are very much better than those for flash SSDs, at around 10µs versus 90µs for a 4KiB read.

There are more details in some third party tests (1, 2, 3) and the (somewhat inconsistent) results are:

For 128KiB sequential IO roughly 2-2.4GB/s at a queue depth of 4, versus 1.2-1.7GB/s for flash SSDs of equivalent capacity.
For 4KiB random IO, whether read or write or mixed, results (at a queue depth of 4) in rates of 300k IOPS for writes and 370k IOPS for reads for the P4800X and 140k IOPS for writes and 40k IOPS for reads for flash SSDs, thanks the much lower access times.

Note: some results indicate writes are rather slower than reads, which is what I would expect given the profile of XPoint memory, so perhaps the tests that give higher write speeds than expected are for uncommitted writes, or else the P4800X has some really clever or huge write buffering.

The numbers are for devices of equivalent capacity, but given the much lower cost of flash SSDs larger and faster SSDs can be used which might have significantly higher sequential rates.

Overall my impression is that very approximately speaking on troughput Optane gives a 2-3 times improvement over flash SSDs for 6 times higher cost, and the improvement in latency is higher than the increase in cost.

180224 Sat: The "whatever" approach to software configuration management

There has been some recent debate (1, 2) how distributions should package unmaintainable applications:

It's now typical for hundreds of small libraries to be used by any application, often pegged to specific versions. Language-specific tools manage all the resulting complexity automatically, but distributions can't muster the manpower to package a fraction of this stuff.

The second sentence is quite wrong: those Language-specific tools don't manage at all that complexity, they simply enable it. All those APIs and dependencies are not managed in any way, an application that uses all those hundreds of randomly installed compents work right just by coincidence, as far as the language-specific tools go. Those language-specific tools are not much better than git clone or the old make install as to managing complexity.

Also no developer is forced to write applications that have hundreds of dependencies, nor vast collections of tiny libraries: applications and libraries can be deliberately designed for cohesiveness and maintainability. Choosing the granularity of a software unit is an important design choice, and developers are free to choose a larger, more convenient granularity over a smaller one.

I can only speak about the Haskell community. While there are some exceptions, it generally is not interested in Debian containing Haskell packages, and indeed system-wide installations of Haskell packages can be an active problem for development. This is despite Debian having done a much better job at packaging a lot of Haskell libraries than it has at say, npm libraries. Debian still only packages one version of anything, and there is lag and complex process involved, and so friction with the Haskell community.

On the other hand, there is a distribution that the Haskell community broadly does like, and that's Nix. A subset of the Haskell community uses Nix to manage and deploy Haskell software, and there's generally a good impression of it. Nix seems to be doing something right, that Debian is not doing.

This is far from surprising: the packaging model of Nix is the old depot model, which is basically the same used by most language-specific tools, and which crucially enables having a whatever approach to software configuration management: having many different siloses of different collections of randomly assembled components that for the time being seem to work together.

things like Docker, that also radically rethink the distribution model. We could easily end up with some nightmare of lithification, as described by Robert "r0ml" Lefkowitz in his Linux.conf.au talk. Endlessly copied and compacted layers of code, contained or in the cloud. Programmer-archeologists right out of a Vinge SF novel.

Well, yes, but the entire business case of Docker is in practice lithification (1, 2) to make application environments into black boxes that are maintained by the developers of the application, that is in practice soon by nobody.

Note: lithification is often achieved by system emulation/virtualization, and D Knuth in Art of Computer Programming wrote that entirely too much programmers' time has been spent in writing such simulators and entirely too much computer time has been wasted in using them, already decades ago.

In ancient times lithification had a jargon name: the dusty deck, that is an application written as a pack of punched cards, consolidating lots of modules in one pack, that usually gathered dust and would be occasionally re-run when needed.

A current trend in software development is to use programming languages, often interpreted high level languages, combined with heavy use of third-party libraries, and a language-specific package manager for installing libraries for the developer to use, and sometimes also for the sysadmin installing the software for production to use. This bypasses the Linux distributions entirely. The benefit is that it has allowed ecosystems for specific programming languages where there is very little friction for using libraries written in that language to be used by developers, speeding up development cycles a lot.

That seems to me a complete misdescription, because by itself doing language specific package management is no different from doing distribution specific package management, the rationale for effective package management is always the same.

My understanding is that things now happen more like this: I'm developing an application. I realise I could use a library. I run the language-specific package manager (pip, cpan, gem, npm, cargo, etc), it downloads the library, installs it in my home directory or my application source tree, and in less than the time it takes to have sip of tea, I can get back to developing my application.

This has a lot less friction than the Debian route. The attraction to application programmers is clear. For library authors, the process is also much streamlined. Writing a library, especially in a high-level language, is fairly easy, and publishing it for others to use is quick and simple.

But the very same advantages pertained to MS-Visual Basic programmers under MS-Window 95 over 20 years ago, and earlier, because they are not the advantages of using a language-specific package manager that downloads the library, installs it in my home directory, but of some important simplifications:

Using a single-language environment.
Using languages that are usually interpreted within an abstract virtual machine.
Creating applications that work only on the system on which they are developed or an exact clone.

Simply avoiding all build, portability, maintainability issues does result in a lot less development friction (1, 2) and the famous book The mythical man-month several decades ago already reported that simply developing an application in ideal circumstances is easy, but:

Finally, promotion of a program to a programming product requires its thorough documentation, so that anyone may use it, fix it, and extend it. As a rule of thumb, I estimate that a programming product costs at least three times as much as a debugged program with the same function.

Moving across the vertical boundary, a program becomes a component in a programming system. This is a collection of interacting programs, coordinated in function and disciplined in format, so that the assemblage constitutes an entire facility for large tasks. To become a programming system component, a program must be written so that every input and output conforms in syntax and semantics with precisely defined interfaces. The program must also be designed so that it uses only a prescribed budget of resources — memory space, input-output devices, computer time.

Finally, the program must be tested with other system components, in all expected combinations. This testing must be extensive, for the number of cases grows combinatorially. It is time-consuming, for subtle bugs arise from unexpected interactions of debugged components.

A programming system component costs at least three times as much as a stand-alone program of the same function. The cost may be greater if the system has many components.

The avoidance of most system engineering costs has always been the attraction of workspace development environments, like LISP machines (1, 2, 3) or the legendary Smalltalk-80 or even the amazing APL\360 (1, 2) or across the decades of using GNU EMACS and EMACS LISP: developing single language prototype applications in optimal and limited conditions is really quite easier than engineering applications that are products.

Note: it is little known in proper IT culture, but there is a huge subculture in the business world of developing, selling, or just circulating huge MS-Excel spreadsheets with incredibly complicated and often absurd internal logic. This is enabled by MS-Excel being in effect a single language prototyping environment too.

Developing cheaply single language unengineered prototypes can be a smart business decision, especially for startup where demonstrating a prototype is more important than delivering a product, at least initially.

The problem for systems engineers with that approach, whether it was a smart business choice or just happened, is that inevitably some such applications become critical and later turn into abandonware and engineering in the form of portability and maintainability has to be retrofitted to them, or they have to be frozen.

The freezing of unmaintainable legacy prototypes is an ancient issue, there have been reports of such applications for IBM 1401 systems running in a frozen environments inside an IBM/360 emulator running inside an emulator for a later mainframe.

I wish I had spent less time unraveling bizarre issues with whatever applications written frictionlessly for R, Python", Matlab, Mathematica. But the temptation for developers (and their managers) to take development shortcuts and then tell systems engineers deal with it is very large, also because it dumps a large part of the IT budget of an organization on the system engineering team. I suspect that containerazation, where whatever applications are isolated withing environments that are then pushed back to developers, is a reaction to such dumping.

Going back to GNU/Linux and other distributions, the story with them is that they are by necessity not just single interpreted language workspaces, but they are collections of very diverse system components, and a single distribution packages may contain code from several languages customized for several different system types to be compiled to several different architecture types. Traditional packaging is a necessity for that.

180223 Fri: Consolidation of infrastructure and economies of scale or scope

I have mentioned before that the economic case for virtualization rests in large part on 4U servers being cheaper per unit of capacity then smaller servers but not by much, and that infrastructure consolidation beyond that does not offer much in terms of economies of scale or scope.

Perhaps that needs explaining a bit more, so I'll make the classic example of economies of scale: in order to make car body parts the best method is a giant and expensive press that can shape in one massive hammer action an ingot of steel into a body part, like a door. Anything smaller and cheaper than that press requires a lot more time and work; but conversely anything bigger does not give that much of a gain. Another classic example is a power station: large turbines, whether steam or water powered, have significant better cost/power rations than smaller ones, in part I suspect because the central area is a smaller proportion of the total inlet area.

In computing (or in finance or in retail) such effects don't happen beyond a low scale: sure, 4U servers are rather cheaper than desktops per unit of computing capacity delivered, and (not much) larger cooling and power supply systems are more efficient than pretty small ones, but these gains taper off pretty soon. Wal*Mart and Amazon in retail have similar profiles: their industrial costs (warehousing, distribution, delivery) are not significantly lower than their smaller competitors, they simply accept smaller margins on a bigger volume of sales.

Are there economies of scope in computer infrastructures? Perhaps if they are very highly uniform: software installation on 10,000 nearly identical computers is rather parallelizable and thus costs pretty no much more than on 1,000 in terms of manpower, but very highly uniform infrastructures only work well with embarassingly parallelizable workloads, like content delivery networks, and even so very highly uniform infrastructures often don't stay such for long. There is the problem that heterogeneous workloads don't work well on homogeneous systems.

However in some corporate sectors I am familiar with there is a drive to consolidate several heterogeneous workloads on unified homogenous infrastructures via techniques like containers and virtualization, even if there is no obvious demonstration that this delivers economies of scale or scope. In part this is because of corporate fads, which are an important influence on corporate careers, but I think that there is an important other motivation: cutting costs by cutting service levels, and obfuscating the cut in service levels with talk of economies of scale or scope.

Consider the case where rather 6 different IT infrastructure each with 200 servers and 8 system engineers each are consolidated into a single one with 800 servers running 1,200 virtual machines or containers and 36 system engineers.

In that case there are likely zero economies of scale or scope: simply service levels (system responsiveness, system administrator attention) have been cut by 1/3 or more. Because a 200 server infrastructure with a 8 system engineers is probably already beyond the level at which most economies of scale or scope have already been mostly used.

Such a change is not a case of doing more with less, but doing less with less, and more than linearly so, because while the hardware and staff have been cut by 1/3, the actual workload of 1,200 systems has not been cut.

The inevitable result is that costs will fall, but also service levels will fall, and with reduced hardware and staff for the same workload the only way to cope will be to cut maintenance and development more than proportionately, which is usually achieved by freezing existing infrastructures in place, as every change becomes more risky and harder to do, that is doing rather less with less.

Doing rather less with less is often a perfectly sensible thing to do from a business point of view, when one has to make do with less resources and doing rather less does not have a fatal effect on business, and can be reversed later. Obfuscating the swiotch to doing rather less with less with an apparent change of strategy can also be useful from a business point of view, as appearances matter. But sometimes I have a feeling that is not a deliberate choice.

Consolidation can be positive at a fairly low scale, where there can be economies of scale and scope: consider 1,000 diverse servers and 50 system engineers, divided into 25 sets with 40 servers and 2 system engineers each: that might be highly inconvenient as:

Teams of 2 cannot cover each other very well (what if both fall ill at the same time.
Such small teams may have times of very high workload and much less busy times as distributing peaks and troughs across such a small team is harder than for a larger team.
1-2 racks of servers may be quite expensive to build a dedicated computer room for. But then viceversa they may not need a dedicated room if they are chosen to be quiet and running cool, as in most business environments there is some spare space.

There might be a case to consolidate a bit, into 5 groups each with 200 servers and 10 (or 8 perhaps) system engineers each (so that coverage becomes a lot easier), but I think that it would be hard to demonstrate economies of scale or scope beyond that.

180115 Mon: Availability zones as a symptom

Having mentioned previously the additional cost and higher complexity of both public and private virtualized infrastructures, they arise in large part from a single issue: that the mapping between dependencies of virtual systems and hardware systems is arbitrary, and indeed it must be arbitrary to enable most of the expected cost savings of treating a generic hardware infrastructure as an entirely flexible substrate.

Since the mapping is arbitrary resilience by redundancy becomes unreliable within a single cloud: some related virtual instances might well be hosted by the same hardware system, or or by some hardware systems attached to the same switch, etc.

It is is possible to build virtualized infrastructures that expose hardware dependencies to the administrators of the virtual layer and that allow them to place virtual infrastructure onto specific parts of the hardware infrastructure, but this soon becomes unmaintainably complex, as the hardware infrastructure changes, or removes most of the advantages of virtualized infrastructures, if the placements are static, so that the hardware layer can no longer be treated as an amorphous pool of resources.

Therefore most virtual infrastructures choose a third, coarser approach: availability zones (as they are called in AWS) which are in practice entirely separate and independent virtual infrastructure, so that the placement of virtual systems can be in different zones for redundancy.

In theory this means that services within a zone can arbitrarily fail, and only inter-zone replication gives a guarantee of continuity. This leads to extremely complex service structures in which all layers are replicated and load-balanced across zones, including the load balancers themselves.

To some extent this is not entirely necessary: several public virtual infrastructures give availability guarantees within a zone, for example at most 4 hours a year of outage or 99.95% availability, but these are per virtual system not per zone, so if a virtual system depends on several other virtual systems in the same zone, it can happen that they fail at different times. Also it is expensive: for a given availability target the hardware infrastructure must contain a lot of redundancy everywhere, as its virtual guests can indeed be anywhere on it.

So availability zones are a symptom of the large cost of making the hardware substrate a consolidation of hardware systems into a generic virtual infrastructures: the infrastructures both have to be generically redundant within themselves, and yet they still need wholesale replication to provide redundancy to services.

Running services on physical systems instead allows case-by-case assessment of which hardware and software components need to be replicated, based on their individual requirements, probabilities of failure, and times to recovery, and in my experience with a little thinking not much complexity is required to achieve good levels of availability for most services. So virtualized infrastructures seem to be best suited to two cases, that often overlap:

Services that don't really need continuity as such, but simply replication, for example content delivery services such as web frontends, where the failure of a small number of them impacts a small number of users that can then just reload the page.
Cases where it is cheaper to create highly complex resilient configurations in software for worst-case scenarios (that is across availability zones) than looking at combined dedicated hardware and software resilience.

But the first case can be easily handled without virtualization, and the second is very rare and expensive in its complexity, yet consolidation into virtualized infrastructures turns most applications setups into that.

In a case I saw a significant service uised by thousands of users daily was split onto three pretty much unrelated systems for workload balancing and for ease of maintenance, and each system was designed for resilience without much redundancy, except for the database backend.
The result was a service that was very highly available, at an amazingly low cost in both hardware and maintenance, and maintenance was made that much easier by its much simpler configuration than had it been "continously available". Since it was acceptable for recovery from incidents to take an hour or two, or even half a day in most cases, running a partial service on a subset of the systems was quite acceptable and happened very rarely.

We also ignore that there are as a rule quite small or no benefits on cost by increasing scale by consolidation, either at the datacentre level, or at the hardware procurement and installation level, and businesses like Google that claim savings at scale achieve them by highly specializing their hardware and datacentres, to run highly specialized applications. Other companies like Amazon, for the inconvenience of having to deal with redundancy at the level of availability zones, have been charging considerably more than the cost of running a private non virtualized infrastructure.

As with storage large amorphous pools of resources don't work well, primarily because of huge virtualization overhead and the mismatch between homogenous hardware and heterogenous workloads which are highly anisotropic. I suspect that their popularity rests mainly on a fashion for centralization (in the form of consolidation) regardless of circumstances, and the initial impression that they work well when they are still not fully loaded and configured, that is when they are most expensive.

The virtualized infrastructures offered by third party vendors have the same issues only bigger, but I suspect that they are popular because they allow clever CFOs a class of capital expenses into current expenses, much the same as leasing does, and that sometimes is for them worth paying a higher price.

180213 Wed: Skype containerization for Linux

A blog post by Canonical reports the opinions of the developers of Skype for distributing their binary package in the Snap (1, 2) containerized format (descended in part from Click) sponsored by them, similar to the Docker format (but I reckon Snap is in some ways preferable) with arguments that look familiar:

The benefit of snaps means we don’t need to test on so many platforms – we can maintain one package and make sure that works across most of the distros. It was that one factor which was the driver for us to invest in building a snap.

In addition, automatic updates are a plus. Skype has update repos for .deb and rpm which means previously we were reliant on our users to update manually. Skype now ships updates almost every week and nearly instant updates are important for us. We are keen to ensure our users are running the latest and greatest Skype.

Many people adopt snaps for the security benefits too. We need to access files and devices to allow Skype to work as intended so the sandboxing in confinement mode would restrict this.

These are just the well known app store properties for systems like Android: each application in effect comes with its own mini-distribution, often built using something equivalent to make install and that mini-distribution is fully defined and managed and updated by the developer, as long as the developer cares to do so.

For users, especially individual users rather than organizations, it looks like a great deal: they hand over the bother of system administration to the developers of the applications that they use (very similarly to virtualized infrastructures).

Handing over sysadmin control to developers often has inconvenient consequences, as developers tend to underestimate sysadm efforts, or simply don't budget for them if the additional burden of system administration is just not free. This is summarized in an article referring to an ACM publication:

In ACMQueue magazine, Bridget Kromhout writes about containers and why they are not the solution to every problem.

Development teams love the idea of shipping their dependencies bundled with their apps, imagining limitless portability. Someone in security is weeping for the unpatched CVEs, but feature velocity is so desirable that security's pleas go unheard.

Platform operators are happy (well, less surly) knowing they can upgrade the underlying infrastructure without affecting the dependencies for any applications, until they realize the heavyweight app containers shipping a full operating system aren't being maintained at all.

Those issues affect less individual users and more organizations: sooner or later many systems accrete black-box containers that are no longer maintained, just like legacy applications running on dedicated hardware, and eventually if some sysadms are left they are told to sort that out, retroactively, in the worst possible conditions.

180211 Sun: VLAN tags define broadcast domains with no internal links

VLAN tags effectively just turn broadcasts into multicasts, including the initial flooding to discover a neighbor. That is not too bad, but they are also used to classify hosts into subsets that have almost no internal connectivity, which is the opposite of what is desirable.

Apart from the added complexity, that means that very little traffic happens within a VLAN, and almost all traffic is among VLANs.

Unless of course the servers for the clients on a VLAN are on the same VLAN, which means that most servers will be on most VLANs.

2018 January

180127 Sat: Experiences of being hacker'ed

I was recently asked about having been hacker'ed, where some hacker affected some system I was associated with, and my answer was perhaps a long time ago. Indeed I had to think about it a bit more and try to remember.

Before giving some details the usual premise: this is not necessarily because of superhuman security of the systems that I use, it is mostly because they don't have valuable data or other reasons to be valuable (for eample I keep well clear of Bitcoin (Don't own cryptocurrencies, Kidnappers free blockchain expert Pavel Lerner after receiving $US1 million ransom in bitcoin) or online finance and trading, and therefore hackers won't invest big budgets to get them, so that diligent and careful practice and attention to detail mean that low budget attempts are foiled, because low budget attacks are targeted at low hanging fruits (trhe systems of people who are not diligent and careful, of which there are plenty).

So the only major case I can remember was around 15 years ago when a web server was compromised by taking advantage of a bug in a web framework used by a developer of an application running on the server. This allowed the hackers to host some malware on the server. While I was not that developer (but honestly I might well have been, I would not have done either a full security audit of every framework or library I would use), I was disappointed that it took a while to figure out this happened, because the hacker had taken care to keep the site looking the same, with only some links pointing to the malware.

There have been minor cases: for example even longer ago a MS-Windows 2000 system was compromised using a day-0 bug while I was installing it (just bad luck) at someone else's home and I guess in part because downloading the security updates after installation too so long, but this had no appreciable effect because I realized it was hacked (for use as a spam bot) pretty soon during that installation. There have also been a handful of cases where laptops or workstations of coworkers were hacked because they clicked on some malware link and then clicked Run, and either the antivirus had been disabled or the malware was too new.

In all cases the hacks seemed to originate from far away developing countries, which is expected as it is too risky to hack some systems in the same jurisdiction (or allied) where the hacker resides, given that the Internet gives access to many resources across the world, which has provideded many good business opportunities for developing countries, but also for less nice businesses.

As to targeted operations, that is high budgets hackers from major domestic or allied security services or criminal organisations, I am not aware of any system I know having been targeted. Again because I think it has never been worth it, but also because high budgets allow use of techniques, like compromised hardware, microscopic spycams or microphones to collect passwords, firmware compromised before delivery, that are pretty much undetectable by a regular person without expensive equipment. If there is a minuscule spy chip inside my keyboard or motherboard or a power socket or a router I would not be able to find out.

180113 Sat: Tidy cable storage and tagging

My preference is for cable and equipment labeling and tiday storage, also because when there are issues that need solving that makes both troubleshoting and fixing issue easier and quicker. Having clear labels and tidy spare cables helps a lot under pressure. Therefore I have been using some simple techniques that I wish were more common:

Cable tieing, both for storage and on live systems, with hook-and-look (velcro) tapes. It is possible to buy strips, but I find more convenient to buy a roll of tape and cut it to measure. It is usuall much better to wrap with the hook side inside, and the look side outside, as the loop size is rather less likely to attach to something. Of course a lot of people use traditional nylon ties for that and they are cheaper, but velcro is cheap enough, reusable, and removable without damaging the cable. Also nylon cable ties are very inappropriate for fibre cables.
Transparent plastic bags (e.g. food bags 1, 2) for storing cables, tied with plastic coated wire twist ties, or sometimes I use clear plastic food boxes This prevents cables with tangling with other cables. Cables can then be stores in boxes, and one can just pull out the single cable needed. Tieing cables with velcro for storage achieves that too, but it is slightly less convenient (velcro from one cable can end up attached to another cable).
Putting labels on cables and sockets. For this I like label printer tape and page markers or label tape (1, 2) for handwritten labels, both possibly secured by a bit of adhesive tape. At one site I worked the standard was to use very nice (and somewhat expensive) nylon label tags but I also use as label carriers clothespins for the more resilient type of cable (power, copper network, usually).

As to how to label I am aware of the standard practice in mechanical and electrical engineering to label both cables and sockets with an inventory number, and a central list of which cable to connect to which socket, but I don't like it for computer equipment because:

That requires keeping up-to-date a potentially very large number of cable-socket mappings in the centra list, which often falls rather behind reality.
Having to access a central list to install cables or verify cables is both time consuming and error prone, especially during issues where there is pressure.
The practice in electrical and mechanical engineering is based on cabling arrangement changing very little, for example after a building or engine are built, and a central list may help audit cabling arrangements if they are quite dangerous (e.g. power distribution), but that does not apply to most computer cables.
Things become more complicated if cables are not symmetric, and ends are different.

So my usual practice is to label both cables and sockets with matching labels, and possibly different different labels on each end, and the labels be descriptive, not just inventory numbers. For example a network connection may have a label with host name and socket number on the host side for both socket and cable end, and host name and switch name and port number on the switch side. This means that verifying the right connections are in place is very easy, and so is figuring out which cable plugs into which socket, which minimizes mistakes under pressure. The goals are: make clear what happens when a cable is disconnected, make clear which cable to disconnect to get a given effect, make clear which cable needs to be reconnected.

For me it is also important to minimize connections specificity, so for example I usually want to ensure that nearly all downstream ports on a switch are equivalent (on the same VLAN and subnet) so that they don't need to be tagged with a system name, and the switch end of a downstream cable does not need to be tagged with a specific switch port. So for example I usually put on a switch something a label such as ports 2-42 subnet 192.168.1.56 and hostname and subnet on the switch end of downstream cables.

In order to have something like a central list of cable and socket connections I photograph with a high resolution (and with large pixels and a wideangle lens) digital photocamera the socket areas, ensuring that most of the labels be legible in the photograph. This is not as complete as a cable and socket list, but it is usually a lot more accurate, because much easier to keep up-to-date.

Note: however I do use explicit inventory numbers and a central list in come more critical and static cases, usually in addition to meaningful matching labels.

180106 Sat: The popular version control systems and Breezy

I had long had the intention to complete my draft (from 2011...) of a blog post on the four most popular distributed version control systems (monotone, Mercurial, Bazaar, git) but over the years both monotone and Bazaar have lost popularity, and currently only git and Mercurial (among distributed version control systems, Subversion and even CVS are still popular) are still widely used (with git being more popular than Mercurial), at least according to Debian's popcon report:

That's sad as both monotone and Bazaar have some very good aspects, for example, but there are a lot of contrasting opinions for example there is some criticism about both monotone and Bazaar, mainly that they were much slower than git and Mercurial, especially for large projects, as their storage backends are somewhat slower than the one of git on typical disks; secondarily that monotone is too similar to git, and that Bazaar has some privileged workflows similar to Subversion.

Note: Bazaar however has been very competitive in speed since the switch to a git-like storage format in version 2.0 and some improvements in handling large directory trees (1, 2) in 2009 and while slower, monotone happened to be under the spotlight as to speed at an early and particularly unfortunate time, and has improved a lot since.

However it seems to me quite good that after a couple of years of stasis Bazaar development has been resumed as in its new Breezy variant even if the authors say that Your open source projects should still be using Git which is something I disagree with, and there is a pretty good video presentation about the Breezy project on YouTube and it makes the same point to be using git, and where Mercurial is ignored.

Note: the two years of stasis have demonstrated that Bazaar, whose development started over 10 years ago, is quite mature and usable and does not need much maintenance beyond occasional bug fixes, and conversion to Python 3 which is one of the main goals of the Breezy variant, and is nearly complete.

The video presentation makes some interesting points, that at least partially contradict that repeated advice, and in my view they are largely based on the difference between the command front-end of a version control system and its back-end:

Indeed the git command line is awkward, with simple operations requiring complex, hard to understand commands. Even worse it was designed with a great confusion: git actually has a very clear state machine and operating philosophy, based on the differences between the stage, the object store, tree, working directory, its commands are named without any reference as to what they manipulate, so for example git commit and git reset operate on the stage and the object store and are to some extent opposite operations, and git checkout and git add operate on the working directory and the stage, and are opposite operations too.
In part by being written (mostly) in Python, in part because they have changed storage layer a few times, in part because of the ambition to be universally usable, both Mercurial and Bazaar support many different back-ends, including that of git. That is their front-ends can work pretty seamlessly on git remote repositories without converting them (1, 2); they also support a wide range of plugins written in Python, quite seamlessly.

While I like Mercurial otherwise, I think that having two storage files (revision log and revision index) per every file in the repository is a serious problem: most software packages have very many small files in part because archives are not used as much as they should be in UNIX and successors like Linux, and most storage systems cannot handle lots of small files very well, never mind tripling their number. While I like git the rather awkward command interface and the inflexibility make it a hard tool to use without a GUI interface and GUI interfaces limit considerably its power.

Bazaar/Breezy can use its native storage format, can use the git storage format, and both grow by commit, and both supporting repacking of commit files into archives, and for old style programmers it can be used at first in a fairly Subversion style workflow.

Note: while most developers seem superficially uninterested in the storage format of their version control system, it has a significant impact on speed and maintainability, as storage formats with lots of small files can be very slow on disks, for backups, for storage maintenance, for indexing. Other storage formats can have intrinsic issues: for example monotone uses SQLite2, which as a relational DBMS tries hard to implement atomic transactions, which unfortunately means that an initial cloning of a remote repository can generate either a lot of transactions or one huge transaction (the well know issue of initial population of a database), which can be very slow.

180106 Mon: Great 8-CPU parallelism during backup

While doing a backup using pigz (instead of pbzip2) and aespipe I have noticed that the expected excellent parallelism happened very nicely, and also mostly from a Btrfs filesystem on dm-crypt that is, with full checksum checking during reads and decryption:

top - 14:46:56 up  2:05,  2 users,  load average: 9.23, 3.74, 1.88
Tasks: 493 total,   1 running, 492 sleeping,   0 stopped,   0 zombie
%Cpu0  : 80.5 us, 12.9 sy,  0.0 ni,  3.6 id,  2.3 wa,  0.0 hi,  0.7 si,  0.0 st
%Cpu1  : 75.4 us, 13.3 sy,  0.0 ni,  0.7 id,  7.0 wa,  0.0 hi,  3.7 si,  0.0 st
%Cpu2  : 82.9 us,  8.2 sy,  0.0 ni,  7.9 id,  1.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu3  : 81.7 us,  9.2 sy,  0.0 ni,  9.2 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu4  : 86.8 us,  7.8 sy,  0.0 ni,  5.4 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu5  : 87.6 us,  6.0 sy,  0.0 ni,  5.7 id,  0.7 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu6  : 85.5 us,  6.2 sy,  0.0 ni,  5.6 id,  2.6 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu7  : 89.1 us,  5.0 sy,  0.0 ni,  5.6 id,  0.3 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem:   8071600 total,  7650684 used,   420916 free,       36 buffers
KiB Swap:        0 total,        0 used,        0 free.  6988408 cached Mem

  PID  PPID USER      PR  NI    VIRT    RES    DATA  %CPU %MEM     TIME+ TTY      COMMAND
 7636  7194 root      20   0  615972   8420  604436 608.6  0.1  11:53.45 pts/7    pigz -4
 7637  7194 root      20   0    4424    856     344  81.8  0.0   1:16.04 pts/7    aespipe -e aes -T
 7635  7194 root      20   0   35828  11524    8892  11.9  0.1   0:24.68 pts/7    tar -c -b 64 -f - --one-file-system +