This document contains only my personal opinions and calls of judgement, and where any comment is made as to the quality of anybody's work, the comment is an opinion, in my judgement.
Some weeks ago I upgraded my default web browser to Firefox 57 and so far so good: it works pretty well and has some significant advantages, and I have found that being forced to change the main add-ons that I use to limit JavaScript and other inconveniences worked out pretty well, as both Policy Control and Tab Suspender work well, just as uMatrix does.
The major advantage is actually in the enforced switch to
multi-process operation, that is displaying tabs inside child
processes. Since many web pages are prone to cause
memory growth and CPU loops, my system often freezes,
which I can then painlessly fix by using the
Linux kernel's Alt-SysRq-f key sequence, which invokes the
OOM killer logic that kills the most invasive processes.
Invariably these are Firefox (or other browser) child
processes with tabs displaying troublesome pages, and on
Firefox 57 it is easy to restart them, and just them. In
previous versions of Firefox the OOM killer would kill the main
and only Firefox process, necessitating a slow restart of all
Firefox windows and tabs.
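As an aside, Alt-SysRq-f only works if the magic SysRq key is enabled for it; a minimal sketch of checking and enabling it (the value 1 enables every SysRq function, a bitmask can be more selective):

# Check whether the magic SysRq key is enabled (0 disables it, 1 enables all functions).
cat /proc/sys/kernel/sysrq
# Enable it for the current boot; add kernel.sysrq=1 to /etc/sysctl.conf to make it permanent.
sudo sysctl kernel.sysrq=1
# The same OOM-killer action can also be triggered without the keyboard:
echo f | sudo tee /proc/sysrq-trigger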
Some of the more common themes in my discussions with system
engineers are about security
and VM
infrastructures, as many of my recent posts here show. In
particular I am a skeptic about the business advantages of
virtualized infrastructures, and I was trying to make the point
recently that in the unlucky case that a virtualized
infrastructure is mandated, a skeptic is what is needed to
design and operate it, to avoid the many longer term
downsides. Therefore here is a summary of the short-term
advantages of a virtualized infrastructure:
software-defined datacentres and
software-defined networking and storage.
This is my summary of some of the matching downsides:
devops. Unfortunately developers are not keen on system maintenance, and eventually, like most smartphones, most virtual machines become unmaintained. It is also particularly dangerous that the effective administrative domain of all VMs running on a host is actually that of the host, so that security issues with the host automatically affect all its VMs; that is, every VM hosting server is a giant undetectable backdoor into all the hosted VMs.
The latter point is particularly important: the typical way around it is to increase the redundancy of systems, with live migration of virtual machines and active-active redundancy, on a coarse scale, typically by having multiple sites for private clouds and multiple regions for public clouds.
Note: I have seen actual production designs where network systems, frontend servers and backend servers were all not just virtualized but also in separate active-active clusters, creating amazing complexity. When talking about software-defined virtual infrastructure it is usually not mentioned that the software in question has to be written, tested, understood and maintained in addition to the hardware infrastructure; this also applies to non-computing infrastructures, such as power generation, communications, etc.
The problem with increased redundancy is that it largely
negates the cost advantage of consolidation, and increases the
complexity and thus the fragility of the virtualized
infrastructure. The really hot topic in system engineering
today is magical solutions, either for private or public
clouds, that are claimed to deliver high availability and
great flexibility thanks to software
defined
clustering, data-centres, networking and storage
over multiple virtualized infrastructures. I have even been
asked to propose designs for these, but my usual conclusion
has been that the extreme complexity of the result made it
very difficult to even estimate the overall reliability and
actual cost.
Note: it is for me particularly hard to believe claims that high reliability is something that can be transparently and cheaply added at the infrastructure level by adding complex and fragile configuration software, that is without an expensive redesign of applications; avoiding such a redesign is often one of the main motivations for consolidating them onto a virtual infrastructure.
Of course swivel-eyed
consultants and
salesmen speak very positively about the benefits (especially
those proposing
OpenStack style
infrastructures), as they look awesome on paper at a very high
level of abstraction. But I find it telling that third-party
cloud infrastructure providers have extremely cautious
SLAs
that give very few guarantees, even when paying extra for additional guarantees, and their costs are often quite high, higher than those of physical infrastructure.
Note: public shared infrastructures that make sense all rely on colossal economies of scale or colossal networking effects being available to fund the extra reliability and performance issues that they have. A huge dam with a dozen immense turbines produces electric power at such a lower cost that it is worth building a big long distance delivery infrastructure to share its output among many users; a national shared telephone network provides immense connectivity value to its users. Simply putting a lot of servers together in big data centres has some but not large economies of scale (and the scale beyond which they peter out is not that large), and negligible network effects, never mind reducing their capacity by running virtualization layers on top of them.
My conclusion is that virtualized infrastructures are good in small doses, and for limited purposes, in particular for non-critical services that require little storage and memory, and for development and testing, whether private or public, and that in general physical hardware, even in shared data centres, is a more reliable and cost-effective alternative for critical services.
The great attraction of virtualized infrastructures for the executive level is mostly about financial accounting: the up-front cost of a private virtual infrastructure is low compared to the long-run maintenance costs, and leasing VMs on a public virtual infrastructure can be expensed straight away against taxable income. But my impression is that, like other very similar past fads, virtualized infrastructures will become a lot less popular with time as their long-term issues become clearer.
Note: the past fad most similar to virtualized infrastructures in business rationale, business economics, and also in downsides has been blade systems: I have found them fairly usable, but they have not become popular in the long term, because of weaknesses very similar to those of virtualized infrastructures.
In particular I have noticed that, as in many other cases with infrastructures (not just virtualized ones), a newly installed infrastructure seems to work well in its first years of use, as it has not reached full load and complexity, and therefore there will be early declarations of victory, as in most cases where long-run costs are indeed deferred.
Given this, a virtual infrastructure skeptic is much better placed than an optimist to minimize the issues with virtual infrastructures until the fad passes, and to slow down the negative effects of increasing utilization and complexity: by discouraging the multiplication of small VMs that need independent setup and maintenance, by not undersizing the physical layer, by keeping clear inventories of which VM is on which part of the physical infrastructure to minimize actual dependencies and the impact of infrastructure failures, and by exercising great discipline in building the configuration programs of software-defined data centres, networks and storage.
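As a small illustration of the inventory point, a minimal sketch assuming libvirt/KVM hosts reachable over ssh (the vm-hosts.txt host list and the output file name are hypothetical):

# Record which VM currently runs on which physical host, one line per VM.
while read -r host; do
    ssh "$host" virsh list --all --name 2>/dev/null |
        sed '/^$/d' |
        awk -v h="$host" '{ print h "\t" $0 }'
done < vm-hosts.txt > vm-inventory.tsv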
Finally my reflections above are from the point of view of cost effectiveness at the business level: at the level of system engineers virtualized infrastructures not only improve employment prospects and job security because of their complexity and the increased staff numbers that they require for effective operation, but they are also lots of fun because of the challenges involved as to technological complexity and sophistication and minimizing the downsides.
So I was disappointed previously that my laptop flash SSD seemed to have halved its transfer rate, but I noticed recently that the unit was in SATA1 instead of SATA2 mode, and I could not change that even by disabling power saving using the PowerTOP tool or using the hdparm tool; then I suspected that it was due to the SATA chipset resetting itself down to SATA1 mode on resuming from suspend-to-RAM, but after a fresh boot that did not change. Then I found out that I had at some point set in the laptop BIOS an option to save power on the SATA interface. Once I unset it the flash SSD resumed working at the top speed of the laptop's SATA2 interface of around 250MB/s. It looks like the BIOS somehow sets the SATA interface to SATA1 speeds in a way that the Linux based tools cannot change back.
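For reference, this is roughly how I check the negotiated SATA link speed and the resulting sequential read rate under Linux, as a sketch (link numbering and device names vary per system):

grep . /sys/class/ata_link/link*/sata_spd   # negotiated speed per link, e.g. "1.5 Gbps" or "3.0 Gbps"
dmesg | grep -i 'SATA link up'              # the same information as logged at boot
sudo hdparm -t /dev/sda                     # rough sequential read rate of the device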
Also I retested the SK Hynix flash SSD and it is back to almost full nominal speed at 480MB/s reading instead of 380MB/s, I guess because of garbage collection in the firmware layer.
In most use, that is for regular-sized files, the change in transfer rate does not result in noticeable changes in responsiveness, which is still excellent, because the real advantage of flash SSDs for interactive use is their much higher random IOPS compared to disks, more than their somewhat higher transfer rates.
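The difference between the two figures is easy to see with fio, as in this minimal sketch (file name, size, runtimes and queue depths are arbitrary; run it against a scratch file):

# Random 4KiB reads: reports IOPS, which is what interactive use mostly depends on.
fio --name=randread --filename=/tmp/fio.test --size=1G --direct=1 \
    --ioengine=libaio --rw=randread --bs=4k --iodepth=32 --runtime=30 --time_based
# Sequential 1MiB reads: reports the headline transfer rate instead.
fio --name=seqread --filename=/tmp/fio.test --size=1G --direct=1 \
    --ioengine=libaio --rw=read --bs=1M --iodepth=8 --runtime=30 --time_based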
Overall I am still very happy with the three flash SSDs I have, which were chosen among those with a good reputation, as none of them has had any significant issues in several years and they all report having at least 95% of their lifetime writes available. This is one of the few cases where I have bought three different products from different manufacturers and all three have been quite good. The Samsung 850 Pro feels a bit more responsive than the others but they are all quite good, even the nearly 6-year-old Micron M4.
First a list of recent and semi-recent successful hacks
that were discovered:
In its statement, the company said:
Subsequent to Yahoo's acquisition by Verizon, and during integration, the company recently obtained new intelligence and now believes, following an investigation with the assistance of outside forensic experts, that all Yahoo user accounts were affected by the August 2013 theft.
The hacked user information included phone numbers, birth dates, security questions and answers, and "hashed," or scrambled, passwords, Yahoo said in a list of frequently asked questions on its website. The information did not include "passwords in clear text, payment card data, or bank account information," the company said.
However, the technique Yahoo used to hash passwords on its site is an outdated one that is widely considered to be easily compromised, so it's possible that people who had the hashed passwords could unscramble them.
These are rather different stories because not all hacks are equal: those that generate revenue or offer the opportunity to generate revenue are much worse than the others:
NotPetya ransomware headline figure is for the loss of 2.5% of the usual business, not money gained by the hackers. There is no indication whether Maersk actually paid the ransom.
11 million US driver's licenses were a small and not very significant part of the Equifax issue, and the whole issue is not in itself very big, because it is about information about people, even if it can be partly used in identity theft and thus has a revenue-generating market value; the more concerning aspect of the Equifax issue is that 209,000 people had their credit card info leak and the breach also included dispute documents with personally identifying information from 182,000 consumers, because those can be used to charge the credit card accounts.
590 'significant attacks' over the past year don't mean much, as they usually will be
phishing attempts that, while ruinous to individuals, will usually hit few of them for small amounts of money, even if they are usually far more effective, according to Google, on a statistical basis:
The blackhat search turned up 1.9 billion credentials exposed by data breaches affecting users of MySpace, Adobe, LinkedIn, Dropbox and several dating sites. The vast majority of the credentials found were being traded on private forums.
Despite the huge numbers, only seven percent of credentials exposed in data breaches match the password currently being used by its billion Gmail users, whereas a quarter of 3.8 million credentials exposed in phishing attacks match the current Google password.
The study finds that victims of phishing are 400 times more likely to have their account hijacked than a random Google user, a figure that falls to 10 times for victims of a data breach. The difference is due to the type of information that so-called phishing kits collect.
Overall the real import of the above issues varies a great deal. And they are the least important ones, because they were discovered: the really bad security issues are those that don't get found. Knowing about a security breach is much better than not knowing about it.
As to this, phishing
has evolved from
banal e-mails to entire
fake sites
quite a while ago:
Phishing kits contain prepackaged fake login pages for popular and valuable sites, such as Gmail, Yahoo, Hotmail, and online banking. They're often uploaded to compromised websites, and automatically email captured credentials to the attacker's account.
Phishing kits enable a higher rate of account hijacking because they capture the same details that Google uses in its risk assessment when users login, such as victim's geolocation, secret questions, phone numbers, and device identifiers.
Security services use the same technique for entrapment, which is a form of phishing:
Australian police secretly operated one of the dark web’s largest child abuse sites for almost a year, posing as its founder in an undercover operation that has triggered arrests and rescues across the globe.
The sting has brought down a vast child exploitation forum, Childs Play, which acted as an underground meeting place for thousands of paedophiles.
Obviously this is just one case among very many that are as yet undisclosed. There may be few qualms about entrapment of paedophiles (but for the fact that it facilitates their activities for a time), but consider the case of a web site for North Korean political dissenters actually run by the North Korean secret police, or a system-security discussion forum for bank network administrators run by a Russian hacker gang.
Never mind phone home
ever-listening
devices like
digital assistants
or smart television monitors
(1,
2),
or smartwatches that act as remote listening devices
(1,
2).
The issue with these is not just that they phone home to the manufacturers, but that they may be easy to compromise by third parties, and then they would phone home to those third parties, with potentially vast consequences. For example it is not widely known that passwords can be easy to figure out from the sound of typing them, especially if they are typed repeatedly.
So as always there are two main types of security situations: those where one is part of a generic attack against a large number of mostly random people with a low expected rate of success, where what matters is not to be among the successes; and those targeted against specific individuals or groups of high value, where what matters is either not to be of high value, or not to select oneself as being or appearing to be high value.
I have belatedly become aware of some tools that seem interesting but I have not used them yet:
warm data recovery scheme that can complement a
cold backup scheme. It works similarly to the
par2
erasure-code tool (see the sketch after this list), but at the filesystem level instead of the file-archive level. Given N filesystems and some additional space (on some other filesystem) it will compute statically a redundancy code in the additional space that allows recovering any single data loss in unmodified files.
container, so that the only dependency is on the Linux kernel system call API, which is extremely stable.
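For comparison, this is what file-archive level protection with par2 looks like, as a minimal sketch (file names and the 10% redundancy figure are arbitrary):

par2 create -r10 archive.par2 *.dat   # compute roughly 10% worth of recovery blocks
par2 verify archive.par2              # check the protected files against the recovery set
par2 repair archive.par2              # reconstruct damaged or missing files from the blocks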
So I have finally switched to Mozilla's new Firefox 57 Quantum, a largely re-engineered jump away from the previous codebase.
The re-engineering was supposed to deliver more portable add-ons
and faster rendering of web pages, as
well as making multi-process operation common. In all it seems
to deliver: the new-style add-ons are no longer based on
rewriting parts of the Firefox user interface internal XUL
code itself, but on
published internal interfaces, mostly compatible
with those of Google's Chrome, and indeed page loading and
rendering are much snappier, and multi-process operation means
that in cases like memory overflow only some tabs get
killed. There are however a few very important issues in
switching, which make it non-trivial:
contained using methods like SELinux or AppArmor in Linux, and since similar facilities are not quite available under other operating systems, Firefox 57 includes a built-in containment system. This containment system, called the sandbox, is by default set to be very restrictive, which on many systems means stuff does not work.
The solutions are:
profile where JavaScript is not restricted, and drag and drop the URL to that.
window manager I found that it does not handle the popup dialog windows for the add-on installer well, so I replaced it with the more modern OpenBox.
Having done the above, Firefox 57 works pretty well, and in large part thanks to Tab Suspender and Policy Control, and in part thanks to its improved implementation, it is far quicker and consumes much less memory, as the new add-ons are stricter than those they replaced. There are minor UI changes, and they seem to be broadly an improvement, even if slight.
As I spend a fair bit of time helping distraught users on IRC channels, in part as community service, in part to keep in touch with the contemporary issues of ordinary users, I was asked today for a summary of what Btrfs is currently good for and what its limitations are, so I might as well write it here for reference.
The prelude to a few lists is a general issue: Btrfs has a sophisticated design with a lot of features, and regardless of current status its main issue is that it is difficult both to explain its many possible aspects and its complex tradeoffs, and to understand them. No filesystem design is maintenance-free or really simple, but Btrfs (and ZFS and XFS too) is really not fire-and-forget.
Because of that complexity I have decided to create a page of notes dedicated to Btrfs, with part of its contents extracted from the Linux filesystems notes and with my summary of how best to use it.
I was chatting with two quite knowledgeable, hard-bitten guys about system administration; their issue was that they have 500 diverse systems running on diverse machines to manage, they use Puppet, and they have a big maintainability problem with the Puppet scripts and configuration templates.
Note: 500 different virtual systems is a large number, but there are organizations with 1,000, and the diversity is usually the result of legacy issues, such as consolidating different physical infrastructures belonging to different parts of an organization onto a single virtual infrastructure.
So I discussed a bit with them, and suggested that the big deal is not so much the wide variety of GNU/Linux distributions they use, because after all they all run the same applications, and the distributions are not radically different; after all, where a configuration file is put by a distribution is a second-order complication.
I suggested that the bigger problem is that there are innumerable and incompatible versions of those applications in every possible combination, and that results in hard-to-maintain configuration files and templates. I also added that this is a known issue in general programming, which for example results in C source code with a lot of portability #ifdef cases.
They agreed with me, and asked my opinion as to other configuration management systems that might alleviate the problem. I mentioned Ansible, but did not have much time to get into that, as I had to leave. So I did not have the time to make more explicit my hint that it is not so much a problem with the tool, even if different tools make a difference to how easily the solution is implemented.
The difficulty is that in using software defined infrastructure the software needs to be designed carefully for understandability, modularity and maintainability, just like any other software that will be in use in the long term, and configuration management languages are often not designed to make that easy, or to gently encourage best practices: it is exceptionally easy to create a software defined infrastructure, be it servers, networks or hardware, that is complex, opaque and fragile, by the usual programming technique of whatever works in the short term.
The main issue is thus one of program design, and in
particular of which decomposition paradigm
to use and discipline in adhering to it.
I have previously discussed what an enormous difference to maintainability there is between two different styles of Nagios configuration; the issue is even larger in the case of generic configuration files, never mind instantiating and deploying them in many variants to different types of systems via configuration management scripts.
The main difficulty is that in a system infrastructure there are one-to-many and many-to-many relationships among for example:
The final configuration of a system is a
mixin
(1,
2)
of all those aspects. In an ideal world the scripts and
templates used to create software-defined configurations would
be perfectly portable across changes to any or all these
aspects. This is the same problem as writing a complex program
composed of many parts written in different languages, to be
portable across many different hardware and software
platforms, and easily maintainable.
This problem cannot be solved by changing tool; it can only be solved by using a structured approach to programming the configuration management system based on insight and discipline.
For example one of the great high level choices to make is
whether to organize the configuration management system by
system, or by service: this can be expressed by asking whether
systems belong
to services or whether services
belong to systems. This relates to
whether systems that run multiple services are common, or
whether services that run on multiple systems are common,
which depends in turn on the typical size of systems, the
typical weight of services. The same questions can be asked
about operating system types and application types.
Attempts to answer questions like this with both
or
whatever
are usually straight paths to
unmaintainability, because an excess of degrees of freedom is
usually too complex to achieve, never mind to maintain in the
long term, both conceptually and as to capacity. Arbitrary
combinations of arbitrary numbers of systems, services, OS
types, application types are beyond current technology and
practice for software-defined infrastructures, just as full
modularity and portability of code is for applications.
What can be done if someone has got to the point of having
several hundred virtual systems where (because of the pressure
of delivering something quickly rather than well) little
attention has been paid to the quality and discipline in the
configuration system, and the result is a spaghetti
outcome?
The answer usually is nothing, that is just to live with it, because any attempt to improve the situation is likely to be too disruptive. In effect many software-defined infrastructures soon become unmodifiable, because the software part of software-defined soon becomes unmanageable, unless it is managed with great and continuing care from the beginning.
For software at the application level it is sometimes
possible to refactor
it incrementally,
replacing hard-to-maintain parts of it with better structured
ones, but it is much more difficult to do this with software
that defines infrastructures because it has to be done and
commissioned on the live infrastructure itself, unless that infrastructure is either quite small or quite simple.
One of the great advantages of a
crop rotation
strategy
for infrastructure upgrades is that it offers a periodic opportunity for incremental refactoring of the infrastructure, both as to the hardware and as to its software-defined parts.
In effect software-defined infrastructures will achieve the same advantages and disadvantages as very detailed work-to-rule outsourcing contracts, because software is a type of contract.
This is going to happen whether the virtualized infrastructure is public or private, with the added difficulty for public infrastructures of the pressure from all customers to change nothing in existing interfaces and implementations, to avoid breaking something in their production environments: change reduces the cost of maintenance for the infrastructure maintainer, while backwards compatibility reduces the costs of the clients of those infrastructures.
That is not a new situation in the history of computing infrastructures: mainframe-based infrastructures went the same way in the 1970s-1980s, and after 20 years of ever greater piling up of special cases became very inflexible, any change being too risky for existing critical applications. Amusingly, the first attempt to fix this was to virtualize the mainframe infrastructure with VM/370, as this was supposed to allow running very diverse system layers sharing the same hardware.
But the really successful solution was to bypass the central mainframes first with departmental minicomputers, and then with personal computers, which could be maintained and upgraded individually, maximizing flexibility and minimizing the impact of failure.
To some extent entirely standardized systems processing
trivially simple workloads can be served successfully from
centralized cloud
resources, as it happens
for the electricity to power lightbulbs, fridges, washers in
the average home. So for example I reckon that cloud-based
content distribution networks are here to stay. But I think
that the next great wave in business computing will be
decentralization, as both private and public centralized virtual infrastructures become ossified and unresponsive, like mainframe-based computing did two or three decades ago.
As non-technical people ask me about the metaphysical subject of computer security, my usual story is that every possible electronic device is suspicious, and if programmable it must be assumed to have several backdoors put in by various interested parties, but these are not going to be used unless the target is known to be valuable.
It is not just software or firmware backdoors: as a recent article on spyware dolls and another one reporting WiFi chips in electric irons suggest, it is very cheap to put some extra electronics on the circuit boards of various household items, either officially or unofficially, and so probably these are quite pervasive.
As I wrote previously, being (or at least being known as) a target worth only a low budget is a very good policy, as otherwise the cost of preventive measures can be huge; for valuable targets, such as large Bitcoin wallets, even two-factor authentication via mobile phones can be nullified because it is possible to subvert mobile phone message routing, and the same article makes the point strongly that being known as a valuable target attracts unwanted attention, and that in particular Bitcoin wallets are identified by IP address and it is easy to associate addresses with people.
In the extreme case of Equifax, which held the very details of hundreds of millions of credit card users, the expected sale value is dozens of millions of dollars, which may justify investments of millions of dollars to acquire it, and it is very difficult to protect data against someone with a budget like that, which allows paying for various non-technical but very effective methods.
The computers of most non-technical people and even system
administrators are not particularly valuable targets, as long
as they keep the bulk of their savings, if any, in offline
accounts, and absolutely never write down (at least not on a
computer) the PINs and passwords to their online savings
accounts. Even so a determined adversary can grab those
passwords when they are used, indirectly, by
installing various types of bugs
in a house or a computer before it gets delivered, but
so-called targeted operations
have a
significant cost which is not worth spending on average-income people.
So my usual recommendation is to be suspicious of any electrical, not just electronic, device, and use them only for low value activities not involving money and not involving saleable assets like lists of credit card numbers.
I have been quite fond of using the
Btrfs
filesystem, but only for its simpler features and in a limited
way. But on its IRC channel and mailing list I often come across less restrained users who get into trouble of some sort or another by using arbitrary combinations of features,
expecting them to just work
and fast
too.
That trouble is often bugs, because there are many arbitrary combinations of many features, and designing so all of them behave correctly and sensibly, never mind testing them, is quite hard work.
But often that trouble relates to performance and speed, because the performance envelope of arbitrary combinations of features can be quite anisotropic indeed; for example in Btrfs creating snapshots is very quick, but deleting them can take a long time.
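A minimal sketch of that asymmetry with the btrfs tool (paths are examples): creating a read-only snapshot returns almost immediately, while deletion only queues background cleanup that can run for a long time:

btrfs subvolume snapshot -r /data /data/.snapshots/data-2017-11-01   # effectively instantaneous
btrfs subvolume delete /data/.snapshots/data-2017-11-01              # returns quickly...
btrfs subvolume sync /data    # ...but waiting for the actual cleanup to finish can take a long time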
This reminded me of the C++ programming language, which also has many features and makes possible many arbitrary combinations, many of them unwise, and in particular one issue it has compared to the C programming language: in C every language feature is designed to have a small, bounded, easily understood cost, but many features of C++ can involve very large space or time costs, often surprisingly, and even more surprising can be the cost of combinations of features. This property of C++ is common to several programming languages that try to provide really advanced features as if they were simple and cheap, which of course is something that naive programmers love.
Btrfs is very similar: many of its advanced features sound
like the filesystem code just does it
but
involve potentially enormous costs at non-trivial scale, or
are very risky in case of mishaps. Hiding great complexity
inside seemingly trivial features seems to lead astray a lot
of engineers who should know better, or who believe that they
know better.
I wish I could say something better than this: that it is
very difficult to convey appropriately an impression of the
real cost of features when they are presented elegantly. It
seems to be a human failing to assume that things that looks
elegant and simple are also cheap and reliable; just like in
movies the good people are also handsome and look nice, and
the bad people are also ugly and look nasty. Engineers should be more cynical, but many aren't, or sometimes don't want to be, as that may displease management, who usually love optimism.
One of the reasons why I am skeptical about virtualization is that it raises the complexity of a system: a systems engineer has then to manage not just systems running one or more applications, but also systems running those systems, creating a cascade of dependencies that are quite difficult to investigate in case of failure.
The supreme example of this design philosophy is OpenStack which I have worked on for a while (lots of investigating and fixing), and I have wanted to create a small OpenStack setup at home on my test system to examine it a bit more closely. The system at work was installed using MAAS and Juju, which turned out to be quite unreliable too, so I got the advice to try the now-official Ansible variant of Kolla (1, 2, 3, 4) setup method.
Kolla itself does not set anything up: it is a tool to build Docker deployables for every OpenStack service. The two variants are then for deployment: using Ansible or Kubernetes, which is another popular buzzword along with Docker and OpenStack. I chose the Ansible installer as I am familiar with Ansible, and also because I wanted to do a simple install with all relevant services on a single host system, without installing Kubernetes too.
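For the record, the sequence I ended up with was roughly the following sketch (package names, the inventory path and the build options depend on the release and on how kolla was installed):

pip install kolla kolla-ansible                  # the image builder and the Ansible deployer
kolla-build --base centos --type binary          # build the Docker images locally
kolla-ansible -i ./all-in-one prechecks          # sanity-check the single-host inventory
kolla-ansible -i ./all-in-one deploy             # deploy the containers
kolla-ansible post-deploy                        # generate the admin openrc file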
It turns out that the documentation, as usual, is pretty awful, very long on fantastic promises of magnificence, and it obfuscates the banal reality:
Note: one of the components and its Dockerfile is for kolla-toolbox, which is a fairly new, small component for internal use.
The main value of kolla and kolla-ansible is that someone has already written the 252 Dockerfiles for the 67 components, and Ansible roles for them, and keeps somewhat maintaining them, as OpenStack components change.
Eventually my minimal install took 2-3 half days, and resulted in building 100 Docker images taking up around 11GiB and running 29 Docker instances:
soft# docker ps --format 'table {{.Names}}\t{{.Image}}' | grep -v NAMES | sort
cron                        kolla/centos-binary-cron:5.0.0
fluentd                     kolla/centos-binary-fluentd:5.0.0
glance_api                  kolla/centos-binary-glance-api:5.0.0
glance_registry             kolla/centos-binary-glance-registry:5.0.0
heat_api_cfn                kolla/centos-binary-heat-api-cfn:5.0.0
heat_api                    kolla/centos-binary-heat-api:5.0.0
heat_engine                 kolla/centos-binary-heat-engine:5.0.0
horizon                     kolla/centos-binary-horizon:5.0.0
keystone                    kolla/centos-binary-keystone:5.0.0
kolla_toolbox               kolla/centos-binary-kolla-toolbox:5.0.0
mariadb                     kolla/centos-binary-mariadb:5.0.0
memcached                   kolla/centos-binary-memcached:5.0.0
neutron_dhcp_agent          kolla/centos-binary-neutron-dhcp-agent:5.0.0
neutron_l3_agent            kolla/centos-binary-neutron-l3-agent:5.0.0
neutron_metadata_agent      kolla/centos-binary-neutron-metadata-agent:5.0.0
neutron_openvswitch_agent   kolla/centos-binary-neutron-openvswitch-agent:5.0.0
neutron_server              kolla/centos-binary-neutron-server:5.0.0
nova_api                    kolla/centos-binary-nova-api:5.0.0
nova_compute                kolla/centos-binary-nova-compute:5.0.0
nova_conductor              kolla/centos-binary-nova-conductor:5.0.0
nova_consoleauth            kolla/centos-binary-nova-consoleauth:5.0.0
nova_libvirt                kolla/centos-binary-nova-libvirt:5.0.0
nova_novncproxy             kolla/centos-binary-nova-novncproxy:5.0.0
nova_scheduler              kolla/centos-binary-nova-scheduler:5.0.0
nova_ssh                    kolla/centos-binary-nova-ssh:5.0.0
openvswitch_db              kolla/centos-binary-openvswitch-db-server:5.0.0
openvswitch_vswitchd        kolla/centos-binary-openvswitch-vswitchd:5.0.0
placement_api               kolla/centos-binary-nova-placement-api:5.0.0
rabbitmq                    kolla/centos-binary-rabbitmq:5.0.0
Some notes on this:
containers, each image is set up to run (via the popular and useful dumb-init) a single daemon, usually a single process; that is why there are 8 Nova containers and 5 Neutron ones.
Most of the installation time was taken to figure out the rather banal nature of the two components, to de-obfuscate the documentation, to realize that the prebuilt images in the Kolla Docker hub repository were both rather incomplete and old and would not do, that I had to overbuild images, but mostly to work around the inevitable bugs in both the tools and OpenStack.
I found that the half a dozen blocker bugs that I investigated were mostly known years ago, as often happens, and that was lucky, as most of the relevant error messages were utterly opaque if not misleading, but would eventually lead to a post by someone who had investigated them.
Overall 1.5 days to setup a small OpenStack instance (without backing storage) is pretty good, considering how complicated it is, but the questions are whether it is necessary to have that level of complexity, and how fragile it is going to be.
On a boring evening I have run the STREAM and HPL benchmarks on some local systems, with interesting results, first for STREAM:
over$ ./stream.100M
-------------------------------------------------------------
STREAM version $Revision: 5.10 $
...
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:           5407.0      0.296645     0.295911     0.299323
Scale:          4751.1      0.338047     0.336766     0.344893
Add:            5457.5      0.439951     0.439759     0.440204
Triad:          5392.4      0.445349     0.445068     0.445903
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
-------------------------------------------------------------

tree$ ./stream.100M
-------------------------------------------------------------
STREAM version $Revision: 5.10 $
...
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:           5668.9      0.299895     0.282241     0.327760
Scale:          5856.8      0.305240     0.273185     0.363663
Add:            6270.9      0.404547     0.382720     0.446088
Triad:          6207.4      0.408687     0.386636     0.456758
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
-------------------------------------------------------------

soft$ ./stream.100M
-------------------------------------------------------------
STREAM version $Revision: 5.10 $
...
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:           5984.0      0.268327     0.267381     0.270564
Scale:          5989.1      0.269746     0.267154     0.279534
Add:            6581.8      0.366100     0.364640     0.371339
Triad:          6520.0      0.374828     0.368098     0.419086
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
-------------------------------------------------------------

base$ ./stream.100M
-------------------------------------------------------------
STREAM version $Revision: 5.10 $
...
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:           6649.8      0.242686     0.240608     0.244452
Scale:          6427.1      0.251241     0.248944     0.257000
Add:            7444.9      0.324618     0.322367     0.327456
Triad:          7522.1      0.322253     0.319058     0.324474
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
-------------------------------------------------------------

virt$ ./stream.100M
-------------------------------------------------------------
STREAM version $Revision: 5.10 $
...
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:           9961.3      0.160857     0.160621     0.160998
Scale:          9938.1      0.161200     0.160997     0.161329
Add:           11416.7      0.210518     0.210218     0.210647
Triad:         11311.7      0.212472     0.212170     0.214260
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
-------------------------------------------------------------
The X3 720 system has DDR2, the others DDR3; obviously it is not making a huge difference. Note that the X3 720 is from 2009, the i3-370M is in a laptop from 2010, and the FX-8370e is from 2014, so a span of around 5 years.
Bigger differences for HPL; run single-host with mpirun -np 4 xhpl; the last 8 tests are fairly representative, and the last column (I have modified the original printing format from %18.3e to %18.6f) is GFLOPS:
over$ mpirun -np 4 xhpl | grep '^W' | tail -8 | sed 's/  */ /g'
WR00C2R2 35 4 4 1 0.00 0.011727
WR00C2R4 35 4 4 1 0.00 0.012422
WR00R2L2 35 4 4 1 0.00 0.011343
WR00R2L4 35 4 4 1 0.00 0.012351
WR00R2C2 35 4 4 1 0.00 0.011562
WR00R2C4 35 4 4 1 0.00 0.012483
WR00R2R2 35 4 4 1 0.00 0.011615
WR00R2R4 35 4 4 1 0.00 0.012159

tree$ mpirun -np 4 xhpl | grep '^W' | tail -8 | sed 's/  */ /g'
WR00C2R2 35 4 4 1 0.00 0.084724
WR00C2R4 35 4 4 1 0.00 0.089982
WR00R2L2 35 4 4 1 0.00 0.085233
WR00R2L4 35 4 4 1 0.00 0.088178
WR00R2C2 35 4 4 1 0.00 0.085176
WR00R2C4 35 4 4 1 0.00 0.092192
WR00R2R2 35 4 4 1 0.00 0.086446
WR00R2R4 35 4 4 1 0.00 0.092728

soft$ mpirun -np 4 xhpl | grep '^W' | tail -8 | sed 's/  */ /g'
WR00C2R2 35 4 4 1 0.00 0.074744
WR00C2R4 35 4 4 1 0.00 0.074744
WR00R2L2 35 4 4 1 0.00 0.073127
WR00R2L4 35 4 4 1 0.00 0.075299
WR00R2C2 35 4 4 1 0.00 0.072952
WR00R2C4 35 4 4 1 0.00 0.076627
WR00R2R2 35 4 4 1 0.00 0.076052
WR00R2R4 35 4 4 1 0.00 0.073127

base$ mpirun -np 4 xhpl | grep '^W' | tail -8 | sed 's/  */ /g'
WR00C2R2 35 4 4 1 0.00 0.152807
WR00C2R4 35 4 4 1 0.00 0.149934
WR00R2L2 35 4 4 1 0.00 0.150643
WR00R2L4 35 4 4 1 0.00 0.152807
WR00R2C2 35 4 4 1 0.00 0.132772
WR00R2C4 35 4 4 1 0.00 0.160093
WR00R2R2 35 4 4 1 0.00 0.163582
WR00R2R4 35 4 4 1 0.00 0.158305

virt$ mpirun -np 4 xhpl | grep '^W' | tail -8 | sed 's/  */ /g'
WR00C2R2 35 4 4 1 0.00 0.3457
WR00C2R4 35 4 4 1 0.00 0.3343
WR00R2L2 35 4 4 1 0.00 0.3343
WR00R2L4 35 4 4 1 0.00 0.3104
WR00R2C2 35 4 4 1 0.00 0.3418
WR00R2C4 35 4 4 1 0.00 0.3622
WR00R2R2 35 4 4 1 0.00 0.3497
WR00R2R4 35 4 4 1 0.00 0.3756
Note: the X3 720 and the FX-6100 are running SL7 (64 bit), and the i3-370M and FX-8370e are running ULTS14 (64 bit). On the SL7 systems I got slightly better GFLOPS with OpenBLAS, but on the ULTS14 systems it was 100 times slower than ATLAS plus BLAS.
Obvious here is that the X3 720 was not very competitive in FLOPS even against an i3-370M from a laptop, and that at least on floating point Intel CPUs are quite competitive. The reason is that most compilers optimize and schedule floating point operations specifically for what works best on Intel's internal floating-point architectures.
Note: the site cpubenchmark.net rates, using PassMark®, the X3 720 at 2,692, the i3-370M at 2,022, the FX-6100 at 5,412 and the FX-8370e at 7,782, which also takes into account non-floating-point speed and the number of CPUs, and these ratings seem overall fair to me.
It is however fairly impressive that the FX-8370e is still twice as fast at the same GHz as the FX-6100, and it is pretty good on an absolute level. However I mostly use the much slower i3-370M in the laptop, and for interactive work it does not feel much slower.
In the blog of a software developer there is a report of the new laptop he has bought to do his work:
a new laptop, a Dell XPS 15 9560 with 4k display, 32 GiBs of RAM and 1 TiB M.2 SSD drive. Quite nice specs, aren't they :-)?
That is not surprising when a smart wristwatch has a dual-CPU 1GHz chip, 768MiB of RAM and 4GiB of flash SSD, but it has consequences for many other people: such a powerful development system means that improving the speed and memory use of the software written by that software developer will not be a very high priority. It is difficult for me to see practical solutions for this unwanted consequence of hardware abundance.
Some years ago I had reported that the standard sftp (and sometimes scp) implementation then available for GNU/Linux and MS-Windows was extremely slow because of a limitation in the design of its protocol and implementation, which meant that it effectively behaved as if in half-duplex mode.
At the current time the speed of commonly available sftp implementations is pretty good instead, with speeds of around 70-80MB/s on 1Gb/s links, because the limitation could be circumvented and the implementations have been (arguably) improved, both the standard one and an extended one.
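For reference, OpenSSH's sftp also allows raising the request size and the number of outstanding requests, which helps on higher-latency links; a minimal sketch (host, path and values are examples):

sftp -B 262144 -R 128 user@host:/path/to/bigfile /tmp/    # 256KiB requests, 128 in flight
scp user@host:/path/to/bigfile /tmp/                      # scp of the same file, for comparison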
Previously I mentioned the
large capacitors on flash SSDs
in the 2.5in format targeted at the enterprise
market
. In an article about recent flash SSDs in the M.2
format there are
photographs and descriptions
of some with mini capacitors with the same function, and a
note that:
The result is a M.2 22110 drive with capacities up to 2TB, write endurance rated at 1DWPD, ... A consumer-oriented version of this drive — shortened to the M.2 2280 form factor by the removal of power loss protection capacitors and equipped with client-oriented firmware — has not been announced
The absence of the capacitors is likely to save relatively little in cost, and having to manufacture two models is likely to add some of that back, but the lack of capacitors makes the write IOPS a lot lower (because it requires write-through rather than write-back caching) and for the same reason also increases write amplification, thus creating enough differentiation for a much higher price for the enterprise version.
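The effect of losing write-back caching can be approximated on an ordinary SATA flash SSD by turning its volatile write cache off; a minimal sketch with hdparm (the device name is an example, and this does not apply to NVMe devices):

sudo hdparm -W /dev/sda       # report the current write-caching setting
sudo hdparm -W0 /dev/sda      # disable write-back caching, forcing write-through behaviour
sudo hdparm -W1 /dev/sda      # re-enable it afterwards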
Interesting news of a line of flash SSD products in the 3.5in form factor (instead of the more conventional M.2 and 2.5in), the Viking UHC-Silo (where UHC presumably stands for Ultra High Capacity), in 25TB and 50TB capacities, with a dual-port 6Gb/s SAS interface.
Interesting aspects: fairly conventional 500MB/s sequential read and 350MB/s sequential write rates, and not unexpectedly 16W power consumption. But note that at 350MB/s filling (or duplicating) this unit takes around 40 hours (50,000,000MB ÷ 350MB/s ≈ 143,000s). That big flash capacity could probably sustain a much higher transfer rate, but that might involve a much higher power draw and thus heat dissipation, so I suspect that the limited transfer rate does not depend just on the 6Gb/s channel, which after all allows for the slightly higher 500MB/s read rate.
Initial price seems to be US$37,000 for the 50TB version, or around US$1,300 per TB which is more or less the price for a 1TB flash SSD enterprise SAS product.
The main economic reason to virtualize has allegedly been that in many cases the hardware servers that run the host systems of applications are mostly idle, as a typical data centre hosts many low-use applications that nevertheless are run on their own dedicated server (being effectively containerized on that hardware server), and therefore consolidating them onto a single hardware server can result in a huge saving; virtualization allows consolidating the hosting systems as they are, simply by copying them from real to virtual hardware, saving transition costs.
Note: that takes for granted that in many cases the applications are poorly written and cannot coexist in the same hosting system, which is indeed often the case. More disappointingly, it also takes for granted that increasing the impact of a hardware failure from one application to many is acceptable, and that replication of virtual systems to avoid that is cost-free, which it isn't.
But there is another rationale for consolidation, if not virtualization, that does not depend on there being a number of mostly-idle dedicated hardware servers: the notion that the cost per capacity unit of midrange servers is significantly lower than for small servers, and that the best price/capacity ratio is with servers like:
- 24 Cores (48 cores HT), 256GB RAM, 2* 10GBit and 12* 3TB HDD including a solid BBU. 10k EUR
- Applications that actually fill that box are rare. We we are cutting it up to sell off the part.
That by itself is only a motivation for consolidation, that is for servers that run multiple applications; in the original presentation that becomes a case for virtualization because of the goal of having a multi-tenant service with independent administration domains.
The problem with virtualization to take advantage of lower marginal cost of capacity in mid-size hardware is that it is not at all free, because of direct and indirect costs:
Having looked at the numbers involved, my guess is that there is no definite advantage in overall cost to consolidation of multiple applications, that it all depends on the specific case, and that usually, but not always, virtualization has costs that outweigh its advantages, especially for IO intensive workloads, and that this is reflected in the high costs of most virtualized hosting and storage services (1, 2).
Virtualization for consolidation has however an advantage as to the distribution of cost as it allows IT departments to redefine downwards their costs and accountabilities: from running application hosting systems to running virtual machine hosting systems, and if 10 applications in their virtual machines can be run on a single host system, that means that IT departments need to manage 10 times fewer systems, for example going from 500 small systems to 50.
But the number of actual systems has increased from 500 to 550, as the systems in the virtual machines have not disappeared, so the advantage for IT departments comes not so much from consolidation as from handing back the cost of managing the 500 virtual systems to the developers and users of the applications running within them, which is what DevOps usually means.
The further stage of IT department cost-shifting is to get rid entirely of the hosting systems, and to outsource the hosting of the virtual machines to a more expensive cloud provider, where the higher costs are then charged directly to the developers and users of the applications, eliminating those costs from the budget of the IT department, which is then left only with the cost of managing the contract with the cloud provider on behalf of the users.
Shifting hardware and systems costs out of the IT department budget into that of their users can have the advantage of boosting the career and bonuses of IT department executives by shrinking their apparent costs, even if it does not reduce overall business costs. But it can reduce aggregate organization costs when it discourages IT users from using IT resources unless there is a large return, by substantially raising the direct cost of IT spending to them, so even at the aggregate level it might be, for specific cases, ultimately a good move.
That is, a business that consolidates systems and switches IT provision from application hosting to system hosting, and then outsources system hosting, is in effect telling its component businesses that they are overusing IT and that they should scale it back, by effectively charging more for application hosting and supporting it less.
Today after upgrading (belatedly) the firmware of my BDR-2209 Pioneer drive to 1.33 I have seen for the first time a 50GB BD-R DL disc written at around 32MB/s average:
Current: BD-R sequential recording
Track 01: data  31707 MB
Total size:     31707 MB (3607:41.41) = 16234457 sectors
Lout start:     31707 MB (3607:43/41) = 16234607 sectors
Starting to write CD/DVD at speed MAX in real TAO mode for single session.
Last chance to quit, starting real write in 0 seconds. Operation starts.
Waiting for reader process to fill input buffer ... input buffer ready.
Starting new track at sector: 0
Track 01: 9579 of 31707 MB written (fifo 100%) [buf 100%] 8.0x.
Track 01: Total bytes read/written: 33248166496/33248182272 (16234464 sectors). Writing time: 1052.512s
This was on a fairly cheap no-name disc. I sometimes also try to write 50GB BD-RE DL discs, but that works only sometimes, and at best at 2x speed. I am tempted to try, just for the fun of it, a 100GB BD-RE XL disc (which have been theoretically available since 2011) but I suspect that would be wasted time.
As another example that there are no generic products, I was looking at a PC card to hold an M.2 SSD device and interface it to the host's PCIe bus, and the description carried a warning:
Note, this adapter is designed only for 'M' key M.2 PCIe x4 SSD's such as the Samsung XP941 or SM951. It will not work with a 'B' key M.2 PCIe x2 SSD or the 'B' key M.2 SATA SSD.
There are indeed several types of M.2 slots, with different widths and speeds, supporting different protocols, and this is one of the most recent and faster variants. Indeed among the user reviews there is also a comment as to the speed achievable by an M.2 flash SSD attached to it:
I purchased LyCOM DT-120 to overcome the limit of my motherboard's M.2. slot. Installation was breeze. The SSD is immediately visible to the system, no drivers required. Now I enjoy 2500 MB/s reading and 1500 MB/s writing sequential speed. Be careful to install the device on a PCI x4 slot at least, or you will still be hindered.
Those are pretty remarkable speeds, and much higher (in peak sequential transfer) than those for a memristor SSD.
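To check what PCIe link an M.2 card or SSD actually negotiated, a minimal sketch (the PCI address is an example; find the right one with plain lspci first):

lspci | grep -i 'Non-Volatile memory'                  # find the NVMe device's PCI address
sudo lspci -s 03:00.0 -vv | grep -E 'LnkCap|LnkSta'    # compare capable vs negotiated speed and width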
So last night I was discussing the O_PONIES controversy and was asked to summarise it, which I did as follows:
There is the additional problem that available memory has
grown at a much faster rate than IO speed, at least that of
hard disks, and this has meant that users and application
writers have been happy to
let very large amounts of unwritten data accumulate
in the Linux page cache, which then takes a long time to be written to persistent storage.
The comments I got on this story were entirely expected, and I was a bit disappointed, but in particular one line of comment offers me the opportunity to explain a particularly popular and delusional point of view:
the behavior I want should be indistinguishable from copying the filesystem into an infinitely large RAM, and atomically snapshotting that RAM copy back onto disk. Once that's done, we can talk about making fsync() do a subset of the snapshot faster or out of order.
right. I'm arguing that the cases where you really care are actually very rare, but there have been some "design choices" that have resulted in fsync() additions being argued for applications..and then things like dpkg get really f'n slow, which makes me want to kill somebody when all I'm doing is debootstrapping some test container
in the "echo hi > a.new; mv a.new a; echo bye > b.new; mv b.new a" case, writing a.new is only necessary if the mv b.new a doesn't complete. A filesystem could opportunistically start writing a.new if it's not already busy elsewhere. In no circumstances should the mv a.new operation block, unless the system is out of dirty buffers
you have way too much emphasis on durability
The latter point is the key and, as I understand them, the implicit arguments are:
durable, only the net outcome of the whole sequence needs to be durable.
As to that I pointed out that any decent implementation of O_PONIES is indeed based on the true and obvious point that it is pointless and wasteful to make a sequence of metadata and data updates persistent until just before a loss of non-persistent storage content, and that therefore all that is needed is two system-internal functions returning the time interval to the next loss of non-persistent storage content, and the time to the end of the current semantically-coherent sequence of operations.
Note: the same reasoning of course applies to backups of persistent storage content: it is pointless and wasteful to make them until just before a loss of the contents of that persistent storage.
Given those O_PONIES functions, it would never be necessary to explicitly fsync, or implicitly fsync metadata in almost all cases, in a sequence of operations like:
echo hi > a.new; mv a.new a; echo bye > b.new; mv b.new a
Because they would be implicitly made persistent only once it were known that a loss of non-persistent storage content would happen before that sequence would complete.
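In the absence of those functions, a careful application has to make durability explicit itself; a minimal sketch of the same sequence with explicit syncs, using the coreutils sync utility (which on reasonably recent versions accepts file arguments and a -f flag):

printf 'hi\n' > a.new
sync a.new            # push the new file's data to stable storage first
mv a.new a
sync -f .             # then sync the containing filesystem so the rename itself is durable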
Unfortunately until someone sends patches to Linus
Torvalds implementing those two simple functions there
should be way too much emphasis on durability
because:
In the same issue of Computer Shopper
as the
review of a memristor SSD
there is also a review of a mobile phone also from Samsung,
the model
Galaxy S8+
which has an 8 CPU 2.3GHz chip, 4GiB of memory, and 64GiB of
flash SSD builtin. That is the configuration of a pretty
powerful desktop or a small server.
Most notably for me it has a 6.2in 2960×1440 AMOLED display, which is both physically large and with a higher pixel size than most desktop or laptop displays. There have been mobile phones with 2560×1440 displays for a few years which is amazing in itself, but this is a further step. There are currently almost no desktop or laptop AMOLED displays, and very few laptops have pixel sizes larger than 1366×768, or even decent IPS displays. Some do have 1920×1080 IPS LCD displays, only a very few even have 3200×1800 pixel sizes.
The other notable characteristic of the S8+, given that its processing power is huge, is that it has an optional docking station that allows it to use an external monitor, keyboard and mouse (most likely the keyboard and mouse can be used anyhow, as long as they are Bluetooth ones).
This is particularly interesting as
desktop-mobile convergence
(1,
2,
3)
was the primary goal of the
Ubuntu strategy
of Canonical:
Our strategic priority for Ubuntu is making the best converged operating system for phones, tablets, desktops and more.
Earlier this year that strategy was abandoned, and I now suspect that a significant motivation for that was that Samsung was introducing convergence themselves with Dex, and for Android, on a scale and with a sophistication that Canonical could not match, not being a mobile phone manufacturer itself.
In the same Computer Shopper UK,
issue 255
with a
review of a memristor SSD
there is also a review of a series of fitness-oriented
smartwatches, and some of them, like the Samsung Gear S3, have 4GB of flash storage, 768MiB of RAM and a two-CPU 1GHz chip. That can run a server for a significant workload.
Apparently memristor storage has arrived, as Computer Shopper UK, issue 255, has a practical review of the Intel 32GB Optane Memory in an M.2 form factor. That is based on the mythical 3D XPoint memristor memory brand. The price and specifications have some interesting aspects:
Enhanced Power Loss Data Protection technology, but then, being persistent random-access low-latency memory, it should not have a DRAM cache itself. The specifications don't say it, but it is not a mass storage device with a SATA or SAS protocol; it is a sort of memory technology device as far as the Linux kernel is concerned, and for optimal access it requires special handling.
Overall this first memristor product is underwhelming: it is more expensive and slower than equivalent M.2 flash SSDs, even if the read and write access time are much better.
To edit HTML and XML files like this one I use EMACS with the PSGML library, as it is driven by the relevant DTD, which drives validation (which is fairly complete) and indentation. As to the latter, some HTML elements should not be indented further, because they indicate types of speech rather than parts of the document, and some do not need to be indented at all, as they indicate preformatted text.
Having looked at PSGML there is for the latter case a variable in psgml-html.el that seems relevant:
sgml-inhibit-indent-tags '("pre")
but it is otherwise unimplemented. So I have come up with a more complete scheme:
(defvar sgml-zero-indent-tags nil
  "*An alist of tags that should not be indented at all")
(defvar sgml-same-indent-tags nil
  "*An alist of tags that should not be indented further")

(eval-when-compile (require 'psgml-parse))
(eval-when-compile (require 'cl))

(defun sgml-indent-according-to-levels (element)
  (let ((name (symbol-name (sgml-element-name element))))
    (cond
      ((member name sgml-zero-indent-tags) 0)
      ((member name sgml-same-indent-tags)
        (* sgml-indent-step (- (sgml-element-level element) 1)))
      (t (* sgml-indent-step (sgml-element-level element))))))

(setq sgml-mode-customized nil)

(defun sgml-mode-customize ()
  (if sgml-mode-customized t
    (setq sgml-content-indent-function 'sgml-indent-according-to-levels)
    (fset 'sgml-indent-according-to-level 'sgml-indent-according-to-levels)
    (setq sgml-mode-customized t)))

(if-true (and (fboundp 'sgml-mode) (not noninteractive))
  (if (fboundp 'eval-after-load)
    (eval-after-load "psgml" '(sgml-mode-customize))
    (sgml-mode-customize)))

(defun html-sgml-mode ()
  (interactive)
  "Simplified activation of HTML as an application of SGML mode."
  (sgml-mode)
  (html-mode)
  (make-local-variable 'sgml-default-doctype-name)
  (make-local-variable 'sgml-zero-indent-tags)
  (make-local-variable 'sgml-same-indent-tags)
  (setq sgml-default-doctype-name "html"   ; tags must be listed in upper case
        sgml-zero-indent-tags '("PRE")
        sgml-same-indent-tags '("B" "I" "U" "S" "EM" "STRONG" "SUB" "SUP"
                                "BIG" "SMALL" "FONT" "TT" "KBD" "VAR" "CODE"
                                "MARK" "Q" "DFN" "DEL" "INS" "CITE" "OUTPUT"
                                "ADDRESS" "ABBR" "ACRONYM")))
It works well enough, except that I would prefer the elements with tags listed in sgml-zero-indent-tags to have their start and end tags also not indented, not just their content; but PSGML indents those as content of the enclosing element, so achieving that would require more invasive modifications of the indentation code.
Fascinating report, with graphs, on how route lookup has improved in the Linux kernel, and on the very low lookup times reached:
Two scenarios are tested:
- 500,000 routes extracted from an Internet router (half of them are /24), and
- 500,000 host routes (/32) tightly packed in 4 distinct subnets.
Since kernel version 3.6.11 the routing lookup cost for the two scenarios was 140ns and 40ns; since 4.1.42 it is 35ns and 25ns. Dedicated "enterprise" routers with hardware routing are probably equivalent. In a previous post the amount of memory used is given:
With only 256 MiB, about 2 million routes can be stored!.
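For anyone wanting to poke at this, a minimal sketch with iproute2 (addresses are documentation examples):

ip route get 203.0.113.7                            # show which route and nexthop a lookup resolves to
sudo ip route add 203.0.113.7/32 via 192.168.1.1    # an explicit host (/32) route
head -20 /proc/net/fib_trie                         # a peek at the LPC-trie the kernel actually searches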
As previously mentioned, once upon a time IP routing was much more expensive than Ethernet forwarding, and therefore there was a speed case both for Ethernet forwarding across multiple hubs or switches and for routing based on prefixes, despite the big problems that arise from Ethernet forwarding across multiple switches and the limitations that are consequent to subnet routing.
But it has been a long time since subnet routing has
been as cheap as Ethernet forwarding, and it is now
pretty obvious that even per-address host
routing
is cheap enough, at least at the datacentre level and very likely at the campus level (500,000 addresses is huge!).
Therefore so-called host routing can result in a considerable change in design decisions, but that depends on realizing that host routing is improper terminology, and on understanding the difference between IP addresses and Ethernet addresses:
host addresses don't exist in either IP or Ethernet: in IP, addresses name endpoints (which belong to interfaces), of which there can be many per interface and per host, while in Ethernet, addresses are per interface rather than per host.
neighbourhood determined by its prefix.
Note: the XNS internetworking protocols used addresses formed of a 32-bit prefix as network identifier (possibly subnetted) and a 48-bit Ethernet identifier, which to me seems a very good combination.
The popularity of Ethernet and in particular of VLAN broadcast domains spanning multiple switches depends critically on Ethernet addresses being actually identifiers: an interface can be attached to any network access point on any switch at any time and be reachable without further formality.
Now so-called host routes turn IP addresses (within The Internet) into endpoint identifiers, because the endpoint can change location, from one interface to another, or one host to another, and still identify the service it represents.
So I have this Aquaris E4.5 with Ubuntu Touch, whose battery is a bit fatigued, so it sometimes runs out before I can recharge it. Recently, when it restarted, the installed Ubuntu system seemed damaged: various things had bizarre behaviours or none at all.
For example when I clicked on System Settings>Wi-Fi I see nothing listed except that Previous networks lists a number of them, some 2-3 times, but when I click on any of them, System Settings ends; no SIM card is recognized (they work in my spare identical E4.5), and no non-local contacts appear.
Also in System Settings when clicking About, Phone, Security & Privacy, Reset it just crashes.
Previous "battery exhausted" crashes had no bad outcomes, as expected, as battery power usually runs out when the system is idle.
After some time looking into it I figured out that part of the issue was that:
$HOME/.config/com.ubuntu.address-book/AddressBookApp.conf.lock
$HOME/.config/connectivity-service/config.ini.lock
were blocking the startups of the relevant services, so removing them allowed them to proceed.
As to System Settings crashing, it was getting a SEGV, so strace let me figure out that it was running out of memory just after accessing something under $HOME/.cache/, so I just emptied that directory, and then everything worked. Some cached setting had perhaps been corrupted. I suspect that the cache needs occasional cleaning out.
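For the record, the cleanup that got things going again, as a minimal sketch (the paths are the ones found on my E4.5; emptying the whole cache is a blunt instrument, but it gets rebuilt):

rm -f "$HOME/.config/com.ubuntu.address-book/AddressBookApp.conf.lock"
rm -f "$HOME/.config/connectivity-service/config.ini.lock"
rm -rf "$HOME/.cache/"*      # stale cached state; it is rebuilt on the next start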
Note: another small detail: /usr/bin/qdbus invokes /usr/lib/arm-linux-gnueabihf/qt5/bin/qdbus, which is missing, but /usr/lib/arm-linux-gnueabihf/qt4/bin/qdbus works.