Software and hardware annotations 2007 December
This document contains only my personal opinions and calls of
judgement, and where any comment is made as to the quality of
anybody's work, the comment is an opinion, in my judgement.
I have a backlog of draft blog entries, and the following
are just some random quick notes.
- 071223 Sun
BCM5752 jumbo frames work only in output
- I have been doing some transfer rate tests among hosts
with Broadcom BCM5752 or BCM5708 (tg3 Linux driver), Intel
82541PI (e1000 Linux driver), and Realtek RTL-8169 (r8169 Linux
driver) chipsets (more details on these tests sometime later).
During these I have been quite confused by the behaviour of
the Broadcom BCM5752 with jumbo frames, until I discovered that
officially it does not support them, but in practice it handles
them in transmission only. In addition the RTL-8169 supports
jumbo frames, but only up to 7000B instead of the more common
9000B.
Anyhow the BCM5752 has various forms of network processing
offloading, including TCP segmentation offloading, which means
that it can handle small frames on the wire fairly efficiently,
while the RTL-8169 does not, so jumbo frame support is rather
more useful for the latter.
Also, it is easy to take advantage of the ability of the BCM5752
to transmit but not receive jumbo frames: change the relevant
routes to have an MTU value higher than 1500, but set or leave
the advertised MSS value at or under 1460 (that is, 1500 minus
40 for the headers). For example (the advmss 1460 part is
redundant), instead of issuing
ifconfig eth0 mtu 9000
which is not valid, this would work:
ip route change 192.168.1.0/24 dev eth0 mtu 9000 advmss 1460
ip route change default via 192.168.1.1 dev eth0 mtu 9000 advmss 1460
As in the example above, each route via the interface associated
with the BCM5752 chip must be changed, where appropriate.
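The arithmetic behind the advmss value can be sketched in shell,
assuming IPv4 and TCP headers of 20 bytes each with no options.
The function names mss_for_mtu and route_change_cmd are made up
for illustration, and the sketch only prints the command instead
of running it:

```shell
#!/bin/sh
# Largest TCP payload per segment for a given MTU, assuming IPv4 and
# TCP headers of 20 bytes each with no options ("minus 40" above).
mss_for_mtu() {
    echo $(( $1 - 20 - 20 ))
}

# Print (not run) the route change for one destination.
# $1 = destination, $2 = device, $3 = transmit MTU, $4 = receive MTU
# The advertised MSS is derived from the receive MTU, since that is
# the largest frame the chip is assumed to accept.
route_change_cmd() {
    echo "ip route change $1 dev $2 mtu $3 advmss $(mss_for_mtu "$4")"
}

# Placeholder destination and device, matching the example above.
route_change_cmd 192.168.1.0/24 eth0 9000 1500
```

The same helper could be called once per route that goes via the
interface, with the receive MTU left at 1500.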
- 071215 Sat
Network accelerators run Linux
- I have read some reviews about the Killer NIC game network
accelerator, which is quite a remarkable network card for a
number of reasons. One is that it is quite expensive, and the
other is that it reduces the network latency of MS-Windows games
by offloading network processing to a card with a Linux kernel
on it (and some sort of Freescale CPU and 64MB of RAM). In some
reviews it is reported as actually lowering latency, but the
effect is neither large nor uniform. Anyhow Cavium have released
some general purpose network accelerator product based on their
previously mentioned Octeon multi-CPU chip. A bit pricey, as
they seem to cost about the same as or more than low latency
10Gb/s cards.
- 071215 Sat
2.5" hard disks differences
- My impression is that there are very few commodities,
as often even very similar products have important
differences, which may or may not matter to everybody, but
matter to someone. One of the latest illustrations is the
large differences among 2.5" hard disc drives as to sequential
small block performance, which is quite useful to have. The
other tests also show remarkable differences (around 25%), and
in different ways, as different onboard firmware reacts
differently to different usage patterns. For example
multithreaded reading and writing shows a factor of four
difference in performance between fastest and slowest.
- 071210 Mon
Much better read latency with AMD's HyperTransport
- An interesting review of the new AMD X2 5500 has some
impressive numbers about memory latency differences between
current AMD and Intel CPUs.
Latency in main RAM still matters, especially for systems with
smaller caches, and less so the larger the cache. As to cache
size Intel has the lead, thanks to their superior capital base
that enables investing in better process technology, which
results in more onchip cache. Intel are clearly driving the
memory market towards memory with high latencies and high
bandwidths, which is the combination that best feeds the large
caches of their CPU chips via their high latency memory buses.
SDR memory had latencies of a few cycles, DDR of half a dozen
cycles, DDR2 of a dozen cycles, and now DDR3 of a couple dozen
cycles. A little known detail is that the intrinsic speed of a
memory cell has improved very little over the past decade or two
(a mere doubling, from around 100MHz to 200MHz), and all
transfer rate increases have come from higher degrees of
pipelining and parallelism at the integrated circuit level.
Unfortunately pipelining and parallelism only work well in the
aggregate, as they involve those ever higher latencies already
mentioned, especially noticeable for random accesses to memory.
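As a rough illustration of how parallelism and not faster cells
accounts for the transfer rates, the per-pin rate can be
estimated as the cell array clock times the prefetch depth. The
prefetch depths (1, 2, 4, 8) and the specific clocks below are
commonly quoted figures assumed here, not numbers from the
review, and data_rate_mts and print_rate are made-up helper
names:

```shell
#!/bin/sh
# Transfers per second per pin, in MT/s, estimated as the cell
# array clock (MHz) times the prefetch depth.
# $1 = cell array clock in MHz, $2 = prefetch depth
data_rate_mts() {
    echo $(( $1 * $2 ))
}

# $1 = generation name, $2 = cell array clock in MHz, $3 = prefetch
print_rate() {
    echo "$1: $(data_rate_mts "$2" "$3") MT/s"
}

# Illustrative values only (assumptions, not taken from the text).
print_rate SDR  100 1
print_rate DDR  200 2
print_rate DDR2 200 4
print_rate DDR3 200 8
```

Under these assumptions the cell clock merely doubles across the
four generations while the estimated transfer rate goes up by a
factor of sixteen.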
Intel seem to be driving RAM to become what used to be
called bulk store in the mainframe era, a cheap vast
repository of seldom used data that can be recalled to main
memory faster than from disk. The level 2 cache is the new main
memory. In other words, RAM is no longer meant to be that random
access, and is meant to be a commodity, while the performance
and value reside in the onchip memory. It is no coincidence that
Intel exited the RAM market long ago, and have invested
massive capital in processes that allow ever greater amounts of
onchip memory.