This document contains only my personal opinions and judgement calls, and where any comment is made as to the quality of anybody's work, the comment is an opinion, in my judgement.
I have been playtesting, for the sake of technology analysis ;-), a few popular games.
bout, which is more or less ideal for jumping in, having a break, and then continuing what one was doing; strategy games instead tend to require much longer periods of time, for example.
frames per second. My system is an average one for 2005 (Athlon XP 2000+, 512MB, NVIDIA 6800LE/Radeon X800 SE) and I tend to get between 20FPS and 30FPS (like this reviewer), with average visual quality settings, in games such as PC Halo, Doom 3, Quake 4, F.E.A.R., UT2004. What are the designers of these games thinking? Well, that they run just fine on their own top of the line PCs. In other words, they are designing games for other game designers, who tend to have top of the line systems too, in a social dynamic not dissimilar to that in the recent evolution of the Linux culture.
Firm | Model | Capacity | Seek min/avg/max | Spin up current | Noise | Misc
---|---|---|---|---|---|---
Maxtor | 6L250R0 | 250GiB | 0.8/9.3/17ms | 1.7A@12V | 25dB | NCQ
HGST | 7K250 | 250GiB | 1.1/8.5/15.1ms | 1.8A@12V | 28dB | NCQ
Seagate | ST3250823A | 250GiB | 0.8/9/??ms | 2.8A@12V | 28dB |
WD | WD2500JB | 250GiB | 2/8.9/21ms | 2.4A@12V | 34dB |
File system | Repack (elapsed, system) | Avg. transfer rate
---|---|---
used JFS | 17m31s 52s | 6.3MiB/s
new JFS | 09m46s 56s | 11.4MiB/s
some number of things that need processing, like say AI for NPCs, or dynamic texture effects per, say, vehicle, or whatever.
I have been using ext2 for the filesystems that I need to share between MS Windows 2000 and GNU/Linux Fedora on my system (almost only games), using Ext2IFS, and it has just been working reliably and speedily; with the removal of those limitations it is a large improvement over FAT32.

It is preferable to define operations of the "and becomes" sort, that is assignment combined with the operation on two operands, as in

```
x = y; x += z;
```

rather than

```
x = y + z;
```

So for example by defining in C:
```
void Matrix33PlusAB(Matrix33 *const x, const Matrix33 *const y);
```

or in C++:

```
Matrix33 &operator +=(Matrix33 &x, const Matrix33 &y);
```

Less preferably one might accept three operand functions as in

```
void Matrix33SetPlus(Matrix33 *const x,
                     const Matrix33 *const y, const Matrix33 *const z);
```

where unfortunately one has to handle the case where the first operand is the same as one of the last two.
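To make the distinction concrete, here is a minimal sketch of how the two forms might be implemented, assuming a hypothetical Matrix33 that just wraps a 3×3 array of floats (the struct layout is my assumption, for illustration only):

```c
#include <stddef.h>

/* Hypothetical layout, for illustration only. */
typedef struct { float m[3][3]; } Matrix33;

/* Two-operand "and becomes" addition: *x += *y.
   Well defined even if x == y, since each element is read
   once and updated in place. */
void Matrix33PlusAB(Matrix33 *const x, const Matrix33 *const y)
{
    for (size_t i = 0; i < 3; i++)
        for (size_t j = 0; j < 3; j++)
            x->m[i][j] += y->m[i][j];
}

/* Three-operand form: *x = *y + *z. For addition an elementwise
   loop happens to stay safe even if x aliases y or z; for an
   operation like multiplication a temporary would be needed,
   which is the aliasing case mentioned in the text. */
void Matrix33SetPlus(Matrix33 *const x,
                     const Matrix33 *const y, const Matrix33 *const z)
{
    for (size_t i = 0; i < 3; i++)
        for (size_t j = 0; j < 3; j++)
            x->m[i][j] = y->m[i][j] + z->m[i][j];
}
```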
The "and becomes" style has three considerable benefits:

(*spatial locality* for example, as if it meant something useful).
ext3
filesystem, and
then applied many updates with the latest and greatest.
Then I have decided to switch the root filesystem to
JFS like most of the others, and to apply many more
upgrades. So after a few days and many updates on ext3, I copied the result to a freshly made JFS filesystem, applied many more updates there, and kept the ext3 one around.
Using the cfq scheduler, with some surprising results:
File system | Elapsed (system) | Bandwidth | Notes
---|---|---|---
used ext3 1KiB | 25m37s (41s) | 3.9MiB/s | dir_index, ACLs
new ext3 1KiB | 17m45s (44s) | 5.6MiB/s | dir_index, ACLs
used JFS 4KiB | 23m33s (34s) | 4.2MiB/s | ACLs
new JFS 4KiB | 09m57s (37s) | 10.9MiB/s | ACLs
On the used ext3 filesystem, as previously on a rather similar filesystem on the same disk, the speed was much lower. The used filesystem had directory indexes (dir_index) and SELinux file attributes, and either may be responsible, and indeed without either the speed for ext3 goes up a lot:
File system | Elapsed (system) | Bandwidth | Notes
---|---|---|---
new ext3 4KiB | 04m55s (52s) | 21.8MiB/s | no dir_index, no ACLs
new ext3 4KiB | 04m57s (52s) | 21.7MiB/s | no dir_index, ACLs
new ext3 4KiB | 18m38s (56s) | 5.7MiB/s | dir_index, no ACLs
new JFS 4KiB | 09m21s (53s) | 11.5MiB/s | no ACLs
I also tried fsck.ext3 -D, but that did not change the result, probably because it only restructures the hash table but does not improve its locality.
A plain find however is quick, at around 3 minutes, which increases the mystery. Perhaps the hash trees are fairly well clustered together, but far away from the directory or its inodes.
The conclusions are to avoid the dir_index option for ext3 unless really necessary, and the usual: that ext3 filesystems are amazingly fast when freshly loaded and degrade rather quickly, while JFS filesystems are less fast to start with but degrade less quickly; they also support directory indexes quite efficiently.
:-)
.tar
archive of it. To compare I decided to do it with
three different compressors, and to check the time
needed to decompress it, as I have already had a look
at
compression time and ratio.
Decompressing the same data on an Athlon XP 2000+ with
512MB of SDRAM gives:
 | lzop -d | gunzip | bunzip2
---|---|---|---
size of compressed data | 3.36GiB | 2.78GiB | 2.50GiB |
CPU user | 45s | 117s | 327s |
CPU system | 18s | 12s | 6s |
bunzip2
is
really rather slow at decompressing too, and while
gzip
is over three times slower at
compression than lzop
, and almost three
times slower at decompression, the absolute amount of
time is not that bad.

RedHat (RedHat's distributions are called RedHat Enterprise), so I am quite familiar with RPM (a major reason for the change).
-t
options).
Unfortunately this means that KYum and YumEx also
terminate it, which means that if one selects
several packages on which to operate with them,
any operation will succeed only if it can on all
of them. In other words, one is driven to select
only a few at a time.

I hardlink my configuration files under /root/cfg/
, so that among others
/etc/passwd
and
/root/cfg/etc/passwd
are linked (the
actual scheme I use is a little more involved than
that, to account for multiple hosts).
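A minimal sketch of the core of such a scheme as a single C helper (hypothetical, not the actual scripts used; it assumes the shadow tree under /root/cfg/ already exists and is on the same volume, as hard links require):

```c
#include <errno.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: make sure `shadow` is a hard link to `orig`,
   creating it if absent. */
static int ensure_link(const char *orig, const char *shadow)
{
    struct stat so, ss;

    if (stat(orig, &so) != 0)
        return -1;
    if (stat(shadow, &ss) == 0)
        /* Already linked if device and inode numbers agree. */
        return (so.st_dev == ss.st_dev && so.st_ino == ss.st_ino) ? 0 : -1;
    if (errno != ENOENT)
        return -1;
    return link(orig, shadow);
}

int main(void)
{
    if (ensure_link("/etc/passwd", "/root/cfg/etc/passwd") != 0)
        perror("/root/cfg/etc/passwd");
    return 0;
}
```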
Thanks to /root/cfg/ I have a complete and current list of all the configuration files I need to carry over if I do a distribution reinstallation or a distribution change. Then
there will be some incompatibilities, as the locations
of the same files will be different, or their contents
need to be adjusted, or some distributions use
different base tools. For example I am now using
CUPS
as the print spooling system instead of
LPRng
(a move about which I am not that happy).
asound.conf syntax is vaguely Lisp-like itself).

MS Windows has drive letters, which tie file name spaces to volumes, while UNIX-like systems handle things by subtree, not by volume.
From some quick tests I just ran, for 32bit binaries xfs_check needs around 1GiB RAM per TiB of filesystem plus about 100MiB RAM per 1 million inodes in the filesystem (more if you have lots of fragmented files). Double this for 64bit binaries. e.g. it took 1.5GiB RAM for 32bit xfs_check and 2.7GiB RAM for a 64bit xfs_check on a 1.1TiB filesystem with 3 million inodes in it.

For xfs_repair, there is no one-size-fits-all formula as memory consumption depends not only on the size of the filesystem but what is in the filesystem, how it is laid out, what is corrupted in the filesystem, etc. For example, the filesystem I checked above only required ~150MiB for repair to run but that is a consistent filesystem. I've seen equivalently sized filesystems (~1TiB) take close to 1GiB of RAM to repair when they've been significantly corrupted.
/proc
filesystem).
```
CODE_COVERAGE=
$(RELEASE_BUILD)CODE_COVERAGE:sh= echo \\043
$(CODE_COVERAGE)$(OBJS_DIR)/unix_bb.o := CPPFLAGS += -DKCOV
$(CODE_COVERAGE)$(OBJS_DIR)/unix_bb.ln := CPPFLAGS += -DKCOV
```

It is specially horrid because the intent seems to be to rely on comments being parsed after macro expansion, as code 043 is the octal code for the # comment character, the intended effect apparently being that if $(RELEASE_BUILD) is empty, then CODE_COVERAGE is set to the comment character, and the next two lines are commented out.
But comments are actually stripped before macro expansion, so those lines instead define target-specific macros for targets beginning with #$(OBJS_DIR) (for example #build), instead of $(OBJS_DIR) (for example build), and since presumably there is no rule for targets beginning with #$(OBJS_DIR), but only rules for targets beginning with $(OBJS_DIR), those macro definitions have no effect on the build.
```
# 'CODE_COVERAGE' and 'RELEASE_BUILD' can be either "yes" or
# empty, and we want code coverage only if the former is "yes"
# and the latter is empty.

KCOV+yes+               =KCOV
KCOV                    =${KCOV+${CODE_COVERAGE}+${RELEASE_BUILD}}

CPPFLAGS+unix_bb+KCOV   =-DKCOV

${OBJ_DIR}/unix_bb.o:   CPPFLAGS += ${CPPFLAGS+unix_bb+${KCOV}}
${OBJ_DIR}/unix_bb.ln:  CPPFLAGS += ${CPPFLAGS+unix_bb+${KCOV}}
```

There is an equivalent technique to do conditional execution of rules, using phony targets, for example:
```
# Conditional assignment to CFLAGS depending on ${ARCH}

CFLAGS-x86      =-O3
CFLAGS-PPC      =-O2 -funroll-loops
CFLAGS          =${CFLAGS-${ARCH}}

# Conditional build depending on ${ARCH}

build:          build-${ARCH}

build-x86:      ....
        ....

build-PPC:      ....
        ....
```

which cleanly avoids a lot of inappropriate (because un-Make-like) conditionals even in Make variants that have them.
> I don't think 2.2 and 2.4 models are applicable any more. There are more of us, we're better (and older) than we used to be, we're better paid (and hence able to work more), our human processes are better and the tools are better. This all adds up to a qualitative shift in the rate and accuracy of development. We need to take this into account when thinking about processes.

As previously remarked, the new and improved status of the developers, while lucky for them, also has significant downsides.
> It's important to remember that all those kernel developers out there *aren't going to stop typing*. They're just going to keep on spewing out near-production-quality code with the very reasonable expectation that it'll become publically available in less than three years. We need processes which will allow that.

where I suspect that *near-production-quality* is often rather optimistic :-).

The IT8212F, unlike the *fake RAID* cards that have been out so far, is a real hardware RAID controller:
```
 * The ITE8212 isn't exactly a standard IDE controller. It has two
 * modes. In pass through mode then it is an IDE controller. In its smart
 * mode its actually quite a capable hardware raid controller disguised
 * as an IDE controller. Smart mode only understands DMA read/write and
 * identify, none of the fancier commands apply. The IT8211 is identical
 * in other respects but lacks the raid mode.
```

The only issue I can see with the IT8212F wrt Linux is that neither of the two drivers has made it yet into the mainline Linux kernel, which has caused some slight maintenance complications in the past, as the internal kernel driver API churn has occurred.
The Alan Cox driver, drivers/ide/it821x.h, is available as a patch for Linux kernel 2.6.12 from a message to the LKML, and has become a standard part of the Linux kernel with the name drivers/ide/it821x.c.

The ITE driver, drivers/scsi/iteraid.[hc], by Mark Lu and colleagues, is available as a ZIP file with sources and binaries from the ITE web site, but only for Linux kernels up to 2.6.10, as an external module.
The ITE driver was for a while carried in the -mm series of experimental Linux patches, the latest patch being
for Linux kernel 2.6.13-rc3-mm3,
after which the Alan Cox driver has become part of
the mainline kernel instead. But it can still be
used with this
patch for Linux kernel 2.6.14
from the
enhanced kernel
of the
GRML live/rescue GNU/Linux CD
(which contains much other goodness).

I tried the iteraid driver as updated for 2.6.14 and it just did not work for me. I suspect this is due to it being oriented to RAID operation, but it should still work in plain IDE mode. The problem was that the two drives attached to it were detected, but with size 0, and without the right names.

I then tried the it821x driver and it seemed to work well and fast in IDE mode; but only
for reading. When trying to write at first I got
lots of IO errors due to CRC issues. I imagined
that since the drives were actually in a dodgy
enclosure this could be a problem, so I attached
them direct to the card. This seemed to avoid the
issue, but then it reappeared even if far less
frequently. Usually CRC errors are due to bad
cables creating electrical problems, but these are
cables that give no trouble on a similar IDE HBA.
What I suspect is poor build quality of the Q-TEC
card more than problems with the IT8212F chipset
or the drivers.

> There is already a minimal set of IA32 libraries packaged for use in a 64bit Debian system. Simply do an 'apt-get install ia32-libs' and you will be able to run most 32bit binaries within your system.

The dumb practice of putting version numbers into package base names has been going on for a while:
```
i   356kB  807kB 4.1.25-18 4.1.25-18 libdb4.1
i   368kB  868kB 4.2.52-20 4.2.52-20 libdb4.2
i   <N/A>  979kB 4.2.52-19 4.2.52-19 libdb4.2++
i   399kB  926kB 4.3.28-3  4.3.28-3  libdb4.3
i   427kB 1020kB 4.3.28-3  4.3.28-3  libdb4.3++c2
```

but since the AMD64 architecture has become popular the Debian packagers have indulged in the even more nefarious one of putting the architecture name into the base name, and inconsistently, as this list of i386 architecture package names I can select by searching for those which contain the strings amd64 or lib64
shows (forgive the messiness due to my rather daring
/etc/apt/sources.list
file):
```
p   4492kB 11.0MB <none> 1.1        stable           amd64-libs
p   18.6MB 76.2MB <none> 1.1        stable           amd64-libs-dev
p   2020B  8192B  <none> 103        stable           kernel-headers-2.6-amd6
p   2038B  8192B  <none> 103        stable           kernel-headers-2.6-amd6
p   2026B  8192B  <none> 103        stable           kernel-headers-2.6-amd6
p   224kB  14.6MB <none> 2.6.8-14   stable           kernel-headers-2.6.8-11
p   223kB  14.5MB <none> 2.6.8-14   stable           kernel-headers-2.6.8-11
p   219kB  14.2MB <none> 2.6.8-14   stable           kernel-headers-2.6.8-11
p   2072B  8192B  <none> 103        stable           kernel-image-2.6-amd64-
p   2082B  8192B  <none> 103        stable           kernel-image-2.6-amd64-
p   2092B  8192B  <none> 103        stable           kernel-image-2.6-amd64-
p   12.6MB 44.6MB <none> 2.6.8-14   stable           kernel-image-2.6.8-11-a
p   13.2MB 46.6MB <none> 2.6.8-14   stable           kernel-image-2.6.8-11-a
p   13.2MB 46.7MB <none> 2.6.8-14   stable           kernel-image-2.6.8-11-a
p   34.8kB 115kB  <none> 1.0.2-7ubu breezy           lib64bz2-1.0
p   29.3kB 106kB  <none> 1.0.2-7ubu breezy           lib64bz2-dev
p   7460B  20.5kB <none> 4.1-0exp0                   lib64ffi4
p   54.6kB 127kB  <none> 1:3.4.4-9  testing,unstable lib64g2c0
p   84.6kB 135kB  <none> 1:3.4.3-13 stable           lib64gcc1
p   107kB  352kB  <none> 4.0.2-3    unstable,unstabl lib64gfortran0
p   91.0kB 246kB  <none> 4.1-0exp0                   lib64mudflap0
p   319kB  676kB  <none> 5.5-1      unstable,unstabl lib64ncurses5
p   382kB  1282kB <none> 5.5-1      unstable,unstabl lib64ncurses5-dev
p   43.9kB 115kB  <none> 4.0.2-3    unstable,unstabl lib64objc1
p   4598B  16.4kB <none> 4.1-0exp0                   lib64ssp0
p   326kB  1004kB <none> 3.4.3-13   stable           lib64stdc++6
p   8705kB 34.8MB <none> 4.0.2-3    unstable,unstabl lib64stdc++6-4.0-dbg
p   8661kB 35.3MB <none> 4.1-0exp0                   lib64stdc++6-4.1-dbg
p   53.2kB 135kB  <none> 1:1.2.3-6  unstable,unstabl lib64z1
p   56.3kB 168kB  <none> 1:1.2.3-6  unstable,unstabl lib64z1-dev
p   3253kB 8098kB <none> 2.3.5-8    unstable,unstabl libc6-amd64
p   2000kB 9462kB <none> 2.3.5-8    unstable,unstabl libc6-dev-amd64
```

The above are
i386.deb
packages that
allow executing some AMD64 stuff on a base 32 bit
system; there are of course amd64.deb
packages that have their name warped to indicate they
contain 32 bit libraries and executables.

I have converted to ext2 my bulk archive FAT32 partitions, as one can find pretty good ext2 drivers for MS Windows 2000 (e.g. 1, 2).
In theory the system drive is identified by the SystemDrive environment variable, but in practice it is hard-coded in very many paths.
In general it gets hard-coded in the paths of any
installed software packages, but unconscionably it
also gets hard-coded into the path to a particular
system command, Userinit.exe
, which is an
essential component of the login process.
It would help to give the system volume a high drive letter like W:, so that it does not take one of the lower drive letters, leaving them to be assigned to non-system partitions in the ordinary way, but I am not aware of any easy way to achieve this.
Also, it might help to make all installations to a
virtual drive letter, one assigned either with
subst
or with the ability to
assign multiple letters to the same volume, which is
possible using Disk Management
. Or
perhaps one should just use the mountvol command and mostly do away with drive letters. The
downside of this is that then all paths will be wrong
until the one drive letter that one cannot get rid of,
the system drive one, is restored.
The hard-coded path prefix to Userinit.exe is pointless and can be removed, because Userinit.exe is in the system directory anyhow. Indeed it would be advisable to just remove that prefix at any time after installation, as it is not necessary and it prevents logins if the system drive letter changes.
One tool that helps is the ntpasswd GNU/Linux utility and bootdisk, which also supports offline editing of a MS Windows 2000/XP registry, even if it is on a NTFS volume.
Drive letter assignments (in the HKEY_LOCAL_MACHINE hive under SYSTEM\MountedDevices, more or less the equivalent of /etc/fstab under GNU/Linux) are more or less by volume ID, and there is no easy
way to check which volume, including the one with the
system, has which ID. By the way, since drive letters
are mostly by volume ID, cloning a MS Windows volume
creates two partitions with the same ID, which can be
one of the reasons why the system drive letter has
changed.
I first tried Disk Management to change the drive letter, and that works well for all volumes except the system volume.
Fortunately I also tried
Partition Magic 8.0
which does allow changing the drive letter of the
system partition, and that was easy.
```
# vmstat 10
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
...
 1  1    232   5948   1140 350224    0    0 19279 19489 1051  2749  1 48  0 50
 1  1    232   5784   1076 350212    0    0 18374 18668 1057  2772  2 46  0 51
 1  1    232   6428    952 349612    0    0 21343 21600 1130  2958  2 55  0 43
 1  1    232   5824   1140 350228    0    0 20692 20371 1200  3228  1 55  0 44
 1  1    232   5700   1208 350108    0    0 20916 21106 1089  2900  1 55  0 44
 1  1    232   5824   1268 349948    0    0 19301 19504 1044  2752  1 50  0 50
```

This was a particularly bad case; in particularly good cases I have seen the cost as low as half of that, 100MHz per 10MiB/s. This is quite awful in absolute terms, because it means that on modern disks, that can easily do 40-50MiB/s, IO becomes CPU bound on a 0.5-1GHz CPU, which is a joke, considering that DMA means there is almost no CPU involvement in the IO itself; it is all about buffer cache management overhead.
cfq
elevator, but
with the anticipatory
and
noop
ones the cost is roughly the same.
Interestingly though, while the average read or write rate with cfq is around 20MiB/s for this double-tar copy, with noop it is just around 16MiB/s, and with anticipatory it is around 24MiB/s. Probably this is because of seeking to update metadata while a file body is written sequentially.
*respinning* updated install ISO images, incorporating into them the Fedora updates, which include a newer kernel.
> I'm very confident the Novell management will find a competent successor very quickly. After all, there are lots of extremely skilled people over there in the Ximian division.

where it is significant that he says Ximian, the GNOME and Mono developers, not SUSE. He seems to be hinting that Ximian have won some internal battle with SUSE, that Novell is now committed to Mono and GNOME, and bye-bye SUSE and Linux.
*reorganized* too.
I enabled /proc/sys/vm/block-dump because of a suspicion, and it turned out that both kswapd and pdflush were writing to the destination disk, and in a non-fully sequential order.
Evidently pdflush was being too lazy, so I looked at its various parameters in section 2.4 of Documentation/filesystems/proc.txt and I realized that, just like other parameters such as swappiness, it is set to favour too much caching of disc blocks, which is largely pointless and/or damaging because of the lack of locality beyond a certain threshold. So I have set dirty_background_ratio to 3 (percent) and dirty_ratio to 6 (percent), as (except probably on laptops) it is fairly obscene to leave more than a few percent of system memory dirty; indeed the parameters should be not a percentage but a quantity, as the amount of unwritten blocks should depend not on how much memory there is, but on things like disc speed and the user's tolerance for mishaps.
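For the record, the equivalent of echo 3 >/proc/sys/vm/dirty_background_ratio and echo 6 >/proc/sys/vm/dirty_ratio can also be done programmatically; a trivial sketch (the helper function is mine, the /proc paths are the real tunables):

```c
#include <stdio.h>

/* Write a single integer to a /proc/sys tunable. */
static int set_tunable(const char *path, int value)
{
    FILE *f = fopen(path, "w");
    if (f == NULL) {
        perror(path);
        return -1;
    }
    fprintf(f, "%d\n", value);
    return fclose(f);
}

int main(void)
{
    /* Start background writeback at 3% of memory dirty,
       throttle writers at 6%. */
    set_tunable("/proc/sys/vm/dirty_background_ratio", 3);
    set_tunable("/proc/sys/vm/dirty_ratio", 6);
    return 0;
}
```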
> MN: NaturalMotion's endorphin animation technology is one of the hottest topics on Madden fan sites right now. Is there a chance that we might see this technology or some form of rag doll/on the fly physics in Madden at some point in the future?
>
> JS: I can't speak towards specific technologies at this point, but let me just say this is something that is a lot easier in demos than a full game, especially on the scale of a football game.

The *a lot easier in demos* point is lovely, not just because I reckon that it defines the whole startup/dotcom scene, but also, regrettably, a lot of the software one in general.
I switched my root filesystem from ext3 to JFS around six weeks ago, and I have been just using my system normally since, and I have done several package updates (updated X a couple of times, KDE a couple of times, several other bits and pieces).
I copied the root filesystem to an otherwise unused disc, and measured how much time it takes to find a few files in it and to read it all with tar; and I did it again after reformatting the partition and loading it from fresh. The results (elapsed and system CPU times) are:
File system | Repack | Find |
---|---|---|
used JFS | 26m45s 63s | 04m17s 06s |
new JFS | 10m20s 55s | 03m15s 06s |
The times include the sync time afterwards; in the find test this adds around 4m as all the atimes get updated and have to be finalized.
File name | Size | Used or new JFS filesystem | Transfer rate
---|---|---|---
var/tmp/db | 130,476KiB | used | 4.5MiB/s
var/tmp/db | 130,476KiB | new | 27.0MiB/s
var/cache/apt/srcpkgcache.bin | 21,788KiB | used | 16.8MiB/s
var/cache/apt/srcpkgcache.bin | 21,788KiB | new | 35.0MiB/s
var/tmp/db
is a test Berkeley DB
I filled from scratch some time ago
and var/cache/apt/srcpkgcache.bin
is a file that
is rebuilt every time I update the repository package lists,
and is created sequentially. This seems to be reflected in
their respective levels of fragmentation, deducible from the slowdowns above.
/var
, and in that there is a
100MiB
Squid
cache, but that cannot skew the result that much. As the
couple of file transfers above insinuate, it is the ordinary
files that get shredded into somewhat separated extents.

Loading the .sxc version with OOo 1.1.5 takes 325s elapsed,
269s of which are user CPU time and 26s of which are system
CPU time; with OOo 2.0 it took 321s elapsed, 282s of which
user CPU time and 6 seconds of which are system CPU time. I
tried also with
KSpread
1.4.2 and it just crashed after a while.
Then I tried saving to .xls format. Well, with OOo 1.1.5 this failed: after several minutes the size of the program had grown to more than 900MiB, more than 20 minutes had elapsed, and much paging had occurred, as my PC has only got 512MiB of RAM.
I also tried saving in the .xls and various other formats using OOo 2.0, with crashes and bizarre error messages.
It turned out that my /tmp filesystem had
become full with the temporary files of previous crashes, and
that was the cause of the problems, even if the error messages
were entirely misleading. Once I cleaned up /tmp
I was able to save the 16,282 line CSV into various formats,
fairly quickly in each case (around a dozen seconds) and with
the following resulting sizes and loading times (the elapsed
times and the RAM used numbers are approximate):
Program | Format | Size | Elapsed time | User CPU time | System CPU time | RAM |
---|---|---|---|---|---|---|
OOo Calc | none | 0 | 5.7s | 4.6s | 0.3s | 19MiB |
OOo Writer | .txt | 1180KiB | 5.9s | 4.7s | 0.5s | 30MiB
XEmacs | .txt | 1180KiB | 2.1s | 1.6s | 0.2s | 8MiB
OOo Calc | .csv | 1180KiB | 9.4s | 7.4s | 0.5s | 32MiB
OOo Calc | .xls | 3220KiB | 7.8s | 6.2s | 0.4s | 29MiB
OOo Calc | .sxc | 248KiB (uncompressed: 19.3MiB) | 24.6s | 22.3s | 0.7s | 35MiB
OOo Calc | .ods | 248KiB (uncompressed: 19.3MiB) | 19.5s | 17.7s | 0.8s | 35MiB
The fastest format to load with OOo Calc is the .xls one, which is also the fastest for MS Excel 2003.
The load time under MS Excel 2003 of the .xls format is 2s,
which means around 25MiB/s of data reading and parsing and
unpacking speed, which sounds somewhat optimistic. Perhaps MS
Excel is optimized to read just what it needs (first screenful
of the first sheet) in the .xls
case.
> The Software is a collective work of Novell. You may make and use unlimited copies of the Software for Your distribution and use within Your Organization. You may make and distribute unlimited copies of the Software outside Your organization provided that: 1) You receive no consideration; and, 2) you do not bundle or combine the Software with another offering (e.g., software, hardware, or service).

and Fedora Core 4:

> The Software is a collective work under U.S. Copyright Law. Subject to the following terms, Fedora Project grants to the user ("User") a license to this collective work pursuant to the GNU General Public License.

and it is pretty obvious that there is a big difference: SUSE OSS, while being composed entirely of free/open software:

> At openSUSE.org, anyone can download the OSS (Open Source Software) version of SUSE Linux 10 for free. This code is strictly OSS and does not have any of the licensed components like RealPlayer, Adobe, and licensed drivers that you would find in SUSE Linux 10 (non-OSS).

is not itself as a whole free software, while Fedora clearly is, the licence for the compilation copyright being the GPL.
```
$ time perl megamake.pl /var/tmp/db 1000000 50 100

real    6m28.947s
user    0m35.860s
sys     0m45.530s
$ ls -sd /var/tmp/db*
130604 /var/tmp/db
```

and for doing 100,000 random record accesses in it:

```
$ time perl megafetch.pl /var/tmp/db 1000000 100000
average length: 75.00628

real    3m3.491s
user    0m2.870s
sys     0m2.800s
```

that is about 2,500 records inserted per second and 500 random record accesses per second; the total space used is 130MiB, instead of the 4GiB needed for 1M files whose minimum size in most filesystems is 4KiB.
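The megamake.pl and megafetch.pl scripts are not reproduced here; the core of such a bulk insertion, done in C against the Berkeley DB 4.x API, would look roughly like this sketch (record count and sizes merely mirror the test above):

```c
#include <db.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    DB *db;
    DBT key, val;
    char kbuf[16], vbuf[100];
    long i;
    int ret;

    /* Create the handle and a B-tree database in a single file. */
    if ((ret = db_create(&db, NULL, 0)) != 0
        || (ret = db->open(db, NULL, "/var/tmp/db", NULL,
                           DB_BTREE, DB_CREATE, 0644)) != 0) {
        fprintf(stderr, "open: %s\n", db_strerror(ret));
        return 1;
    }

    memset(&key, 0, sizeof key);
    memset(&val, 0, sizeof val);
    memset(vbuf, 'x', sizeof vbuf);

    /* One million small records of 50..100 bytes each. */
    for (i = 0; i < 1000000L; i++) {
        key.size = (u_int32_t) snprintf(kbuf, sizeof kbuf, "%ld", i);
        key.data = kbuf;
        val.data = vbuf;
        val.size = 50 + (u_int32_t) (i % 51);

        if ((ret = db->put(db, NULL, &key, &val, 0)) != 0) {
            fprintf(stderr, "put: %s\n", db_strerror(ret));
            break;
        }
    }
    return db->close(db, 0);
}
```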
```
strace -f -e trace=open,stat64 app 2> /tmp/app.strace
egrep -c '\<(open|stat64)\(' /tmp/app.strace
egrep '\<(open|stat64)\(' /tmp/app.strace | egrep -c ' = [0-9]'
```

to count total files (inodes) accessed and accessed successfully, with the following results (under KDE, so this is sort of biased in favour of KWord):
Application | Accesses | Successful
---|---|---
soffice | 1660 | 715
abiword | 1597 | 1274
kword | 1836 | 1632
vim | 187 | 92
gnuserv and gnuclient.

I was reminded of *advising* system calls, which are particularly useful, if often unimplemented. The discussion was with someone who is doing PhotoShop plugins, and who was wondering why it and similar packages implement their own software virtual memory system called *tiling* instead of relying on the virtual memory provided by the operating system.
> Although the folklore indicates that LRU is a generally good tactic for buffer management, it appears to perform only marginally in a database environment. Database access in INGRES is a combination of:
>
> 1. sequential access to blocks which will not be rereferenced;
> 2. sequential access to blocks which will be cyclically rereferenced;
> 3. random access to blocks which will not be referenced again;
> 4. random access to blocks for which there is a nonzero probability of rereference.
>
> Although LRU works well for case 4, it is a bad strategy for other situations. Since a DBMS knows which blocks are in each category, it can use a composite strategy. For case 4 it should use LRU while for 1 and 3 it should use toss immediately. For blocks in class 3 the reference pattern is 1, 2, 3 ..... n, 1, 2, 3 ..... Clearly, LRU is the worst possible replacement algorithm for this situation. Unless all n pages can be kept in the cache, the strategy should be to toss immediately. Initial studies[9] suggest that the miss ratio can be cut 10-15% by a DBMS specific algorithm.
>
> In order for an OS to provide buffer management, some means must be found to allow it to accept "advice" from an application program (e.g., a DBMS) concerning the replacement strategy. Designing a clean buffer management interface with this feature would be an interesting problem.
> Although UNIX correctly prefetches pages when sequential access is detected, there are important instances in which it fails. Except in rare cases INGRES at (or very shortly after) the beginning of its examination of a block knows exactly which block it will access next. Unfortunately, this block is not necessarily the next one in logical file order. Hence, there is no way for an OS to implement the correct prefetch strategy.

However, for the sake of my
team
buffered sequential copy program I have investigated what kind
of advising calls are available under Linux some time ago, and
I recently revisited the subject. I have been grateful that
the effect of the various options under Linux is now
documented in
posix_fadvise
(2)
but when I examined it I felt very disappointed, because it
says as to the policies on offer:
> Under Linux, POSIX_FADV_NORMAL sets the readahead window to the default size for the backing device; POSIX_FADV_SEQUENTIAL doubles this size, and POSIX_FADV_RANDOM disables file readahead entirely. These changes affect the entire file, not just the specified region (but other open file handles to the same file are unaffected).

which seems to me quite misdesigned. These options apparently only affect read-ahead, and do not address keep-behind, and are rather incomplete, as a LIFO policy is missing. In theory, the NORMAL policy should be suitable for LIFO access patterns (so read-ahead no, but keep-behind yes), SEQUENTIAL for FIFO access patterns (so read-ahead yes, but keep-behind no) and RANDOM for random ones (so neither read-ahead nor keep-behind).
None of these policies quite suits using the team (or any other) program to do large sequential copies, as I do to backup whole partitions on my disks; in particular keep-behind causes the just-read blocks, which are not likely to be used again, to be kept around in the file page cache, where they crowd out other more useful pages.
That leaves POSIX_FADV_WILLNEED and POSIX_FADV_DONTNEED as suitable; adding to the mess, one of the usual suspects has masterfully added the O_STREAMING option to fcntl(2), which might as well be used too. So I have used something like this code to set up the access pattern hints:
```
#ifdef POSIX_FADV_SEQUENTIAL
  (void) posix_fadvise(fdin,(off_t) 0,(off_t) ilength,POSIX_FADV_SEQUENTIAL);
  (void) posix_fadvise(fdout,(off_t) 0,(off_t) olength,POSIX_FADV_SEQUENTIAL);
  errno = 0;
#endif

#ifdef O_STREAMING
  (void) fcntl(fdin,F_SETFL,O_STREAMING);
  (void) fcntl(fdout,F_SETFL,O_STREAMING);
  errno = 0;
#endif
```

and something like this code for reading
```
int readbytes = read(fdin,buffer,bufbytes);

#ifdef POSIX_FADV_DONTNEED
  if (donebytes >= 0 && readbytes >= 0)
  {
    (void) posix_fadvise(fdin,
      donebytes,(off_t) readbytes,POSIX_FADV_DONTNEED);
    errno = 0;
  }
#endif

#ifdef POSIX_FADV_WILLNEED
  if (donebytes >= 0 && readbytes > 0)
  {
    (void) posix_fadvise(fdin,
      donebytes+readbytes,(off_t) bufbytes,POSIX_FADV_WILLNEED);
    errno = 0;
  }
#endif
```

to start reading ahead the next buffer and discarding the just read blocks, and something like this code for writing:
```
int writtenbytes = write(fdout,buffer,readbytes);

#ifdef POSIX_FADV_DONTNEED
  if (donebytes >= 0 && writtenbytes > 0)
  {
    (void) posix_fadvise(fdout,
      donebytes,(off_t) writtenbytes,POSIX_FADV_DONTNEED);
    errno = 0;
  }
#endif
```

which to me seems rather redundantly pleonastic, especially considering that this could be easily autodetected heuristically in the standard C library or by the OS itself.
I used cmp -l to compare the ISO image and the DVD disc it had been burned to, and there were no differences; but cmp -l reported that the ISO image was shorter.
*benchmarks* he has performed. The *methodology* employed is not explained, and neither is the context, but parts can be gleaned from the *script* used to run the benchmarks, which made me think that these are based on comical misunderstandings.
> > Consider for example the relative size of the kernel source tree
> > and that of your PC's memory, and that 'sync' does not have yet
> > magical :-) powers.
>
> why should i sync to disk, when i can use RAM?

which to me sounds all the more funny as the script does invoke sync repeatedly, which does clean dirty blocks, but does not remove them from the cache...

On *server class* SCSI drives it is disabled by default, while on essentially all ATA drives it is enabled by default, and on many it cannot be disabled.
There has been a discussion of the fsck of large filesystems, and it turns out that since the XFS fsck
utilities keep everything in memory, for somewhat large
filesystems with many files a 32 bit address space is
simply insufficient, and
a 64 bit address space
is needed. Note that the limit is indeed the address
space, not just the physical memory. This reminds me of
similar problems, for example with
programs with many threads
where a 32 bit address space is exhausted.
Then there is the issue of fsck speed, a potentially large problem with very large filesystems even when, like XFS, they can warm restart very quickly thanks to journaling and internal parallelism. Sooner or later a cold restart has to be done with an external fsck style tool.

Logging programs (or logrotate on their behalf) or downloading programs can preallocate enough space: the downloading program knows exactly how much to preallocate, and the logging program can be configured to rotate logs every N megabytes, so it can do that too in such a case, or at least keep a significant chunk preallocated.
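For example, a downloading program that knows the expected size could reserve it up front; a minimal sketch using posix_fallocate(3) (file name and size invented for illustration):

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

int main(void)
{
    /* Invented example: reserve 100MiB up front for a download,
       so the filesystem can allocate (mostly) contiguous space. */
    const off_t expected = (off_t) 100 * 1024 * 1024;

    int fd = open("/var/tmp/download.part", O_CREAT | O_WRONLY, 0644);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    /* Unlike ftruncate(2), posix_fallocate(3) actually reserves
       blocks rather than leaving a sparse hole. */
    int err = posix_fallocate(fd, (off_t) 0, expected);
    if (err != 0)
        fprintf(stderr, "posix_fallocate: error %d\n", err);

    return close(fd) != 0 || err != 0;
}
```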
The ext2 implementation does attempt to detect sequential file accesses in a simple-minded way and then switches to fixed-size clustering.
*advice* for both memory mapped and buffered IO accesses (and invoke
madvise
(2),
posix_madvise
(2)
and
posix_fadvise
(2)
as appropriate) in different places:
:-) as to cp under SunOS when Sun switched to mmap'ed IO.

Most applications use stdio (or the PLAN 9 or C++ equivalents) to perform file accesses, and from the parameters passed to fopen(3) and the first few subsequent operations it is possible to guess the file access pattern:
"r"
,
"w"
or "a"
there are
extraordinarily good changes that the access pattern is
POSIX_FADV_SEQUENTIAL
; if in particular it
is "r"
one can expect
POSIX_FADV_DONTNEED
to apply."+"
there are
pretty good chances that the file access mode is
POSIX_FADV_RANDOM
if writing and
POSIX_FADV_NORMAL
(should be LIFO, but
under Linux it actually does a small degree of FIFO)
if reading.fseek
operations
precede reads or wirtes or not, corrections can be made
dynamically. For example if a file with a mode ending in
"+"
but there is no seeking one can
expect this is an overwrite operation and switch to
POSIX_FADV_SEQUENTIAL
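Such guessing could be packaged in a tiny wrapper; a sketch (xfopen is a hypothetical name of mine, and the mapping simply follows the heuristics above):

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical fopen wrapper: guess an access pattern from the
   stdio mode string and advise the kernel accordingly. */
FILE *xfopen(const char *path, const char *mode)
{
    FILE *f = fopen(path, mode);
    if (f == NULL)
        return NULL;

#ifdef POSIX_FADV_SEQUENTIAL
    {
        int fd = fileno(f);

        if (strchr(mode, '+') != NULL)
            /* Update mode: probably random access. */
            (void) posix_fadvise(fd, 0, 0, POSIX_FADV_RANDOM);
        else
            /* Plain "r", "w" or "a": almost certainly sequential. */
            (void) posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);
    }
#endif
    return f;
}
```

A fuller version would also watch the first few fseek(3) calls and revise the guess dynamically, as suggested above.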
There is a tool that scans an ext[23] filesystem and draws a map of
its fragmentation. Interesting especially considering the
shocking discovery of a
sevenfold
decrease in ext3
performance over time.
fragmentation: *groups*, which are subsets of the storage area of a filesystem in which a subset of the filesystem is contained; for example cylinder groups for ext3 and allocation groups for JFS.
ext2
and the block IO subsystem, after a
suggestion I made in a news discussion long ago, do both
allocation and read/write clustering, instead of using large
blocks. The discussion was about the idea that
ext2
perhaps should be like the Berkeley FFS,
and have large blocks (for speed) and small tail fragments (for saving space).
ftruncate(2) system call, or by the existing truncate logic on file close. As to the read/write clustering, the SCSI subsystem of the time could already support it with little effort, and it was especially efficient if mailboxing was supported by the host adapter.
ext2 never got fragments, and to this day the manual page says:

> mke2fs accepts the -f option but currently ignores it because the second extended file system does not support fragments yet.

and even if this phrase appears in the BUGS section it describes, for once, not a bug but really a feature, as confirmed in this paper:
> Ext2fs takes advantage of the buffer cache management by performing readaheads: when a block has to be read, the kernel code requests the I/O on several contiguous blocks. This way, it tries to ensure that the next block to read will already be loaded into the buffer cache. Readaheads are normally performed during sequential reads on files and Ext2fs extends them to directory reads, either explicit reads (readdir(2) calls) or implicit ones (namei kernel directory lookup).
>
> Ext2fs also contains many allocation optimizations. Block groups are used to cluster together related inodes and data: the kernel code always tries to allocate data blocks for a file in the same group as its inode. This is intended to reduce the disk head seeks made when the kernel reads an inode and its data blocks.
>
> When writing data to a file, Ext2fs preallocates up to 8 adjacent blocks when allocating a new block. Preallocation hit rates are around 75% even on very full filesystems. This preallocation achieves good write performances under heavy load. It also allows contiguous blocks to be allocated to files, thus it speeds up the future sequential reads. These two allocation optimizations produce a very good locality of:
>
> - related files through block groups
> - related blocks through the 8 bits clustering of block allocations.

and this is probably one of the reasons why (freshly loaded) ext2 filesystems routinely figure at the top of every benchmark.
Preallocation however was not implemented in ext3 for a long time (as it complicates journaling a bit), and I suspect that this may be part of the reason why I observed a sevenfold reduction in performance in a well used filesystem as opposed to a freshly loaded one.

*snapshots* of a filesystem for backup purposes.
There is a report of an fsck taking more than one month on a large but not that large (around 1.6TiB) filesystem on a RAID, and one of the reasons is that recovery involves attaching very many files under lost+found/, which can be very slow if it contains very many entries. This is a special case misfortune, but the issue of multithreaded checking remains.
The ext3 file system seems not to be, and neither seems to be the ReiserFS one. Probably the XFS one is, and I hope that the JFS one is too.

The settings were:

- hdparm -a 32 /dev/hda to ensure that the kernel block device readahead is smaller than the rather excessive default of 256.
- echo cfq >/sys/block/hda/queue/scheduler to ensure the latency-impairing anticipatory elevator is not used, and the cfq one is used instead, which prevents one process hogging a disc.
- echo 0 >/proc/sys/vm/page-cluster to disable the counterproductive and ridiculous swap-ahead logic.
- echo 20 >/proc/sys/vm/swappiness to reduce the amount of memory that the system devotes to the file page cache.

Update: it turns out that in nearly all kernel versions the cfq IO scheduler is quite buggy and should be avoided, and only deadline can be recommended.
Update: the block device read-ahead should be much larger than the default of 256 because of very debatable choices in the block IO subsystem that prevent proper sequential read streaming without that.
Of these the one that took me longest to explain was the last one, about the vm/swappiness
parameter of the
virtual memory subsystem, because in a sense it is
representative of all the others, and the poor reasoning
behind each.
File data tends to be accessed in FIFO fashion, and process data in LIFO fashion (the *phase hypothesis*), therefore usually file data should be handled with a FIFO oriented policy (except for top level metadata) and process data with a LIFO oriented policy.
One can use madvise(2) and posix_fadvise(2)
(or the mysterious, why-ever,
O_STREAMING
option to the fcntl
(2) system call)
to inform the kernel policy modules of the expected access
patterns, for example for file data with
POSIX_FADV_SEQUENTIAL
, but that does not seem to
work that well (as in, they seem ignored by the kernel).
hdparm -a
sort of does the prefetch bit
(inflexibly and always) and as to discard-behind one can just
reduce the size of the file page cache by decreasing
vm/swappiness
drastically.
The right settings depend not on what vm/swappiness is nominally about, but on the size of the filesystems (and of their metadata) and the access patterns of the applications using them.
I switched from the nv X driver, which does not support OpenGL 3D acceleration, to the proprietary NVIDIA one, which does.
There are utilities (for example nvtuner and nvclock) that allow tweaking the chipset, which is extremely configurable, to enable the disabled bits and to change the clock speed, and using these I have managed to get a little extra out of the chipset.
Without any fanfare, Novell Inc. has released the latest version of its flagship Linux distribution: SuSE Linux 10.
gcj
spawns the compiler
executable many times.
JFS should serve me better than ext3, if in the longer term it turns out no worse than ext3 at performance degrading over time; which is likely, because of my shock at discovering that ext3 performance can degrade sevenfold over time.
The vfat file system code in Linux is very CPU intensive for whatever reason.
Since there are now ext3 drivers for MS Windows 2000, I am considering switching those partitions over to ext3. This is somewhat ironic as I just switched from ext3 to JFS for my Linux filesystems, but it would still be a large improvement.
The ext3 driver for MS Windows 2000 is actually a port of the Linux one, and that's encouraging.