On Thu, 24 Feb 2005, Ramon Bastiaans wrote:
> Because SystemImager's imaging tools don't compress or image the files, we
> need to cast an entire filesystem (lots of files) over the network. And
> because udpcast only supports sending/receiving a single file and writing to
> a single file descriptor, it can only write asynchronously to one file.
> Because of this we use tar to pipe all files through on both the receiver
> and the sender side.
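For reference, the tar-over-udpcast setup described above presumably looks something like this (the paths and tar options are illustrative, not taken from the message; udp-sender reads stdin and udp-receiver writes stdout by default):

```shell
# Sender: stream the whole file tree through tar into udp-sender's stdin.
tar -C /image -cf - . | udp-sender

# Each receiver: read the multicast stream and unpack it on the fly.
udp-receiver | tar -C / -xf -
```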
The idea is to send a whole partition, if possible. This can be
significantly faster than using tar (see below). We used to clone
whole disks or partitions on our 128-node cluster, which was pretty
fast: about 20 MByte/s over two Fast Ethernet links (we used our own
tool, though, not udpcast).
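With udpcast itself, casting a raw device instead of a tar stream would look roughly like this (the device name is illustrative; --file and --nosync are udpcast options, but check your version's man pages):

```shell
# Sender: read the block device directly, no per-file overhead.
udp-sender --file /dev/hda1

# Each receiver: write straight to the device; --nosync avoids
# synchronous writes, letting the kernel buffer and reorder them.
udp-receiver --file /dev/hda1 --nosync
```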
> This is when the problem arose. When we didn't use tar, we could get
> high speeds and the (network/hard disk) hardware seemed to become the
> limiting factor, but only when using --nosync writes. Because we use
> tar (which obviously has no --nosync option), tar now became the
> bottleneck.
The problem with tar is that it has to deal with each file
individually, which causes many movements of the disk's head. Each
move to a track where the next file or its corresponding inode is
located adds some latency, which ultimately reduces throughput.
If you clone a single file (like a whole partition), there are
only very few head movements, and those are mostly just to the next
track on the disk. Hence, you get higher throughput than with tar.
Of course, cloning a whole partition also copies empty blocks, which
is not strictly necessary. Therefore, you clone more data than when
using tar, but you can do it with higher throughput. Whether this is
actually a win depends on a number of factors, most importantly the
fill rate of your partition.
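To make that trade-off concrete: cloning the raw partition wins whenever the fill rate exceeds tar's throughput divided by the raw throughput. The 20 MByte/s raw rate is the figure from above; the 8 MByte/s tar rate below is a made-up illustration.

```shell
# Raw cloning copies the whole partition at raw_rate; tar copies only
# fill * size at tar_rate. Raw wins when fill > tar_rate / raw_rate.
awk 'BEGIN { printf "break-even fill rate: %.0f%%\n", 100 * 8 / 20 }'
# With these (partly hypothetical) rates, a partition more than 40%
# full is cloned faster as a raw device than through tar.
```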