Hi,
I wonder if someone could help me. I have started testing the udp-receiver/udp-sender programs to roll out Linux onto refurbished PCs for a charity I work for. The idea is to roll out Linux onto 10-20 refurbished PCs at once before shipping them to charities, schools and colleges that really need them, mainly in Africa.
I am using the command-line version because we are not transferring a raw disk image; we are transferring a tar'd tree and preparing the machines for custom hardware detection (as all the hardware is basically random).
The tar file is about 1.2 GB, and I am piping the output through both md5sum and tar to extract it. When I tried this on one machine everything was fine, but today I tested it on 5 machines in parallel and the md5sum of the result on all 5 clients did not match the server version (they did, however, match each other). The tar file also appears to be slightly corrupted, though the machines did boot OK.
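For readers who want to reproduce the "hash and extract the same stream" setup, here is a minimal Python sketch. It is not the pipeline used above (that was md5sum and tar on the command line); the names `HashingReader` and `extract_and_hash` are made up for illustration. It assumes the received data arrives as a readable byte stream, and it drains the stream after extraction so that the digest covers every byte, including any trailing padding that tar does not need to read.

```python
import hashlib
import tarfile


class HashingReader:
    """File-like wrapper (hypothetical helper) that hashes every byte read."""

    def __init__(self, raw):
        self._raw = raw
        self._md5 = hashlib.md5()

    def read(self, size=-1):
        chunk = self._raw.read(size)
        self._md5.update(chunk)
        return chunk

    def hexdigest(self):
        return self._md5.hexdigest()


def extract_and_hash(stream, dest_dir):
    """Extract an uncompressed tar stream and return the MD5 of the whole stream."""
    reader = HashingReader(stream)
    # "r|" reads the archive strictly sequentially, as from a pipe.
    with tarfile.open(fileobj=reader, mode="r|") as archive:
        archive.extractall(dest_dir)
    # tarfile may stop before the end of the stream; read the rest so the
    # digest matches what md5sum would report for the complete file.
    while reader.read(64 * 1024):
        pass
    return reader.hexdigest()
```

The returned digest can then be compared with the md5sum of the tar file on the server.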
So my question is "is it possible that the data is corrupted during multicasting?"
and if so "Is there anything I can do to make it work reliably?"
There seem to be quite a few error correction parameters available, which ones should I try first?
Any help much appreciated, Matt.
On Saturday 19 June 2004 20:57, Matthew Cooke wrote:
So my question is "is it possible that the data is corrupted during multicasting?"
Is this repeatable? I.e., if you do two _separate_ transfers of the same file to two sets of 5 machines, do the md5sums of both transfers match?
If so, it is probably unrelated to the transfer itself, and may instead have something to do with how the tar file is made and transferred (i.e. are there other intervening steps in transferring the file, other than UDPcast?).
If, on the other hand, it is not repeatable (different md5sums each time, or sometimes good transfers and sometimes bad), it may be related to Ethernet packet corruption during the transfer. On good equipment this should be extremely rare, but it is not impossible.
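The repeatability check described here can be written down as a small decision rule. The following Python sketch is only an illustration: the function name `diagnose` and the returned labels are made up, and it assumes you have collected the server's md5sum plus the client md5sums from two or more separate transfers.

```python
def diagnose(server_md5, runs):
    """Classify transfer results.

    runs is a list of transfers, each a list of client MD5 hex digests.
    The labels returned are illustrative, not part of any tool.
    """
    seen = {digest for run in runs for digest in run}
    if seen == {server_md5}:
        return "ok"
    if len(seen) == 1:
        # Every client in every run agrees on the same wrong digest:
        # look at how the tar file is produced and handed over.
        return "repeatable mismatch"
    # Digests differ between runs or clients: in-flight corruption is possible.
    return "not repeatable"
```

For example, two runs whose clients all report the same digest, different from the server's, would come back as "repeatable mismatch".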
and if so "Is there anything I can do to make it work reliably?"
Well, if you are indeed seeing in-flight Ethernet packet corruption, there is no easy way of making it go away. However, you can make it very noticeable by choosing compressed transfer (using lzop compression). The corruption will still occur, but it will be detected and lead to an aborted transfer, which in many cases is preferable to a lurking, undetected corruption. Of course, lzop is only feasible if the corruption is not so frequent that a second (and third, and fourth, ...) try will fail as well.
From our experience here (several years of transfers in half a dozen schools), we've only observed such corruption twice. It was detected because we used compression. After repeating the transfer, the problem went away.
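To illustrate why a compressed stream turns silent corruption into a visible failure, here is a small Python sketch. Note the swap: it uses zlib from the Python standard library rather than lzop, simply because zlib is always available; the principle (the decompressor checks the stream and refuses damaged input) is the same idea, but the details of lzop's own checks are not shown here. The helper names `flip_bit` and `survives_transfer` are made up for illustration.

```python
import zlib


def flip_bit(data: bytes, byte_index: int, bit: int = 0) -> bytes:
    """Return a copy of data with one bit inverted, simulating corruption."""
    corrupted = bytearray(data)
    corrupted[byte_index] ^= 1 << bit
    return bytes(corrupted)


def survives_transfer(compressed: bytes) -> bool:
    """True if the stream decompresses cleanly, False if zlib rejects it."""
    try:
        zlib.decompress(compressed)
    except zlib.error:
        # Damaged stream or checksum mismatch: abort instead of
        # writing bad data to disk.
        return False
    return True
```

A stream produced by `zlib.compress(...)` passes `survives_transfer` unchanged, while the same stream with a bit flipped in its trailing checksum is rejected, so the receiver knows to repeat the transfer.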
There seem to be quite a few error correction parameters available, which ones should I try first?
Because packet corruption is so rare, unfortunately none of the options deals with this case.
Most of the available options deal with packet _loss_ (rather than corruption), which in our experience is much more frequent.
However, should the problem be confirmed, I'll introduce a new feature to add CRC checksums to the individual packets. Once available, this will cause corrupted packets to be rejected, which in turn activates the packet loss recovery algorithms. [Normally, the kernel is already supposed to protect the UDP packets with such a checksum, but possibly it fails under certain rare circumstances...]
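As a rough illustration of the per-packet checksum idea, here is a Python sketch. It is not UDPcast's packet format or implementation: the 4-byte CRC-32 prefix, the `HEADER` layout and the `frame`/`unframe` names are all assumptions made for the example. The point is only that a packet failing its check is dropped and can then be handled exactly like a lost packet.

```python
import struct
import zlib

# Hypothetical layout: 4-byte big-endian CRC-32, followed by the payload.
HEADER = struct.Struct("!I")


def frame(payload: bytes) -> bytes:
    """Prefix the payload with its CRC-32."""
    return HEADER.pack(zlib.crc32(payload)) + payload


def unframe(packet: bytes):
    """Return the payload, or None if the packet is damaged.

    A None result is meant to be treated like a lost packet, so the
    existing retransmission logic would take care of it.
    """
    if len(packet) < HEADER.size:
        return None
    (expected,) = HEADER.unpack_from(packet)
    payload = packet[HEADER.size:]
    return payload if zlib.crc32(payload) == expected else None
```

With this framing, `unframe(frame(data))` returns `data`, while a packet with any single bit flipped returns None.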
Any help much appreciated, Matt.
Alain