Alain Knaff wrote:
support). However, in some two weeks' time I'll be more available to check out what is going on.
Thanks for your reply. I will continue to investigate (such as with the
ideas you describe below), and hopefully by the time you are able to
attack it, I'll have enough diagnostic info to point to the problem.
The strange thing is, we do use udpcast for duplicating entire disks, most of which are larger than 50GB by now, and we never did notice any
I had assumed this was the case, which is why I found the corruption so
surprising!
(At the risk of sounding too critical, I was also surprised that udpcast
doesn't do an end-to-end checksum or similar; it makes me think of the
oft-referenced 1981 paper:
http://web.mit.edu/Saltzer/www/publications/endtoend/endtoend.pdf )
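For illustration only, here is a minimal sketch of the kind of end-to-end check I mean, assuming a hypothetical helper (fnv1a64_stream is my name, nothing in udpcast) run separately over the sender's input and each receiver's output: hash the entire stream and compare the values. It uses 64-bit FNV-1a purely because it is short; it is not a cryptographic hash.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper, not part of udpcast: 64-bit FNV-1a over every
 * byte that can be read from `in`, up to EOF. */
uint64_t fnv1a64_stream(FILE *in)
{
    uint64_t h = 0xcbf29ce484222325ULL;   /* FNV-1a 64-bit offset basis */
    unsigned char buf[65536];
    size_t n;

    while ((n = fread(buf, 1, sizeof buf, in)) > 0) {
        for (size_t i = 0; i < n; i++) {
            h ^= buf[i];                  /* xor in the byte first ... */
            h *= 0x100000001b3ULL;        /* ... then multiply by the FNV prime */
        }
    }
    return h;
}
```

Printing the result on both ends (for example with printf("%016llx\n", (unsigned long long)h)) and comparing would flag a corrupted copy, though not where the corruption is.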
One suggestion (careful: this may take some time, and needs *huge* amounts of disk space): try running udpcast under strace (strace -fo log.send udp-sender ... and strace -fo log.recv udp-receiver ...), and
I am not familiar with strace yet, but I will get acquainted with it.
I might combine this with a different idea I had: add a few lines of
code to compare the file position with the number of bytes udp-receiver
thinks it wrote, and die if they don't match. If I do this, the end of
the strace log should fall more or less exactly where the problem
occurred.
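A rough sketch of that check, assuming a hypothetical wrapper named checked_write() that would replace the plain write() on udp-receiver's output descriptor (I have not yet looked at where exactly that write happens). It keeps a running byte count and compares it with the offset the kernel reports via lseek(fd, 0, SEEK_CUR).

```c
#define _POSIX_C_SOURCE 200809L
#define _FILE_OFFSET_BITS 64   /* 64-bit off_t on 32-bit glibc, for files > 2 GB */
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical debugging wrapper, not part of udpcast. Writes all of buf,
 * adds len to *total, then aborts if the kernel's file offset disagrees
 * with *total. Only meaningful for a regular file written from offset 0
 * (not O_APPEND); on a pipe lseek() fails with ESPIPE and the check is
 * skipped. Returns len, or -1 if write() fails. */
ssize_t checked_write(int fd, const void *buf, size_t len, off_t *total)
{
    const char *p = buf;
    size_t left = len;

    while (left > 0) {                 /* write() may be partial: loop */
        ssize_t n = write(fd, p, left);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        p += n;
        left -= (size_t)n;
    }
    *total += (off_t)len;

    off_t pos = lseek(fd, 0, SEEK_CUR);   /* query only, does not move */
    if (pos == (off_t)-1) {
        if (errno == ESPIPE)
            return (ssize_t)len;          /* not seekable: nothing to compare */
        perror("lseek");
        abort();
    }
    if (pos != *total) {
        fprintf(stderr, "offset mismatch: kernel says %lld, expected %lld\n",
                (long long)pos, (long long)*total);
        abort();   /* die here so the strace log ends at the bad write */
    }
    return (ssize_t)len;
}
```

Because the wrapper aborts right after the first mismatching write, an strace -f log should end just after the offending write() and lseek() calls.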
Do you have any feel for how much disk space I might need, to strace
udp-receiver on a file of 50 GB?
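A back-of-envelope estimate, where all three parameters are my own assumptions rather than measured udpcast figures: with P bytes of payload per datagram (taking roughly an Ethernet-sized 1400 B), k logged system calls per datagram (taking 1) and L bytes per strace line (taking 100 B), the receiver log alone comes to

```latex
S_{\text{log}} \approx \frac{S_{\text{file}}}{P}\,k\,L
  = \frac{50 \times 10^{9}\ \text{B}}{1400\ \text{B}} \times 1 \times 100\ \text{B}
  \approx 3.6 \times 10^{7} \times 100\ \text{B}
  \approx 3.6\ \text{GB}
```

Each additional system call logged per datagram adds about the same again, and strace also prints the start of each data buffer, which can push L well past 100 bytes, so several times that figure seems a safer budget.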
Another weird thing is that although the problem happens relatively "early" in the file, it only occurs for certain minimum file sizes...
Yes, this is very weird. I will hopefully find a way to run the whole
test in a loop - I have a couple of machines which could pound on it
24x7 for a few days.
just as if the file was being corrupted after the fact (say, after 10GB have been transferred.) It might be interesting to do a cmp midway through and see if the difference is already there "from the
This seems unlikely to be at issue, since the trouble still occurs when
I grab the output using:
udp-receiver --pipe "tee somefile" >/dev/null
Although my knowledge is incomplete, I don't think the OS will let
udp-receiver reach through the pipe to "tee" and seek around in somefile.
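A quick way to check that premise, as a small C sketch (is_unseekable is a hypothetical name): POSIX specifies that lseek() on a pipe, FIFO or socket fails with ESPIPE, so if I understand --pipe correctly, the end of the pipe udp-receiver writes into cannot be repositioned at all; only tee itself touches somefile.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical helper: true if lseek() on fd is refused because fd is a
 * pipe, FIFO or socket (errno ESPIPE). Other failures, such as EBADF for
 * an invalid descriptor, return false. */
bool is_unseekable(int fd)
{
    errno = 0;
    return lseek(fd, 0, SEEK_CUR) == (off_t)-1 && errno == ESPIPE;
}
```

Calling it on both ends of a fresh pipe() should return true, while an ordinary file descriptor returns false.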
I also don't see how udp-receiver could possibly seek backward into its
output, because of this:
$ grep seek *.c
statistics.c: loff_t offset = lseek64(fd, 0, SEEK_CUR);
statistics.c: off_t offset = lseek(fd, 0, SEEK_CUR);
Both of those calls only read the current position (an offset of 0
relative to SEEK_CUR), and offhand I can't think of a way to move around
within a file without seek()ing.
And, do several runs with the same input file always produce the error at the exact same spot?
I will test this carefully, and report back. I think the answer is no,
since in some test runs I udpcasted to three receivers, and each ended
up with a different file length.
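To pin down "the exact same spot", I plan to compare each receiver's output against the original file. A minimal C sketch of that comparison (first_difference is a hypothetical name; cmp(1) reports the same position, but 1-based):

```c
#include <stdio.h>

/* Hypothetical helper: returns the 0-based offset of the first byte at
 * which the two streams differ, the length of the shorter stream if one
 * is a prefix of the other, or -1 if both are identical. Read errors are
 * treated like EOF. */
long long first_difference(FILE *a, FILE *b)
{
    long long off = 0;

    for (;;) {
        int ca = getc(a);
        int cb = getc(b);
        if (ca != cb)
            return off;   /* differing byte, or only one stream ended */
        if (ca == EOF)
            return -1;    /* both ended at the same point: identical */
        off++;
    }
}
```

Running it once per receiver and per run would show whether the first bad offset moves between runs, independently of the differing file lengths.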
--
Kyle Cordes
http://kylecordes.com