Alain Knaff wrote:
support). However, in some two weeks time I'll be more available to check out what is going on.
Thanks for your reply. I will continue to investigate (such as with the ideas you describe below), and hopefully by the time you are able to attack it, I'll have enough diagnostic info to point to the problem.
The strange thing is, we do use udpcast for duplicating entire disks, most of which are larger than 50GB by now, and we never did notice any
I had assumed this was the case, which is why I found the corruption so surprising!
(At the risk of sounding too critical, I was also surprised that udpcast doesn't do an end-to-end checksum or similar; it makes me think of the oft-referenced 1981 paper: http://web.mit.edu/Saltzer/www/publications/endtoend/endtoend.pdf )
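To illustrate what I mean by an end-to-end check, here is a minimal sketch, not anything udpcast actually does: each side folds every buffer it reads or writes into a running CRC-32, and the two final values are compared after the transfer. The function name crc32_update is my own invention, and the bitwise loop is the slow textbook form of the standard reflected CRC-32, chosen for brevity rather than speed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of udpcast.
 * Folds len bytes from buf into a running CRC-32 (reflected polynomial
 * 0xEDB88320). Start with crc = 0 and feed the returned value back in
 * for each successive buffer. */
static uint32_t crc32_update(uint32_t crc, const unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    crc = ~crc;                       /* undo the final inversion of the previous call */
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int bit = 0; bit < 8; bit++) {
            /* Shift right; XOR in the polynomial if the low bit was set. */
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
    }
    return ~crc;
}
```

The sender would call this on each block before handing it to the network, the receiver on each block just before write(), and a mismatch at the end would mean the data was damaged somewhere in between.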
One suggestion (careful: this may take some time, and needs *huge* amounts of diskspace): try running udpcast under strace (strace -fo log.send udp-sender ... and strace -fo log.recv udp-receiver ...), and
I am not familiar with strace, but I will get familiar with it.
I might combine this with a different idea I had: add a few lines of code to compare the file position with the number of bytes udp-receiver thinks it wrote, and if they don't match, die. If I do this, it seems like the end of the strace would be at more or less exactly where the problem occurred.
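Roughly what I have in mind is sketched below. This is only an assumption about how such a check could look; check_output_position is a name I made up, and I have not yet looked at where in udp-receiver it would be called. It uses the same lseek(fd, 0, SEEK_CUR) query that statistics.c already uses, which reports the offset without moving it. Note that it cannot work when the output is a pipe, since pipes have no file position.

```c
#define _FILE_OFFSET_BITS 64   /* 64-bit off_t, needed for files over 2 GB */
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper, not part of udpcast.
 * Returns  0 if the kernel's offset for fd equals bytes_written,
 *          1 if they differ (a message is printed to stderr),
 *         -1 if the offset cannot be queried (e.g. fd is a pipe). */
static int check_output_position(int fd, off_t bytes_written)
{
    assert(fd >= 0);
    off_t pos = lseek(fd, 0, SEEK_CUR);   /* query only, does not move */
    if (pos == (off_t)-1)
        return -1;
    if (pos != bytes_written) {
        fprintf(stderr, "position mismatch: file offset %lld, expected %lld\n",
                (long long)pos, (long long)bytes_written);
        return 1;
    }
    return 0;
}
```

The caller would keep its own running total of bytes successfully written, call this after each write(), and abort() on a return value of 1, so that the strace log ends right at the first discrepancy.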
Do you have any feel for how much disk space I might need, to strace udp-receiver on a file of 50 GB?
Another weird thing is that although the problem happens relatively "early" in the file, it only occurs for certain minimum file sizes...
Yes, this is very weird. I will hopefully find a way to run the whole test in a loop - I have a couple of machines which could pound on it 24x7 for a few days.
just as if the file was being corrupted after the fact (say, after 10GB have been transferred.) It might be interesting to do a cmp midway through and see if the difference is already there "from the
This seems unlikely to be at issue, since the trouble still occurs when I grab the output using:
udp-receiver --pipe "tee somefile" >/dev/null
Although my knowledge is incomplete, I don't think the OS will let udp-receiver reach through the pipe and "tee" to seek around on somefile.
I also don't see how udp-receiver could possibly seek backward into its output, because of this:
$ grep seek *.c
statistics.c: loff_t offset = lseek64(fd, 0, SEEK_CUR);
statistics.c: off_t offset = lseek(fd, 0, SEEK_CUR);
... offhand I can't think of a way to move around in a file without seek()ing.
And, do several runs with the same input file always produce the error at the exact same spot?
I will test this carefully, and report back. I think the answer is no, since in some test runs I udpcasted (do you mind your [...]) to three receivers, and each ended up with a different file length.