To try to track down the udpcast corrupt file problem, I ran some more tests. This time I used a ~50GB file, a sender, and only 1 receiver.
side bytes sender: 53687091200 receiver: 53686091776
In all of my large-file runs, udp-receiver comes up a bit "short", it missing some of the data, never "long".
I created a 50 GB test file with predictable text data in it, suing this ugly little program:
#include <stdio.h>
// 16 bytes per entry. // int main(void) { long long gb = 1024 * 1024 * 1024; long long m = 50 * gb; long long i; for(i = 0; i<m; i+= 16) { printf("%.15lld\n", i); } }
so that I could easily look at the files. I found that the received file ended with the same data as the sent file; in other words, the problem is *not* a matter of terminating early, or other finishing-out process.
Rather, it's much earlier. According to "cmp":
differ: byte 2098176010, line 131136001
That's a little under 2 GB of the way in to a 50 GB file.
Strangely, I ran repeated tests with 10 GB files, and didn't get any corruption.
Alain - it would warm my heart to see you ack these messages, even if you don't have a solution at hand.
Kyle Cordes wrote:
Rather, it's much earlier. According to "cmp": differ: byte 2098176010, line 131136001
The relevant part of the output file looks like the bit below. Look 4 lines in.
See my previous message for an explanation of how this file is generated - the essence is that each line (with \n) is 16 bytes long, and its contents are the offset of its first byte. In a correct file, each number is 16 bigger than the next.
000002098175936 000002098175952 000002098175968 000002098175984 000002098225152 000002098225168 000002098225184 000002098225200 000002098225216 000002098225232
2098225152 - 2098175984 - 16 = 49152
UDPcast dropped 49152 bytes at this point.
Kyle Cordes wrote:
To try to track down the udpcast corrupt file problem, I ran some more tests. This time I used a ~50GB file, a sender, and only 1 receiver.
side bytes sender: 53687091200 receiver: 53686091776
In all of my large-file runs, udp-receiver comes up a bit "short", it missing some of the data, never "long".
I created a 50 GB test file with predictable text data in it, suing this ugly little program:
#include <stdio.h>
// 16 bytes per entry. // int main(void) { long long gb = 1024 * 1024 * 1024; long long m = 50 * gb; long long i; for(i = 0; i<m; i+= 16) { printf("%.15lld\n", i); } }
so that I could easily look at the files. I found that the received file ended with the same data as the sent file; in other words, the problem is *not* a matter of terminating early, or other finishing-out process.
Rather, it's much earlier. According to "cmp":
differ: byte 2098176010, line 131136001
That's a little under 2 GB of the way in to a 50 GB file.
Strangely, I ran repeated tests with 10 GB files, and didn't get any corruption.
Alain - it would warm my heart to see you ack these messages, even if you don't have a solution at hand.
I do get your messages, but for the moment I am somewhat busy on some other project (preparing the release of mtools version 4 with Unicode support). However, in some two weeks time I'll be more available to check out what is going on.
The strange thing is, we do use udpcast for duplicating entire disks, most of which are larger than 50GB by now, and we never did notice any ill effect. A large piece of data missing in the middle would have been pretty obvious, but we've never have seen any of this so far.
So apparently, it only happens under certain circumstances... and we need to understand what exactly these circumstances which are triggering this are.
I appreciate your work on this subject, and I'm pretty confident that within a couple of more tests, you'll have identified what is going on (... making it easier for me to fix...)
One suggestion (careful: this may take some time, and needs *huge* amounts of diskspace): try running udpcast under strace (strace -fo log.send udp-sender ... and strace -fo log.recv udp-receiver ...), and try to locate the system calls around the place where the missing data occurs (strace output should have reads and writes whose parameter is your textual data. The stretch of output between the reading or writing 000002098175984 and 000002098225152 is the interesting one here...
Actually, to be precise, as udpcast reads and writes in largish chunks, you'll not see a read or write for every line. So the last read or write before the error will probably have a number less than 000002098175984, and the next write will have a number larger than 000002098225152, but you get the gist of it.
Another weird thing is that although the problem happens relatively "early" in the file, it only occurs for certain minimum file sizes... just as if the file was being corrupted after the fact (say, after 10GB have been transferred.) It might be interesting to do a cmp midway through and see if the difference is already there "from the beginning..." (for instance, you may start your cmp as soon as your receive file reached size 2GB...)
And, do several runs with the same input file always produce the error at the exact same spot?
Regards,
Alain
Alain Knaff wrote:
support). However, in some two weeks time I'll be more available to check out what is going on.
Thanks for your reply. I will continue to investigate (such as with the ideas you describe below), and hopefully by the time you are able to attack it, I'll have enough diagnostic info to point to the problem.
The strange thing is, we do use udpcast for duplicating entire disks, most of which are larger than 50GB by now, and we never did notice any
I had assumed this was the case, which is why I found the corruption so surprising!
(At the risk of sounding too critical, I was also surprised that udpcast doesn't do an end to end checksum or similar, it make me think of the oft-referenced 1981 paper: http://web.mit.edu/Saltzer/www/publications/endtoend/endtoend.pdf )
One suggestion (careful: this may take some time, and needs *huge* amounts of diskspace): try running udpcast under strace (strace -fo log.send udp-sender ... and strace -fo log.recv udp-receiver ...), and
I am not familiar with strace, but I will get familiar with it.
I might combine this with a different idea I had: add a few lines of code to compare the file position with # of bytes udp-receiver thinks it wrong, and if they don't match, die. If I do this, it seems like the end of the strace would be at more or less exactly where the problem occurred.
Do you have any feel for how much disk space I might need, to strace udp-receiver on a file of 50 GB?
Another weird thing is that although the problem happens relatively "early" in the file, it only occurs for certain minimum file sizes...
Yes, this is very weird. I will hopefully find a way to run the whole test in a loop - I have a couple of machines which could pound on it 24x7 for a few days.
just as if the file was being corrupted after the fact (say, after 10GB have been transferred.) It might be interesting to do a cmp midway through and see if the difference is already there "from the
This seems unlikely to be at issue, since the trouble still occurs when I grab the output using:
udp-receiver --pipe "tee somefile" >/dev/null
Although my knowledge is incomplete, I don't think the OS will let udp-receiver reach through the pipe and "tee" to seek around on somefile.
I also don't see how udp-receiver could possibly seek backward in to its output, because of this:
$ grep seek *.c statistics.c: loff_t offset = lseek64(fd, 0, SEEK_CUR); statistics.c: off_t offset = lseek(fd, 0, SEEK_CUR);
... offhand I can't think of a way to move around in to a file without seek()ing.
And, do several runs with the same input file always produce the error at the exact same spot?
I will test this carefully, and report back. I think the answer is no, since in some test runs I udpcasted (do you mind your to three receivers, and each ends up with a different file length.
Alain Knaff wrote:
The strange thing is, we do use udpcast for duplicating entire disks, most of which are larger than 50GB by now, and we never did notice any ill effect. A large piece of data missing in the middle would have been pretty obvious, but we've never have seen any of this so far.
Alain,
By any chance do you typically use it like so on your large files?
udp-receiver | some-process ?
Contrary to my earlier findings, in ongoing testing I have found that if I used it like this:
udp-receiver --file foo
I sometimes get bad results; and things like this:
udp-receiver --pipe "lzop -d" --file foo
also sometimes get bad results.
but I noticed that my real scripts do this:
udp-receiver | lzop -d | pg_restore
and I tested like this:
udp-receiver | lzop -d >foo
... and I get correct results. To 5 or 6 receivers. Every night.
Also, I found that having lzop (or other common compression tool) in the loop acts as a guard against data integrity problems - if udp-receiver skip or damages data, it would fail lzop's checksums and make the whole process fail.
Thus, it looks like there is some issue that comes in to play with --file, but not when simply letting the data fall out on stdout.
I'm sitting the issue down for the moment, but later I may beat on it a little more to try to track down the specifics of the failure.
I also think that perhaps the "--pipe" and "--file" features are unnecessary; that udp-receiver would be better by being simpler, and simply assume that the user will redirect the output where they need it.
Kyle Cordes wrote:
Alain Knaff wrote:
The strange thing is, we do use udpcast for duplicating entire disks, most of which are larger than 50GB by now, and we never did notice any ill effect. A large piece of data missing in the middle would have been pretty obvious, but we've never have seen any of this so far.
Alain,
By any chance do you typically use it like so on your large files?
udp-receiver | some-process ?
Contrary to my earlier findings, in ongoing testing I have found that if I used it like this:
udp-receiver --file foo
I sometimes get bad results; and things like this:
udp-receiver --pipe "lzop -d" --file foo
also sometimes get bad results.
but I noticed that my real scripts do this:
udp-receiver | lzop -d | pg_restore
and I tested like this:
udp-receiver | lzop -d >foo
... and I get correct results. To 5 or 6 receivers. Every night.
Also, I found that having lzop (or other common compression tool) in the loop acts as a guard against data integrity problems - if udp-receiver skip or damages data, it would fail lzop's checksums and make the whole process fail.
Thus, it looks like there is some issue that comes in to play with --file, but not when simply letting the data fall out on stdout.
I'm sitting the issue down for the moment, but later I may beat on it a little more to try to track down the specifics of the failure.
I also think that perhaps the "--pipe" and "--file" features are unnecessary; that udp-receiver would be better by being simpler, and simply assume that the user will redirect the output where they need it.
I have a suspicion that there may be a bug in some versions of the Linux kernel as far as seek is concerned, that seek is not thread-safe.
Udpcast uses lseek(fd, 0, SEEK_CUR) to read the current file position for statistics printing. Theoretically, this should not be harmful, as it should have no influence of file position. But I've got the suspicion that what this really does it read the file position, do some stuff, and then _write_back_ that same position: leading to corruption if ever a read or write in a different thread happened in between (file position will be reset to just before the read).
Could you try out whether you still get the problem if you comment out the contents of the printFilePosition function in statistics.c ?
What if you replace that contents with:
static void printFilePosition(int fd) { if(fd != -1) { int fd2 = dup(fd); if(fd2 != -1) { #ifdef HAVE_LSEEK64 loff_t offset = lseek64(fd2, 0, SEEK_CUR); if(offset != -1) printLongNum(offset); #else off_t offset = lseek(fd2, 0, SEEK_CUR); if(offset != -1) fprintf(stderr, "%10d", offset); #endif close(fd2); } } }
(Trying to read the position from a _copy_ of the file descriptor)
Regards,
Alain
Alain Knaff wrote:
I have a suspicion that there may be a bug in some versions of the Linux kernel as far as seek is concerned, that seek is not thread-safe.
Udpcast uses lseek(fd, 0, SEEK_CUR) to read the current file position
I will perform the test you describe.
I've just released udpcast 20081116, which has a work-around for that llseek bug in the kernel. The bug has also been reported to the kernel mailinglist and will (hopefully) be fixed in one of the next kernels...
Moreover, udpcast now has a new --stat-period flag, which allows to specify how often statistics are refreshed (expressed in milliseconds, default is half a second)
Regards,
Alain
Alain Knaff wrote:
I've just released udpcast 20081116, which has a work-around for that llseek bug in the kernel. The bug has also been reported to the kernel mailinglist and will (hopefully) be fixed in one of the next kernels...
Moreover, udpcast now has a new --stat-period flag, which allows to specify how often statistics are refreshed (expressed in milliseconds, default is half a second)
Excellent, I'm glad to hear this is already taken care of, I'll scratch the followup test I had in mind off my list :-)
I wonder who I would need to ask, to get the updated UDPCast in to Debian/Ubuntu - the version in there now is from 2004 (!).
Kyle Cordes wrote:
Contrary to my earlier findings, in ongoing testing I have found that if I used it like this:
udp-receiver --file foo
I sometimes get bad results; and things like this:
udp-receiver --pipe "lzop -d" --file foo
also sometimes get bad results.
Is this also the case if you use the --pipe option with _both_ udp-receiver and udp-sender? It could well be that synchronization between write and lseek works correctly if both are performed by 2 different processes (which is the case when using --pipe, but you'll have to use it on both ends, because both udp-receiver and udp-sender use the same kind of code for "uncompressed" printout).
If so, I'll disable the "decompressed" file position printout for the case without pipe, as in that case it doesn't make sense anyways...
but I noticed that my real scripts do this:
udp-receiver | lzop -d | pg_restore
and I tested like this:
udp-receiver | lzop -d >foo
... and I get correct results. To 5 or 6 receivers. Every night.
Yes, in that case, udp-receiver will write to a pipe, which is not seekable, and which does is not affected by the bug.
Also, I found that having lzop (or other common compression tool) in the loop acts as a guard against data integrity problems - if udp-receiver skip or damages data, it would fail lzop's checksums and make the whole process fail.
exactly. However, if the bug still exists even with the --pipe option, it would unfortunately affect the data _before_ it is compressed by lzop (or _after_ it has been decompressed by lzop -d), so that would not help here...
Regards,
Alain