Hi there,
After a few emails on the mailing list, I found the bug.
When FEC blocks are done, they are never released back to freeBlocks,
which eventually leads to a shortage of free blocks and causes the
deadlock.
So AFAIK, the recent releases are affected whenever FEC is enabled.
The release code mostly existed already, so I made it generic.
I don't know if this is the cleanest way to solve it, but at least it solved
my deadlock issue.
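To illustrate the idea outside of udpcast, here is a minimal sketch, not the actual udpcast code: blocks taken from a fixed pool have to be handed back, and clearing the pointer after release makes the helper safe to call from more than one place. The names pool_t, pool_init, pool_take, pool_release and release_all are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_SIZE 8

/* Hypothetical fixed-size block pool (illustration only, not udpcast's). */
typedef struct {
    char storage[POOL_SIZE][16];
    char *free_list[POOL_SIZE];
    int free_count;
} pool_t;

static void pool_init(pool_t *p)
{
    for (int i = 0; i < POOL_SIZE; i++)
        p->free_list[i] = p->storage[i];
    p->free_count = POOL_SIZE;
}

/* Returns NULL when the pool is exhausted. A blocking implementation
 * would wait here forever if nobody ever releases a block. */
static char *pool_take(pool_t *p)
{
    if (p->free_count == 0)
        return NULL;
    return p->free_list[--p->free_count];
}

static void pool_release(pool_t *p, char *blk)
{
    assert(p->free_count < POOL_SIZE);
    p->free_list[p->free_count++] = blk;
}

/* Release every block still referenced by descs[] and clear the pointer,
 * so that a second call on the same descriptors is a harmless no-op. */
static void release_all(pool_t *p, char **descs, int n)
{
    for (int i = 0; i < n; i++) {
        if (descs[i] != NULL) {
            pool_release(p, descs[i]);
            descs[i] = NULL;
        }
    }
}
```

Calling release_all twice on the same descriptors leaves the free count unchanged the second time, which is the property the NULL check in the patch relies on.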
diff --git a/receivedata.c b/receivedata.c
index 3f213d5..692260d 100644
--- a/receivedata.c
+++ b/receivedata.c
@@ -135,6 +135,8 @@ struct clientState {
#endif
};
+static void freeFecBlocks(struct clientState *clst, slice_t slice);
+
static void printMissedBlockMap(struct clientState *clst, slice_t slice)
{
int i, first=1;
@@ -460,7 +462,11 @@ static void cleanupSlices(struct clientState *clst, unsigned int doneState)
clst->slices[pos].sliceNo, pos, &clst->slices[pos]);
#endif
pc_produce(clst->free_slices_pc, 1);
-
+
+ if (doneState == SLICE_FEC_DONE) {
+ freeFecBlocks(clst,slice);
+ }
+
/* if at end, exit this thread */
if(!bytes) {
clst->endReached = 2;
@@ -586,6 +592,23 @@ static void fec_decode_one_stripe(struct clientState *clst,
}
+static void freeFecBlocks(struct clientState *clst, slice_t slice) {
+ int stripes = slice->fec_stripes;
+ struct fec_desc *fec_descs = slice->fec_descs;
+ int stripe;
+ for(stripe=0; stripe<stripes; stripe++) {
+ int i;
+ assert(slice->missing_data_blocks[stripe] >=
+ slice->fec_blocks[stripe]);
+ for(i=0; i<slice->fec_blocks[stripe]; i++) {
+ if (fec_descs[stripe+i*stripes].adr != NULL) {
+ freeBlockSpace(clst,fec_descs[stripe+i*stripes].adr);
+ fec_descs[stripe+i*stripes].adr=0;
+ }
+ }
+ }
+}
+
static THREAD_RETURN fecMain(void *args0)
{
struct clientState *clst = (struct clientState *) args0;
@@ -627,15 +650,7 @@ static THREAD_RETURN fecMain(void *args0)
}
slice->state = SLICE_FEC_DONE;
- for(stripe=0; stripe<stripes; stripe++) {
- int i;
- assert(slice->missing_data_blocks[stripe] >=
- slice->fec_blocks[stripe]);
- for(i=0; i<slice->fec_blocks[stripe]; i++) {
- freeBlockSpace(clst,fec_descs[stripe+i*stripes].adr);
- fec_descs[stripe+i*stripes].adr=0;
- }
- }
+ freeFecBlocks(clst,slice);
} else if(slice->state == SLICE_DONE) {
slice->state = SLICE_FEC_DONE;
}
--
1.7.2.5
Dear All,
I've spotted a serious issue that we have in production.
We are running in async mode with a fixed bandwidth and FEC 8x8.
If the storage device's throughput drops below the network bandwidth, then
after a while udpcast locks up.
I enabled the debug output and discovered that the free blocks count is
losing 1 byte very often (surely linked to the FEC).
Once the free blocks count reaches 1, the process locks up completely.
Aside from this bug, I found that udpcast sends the same data several times
even though it had already been sent once. This may be related.
Any thoughts on this?
Thanks,
17:01:26.073493 free blocks: got 4096 bytes
17:01:37.179820 free blocks: got 4095 bytes
17:01:37.180049 free blocks: got 4094 bytes
17:01:37.180210 free blocks: got 4093 bytes
17:01:37.180357 free blocks: got 4092 bytes
17:01:37.180502 free blocks: got 4091 bytes
17:01:37.180687 free blocks: got 4090 bytes
17:01:37.180834 free blocks: got 4089 bytes
17:01:37.180980 free blocks: got 4088 bytes
[...]
17:02:04.364385 free blocks: got 12 bytes
17:02:04.364595 free blocks: got 11 bytes
17:02:04.364794 free blocks: got 10 bytes
17:02:04.365114 free blocks: got 9 bytes
17:02:04.365310 free blocks: got 8 bytes
17:02:04.365544 free blocks: got 7 bytes
17:02:04.365739 free blocks: got 6 bytes
17:02:04.365979 free blocks: got 5 bytes
17:02:04.366247 free blocks: got 4 bytes
17:02:04.366472 free blocks: got 3 bytes
17:02:04.366687 free blocks: got 2 bytes
17:02:04.572361 free blocks: got 1 bytes
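The steady countdown in this log is what a pool does when each unit of work hands back slightly fewer blocks than it took. The following is only a toy model of that arithmetic, not udpcast code: slices_until_stall is a hypothetical helper, and the per-slice numbers used with it are assumptions chosen for illustration. Only the starting value of 4096 comes from the log above.

```c
#include <assert.h>

/* Counts how many slices can be processed before a pool of `total`
 * blocks can no longer satisfy a request of `taken` blocks.
 * Returns -1 if the pool never runs dry within `max_slices` iterations. */
static long slices_until_stall(long total, long taken, long returned,
                               long max_slices)
{
    long free_blocks = total;
    for (long n = 0; n < max_slices; n++) {
        if (free_blocks < taken)
            return n;             /* a blocking consumer would hang here */
        free_blocks -= taken;     /* blocks taken for this slice */
        free_blocks += returned;  /* blocks handed back when it is done */
    }
    return -1;
}
```

With a pool of 4096, a slice that takes 16 blocks and returns only 15 stalls after 4081 slices, whereas returning all 16 never stalls in this model.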
Hi all,
On a very weak system, while doing lots of IO, udp-receiver ends up
crashing with:
udp-receiver: receivedata.c:324: fakeSliceComplete: Assertion `slice != ((void *)0)' failed.
Any idea what could trigger this assertion?
That's pretty easy to reproduce here.
Cheers,