We use udpcast in an environment somewhat based on systemimager but highly modified.
We have run into a very sad situation where a particular model of switch decides it doesn't want to pass the multicast data packets and none of them get through. It is an extremely hard problem to reproduce because it seems to work 99% of the time.
The end result when we hit the problem is the udp-receiver "waits forever".
So I've begun to consider setting some additional timeouts in the system to try to work around this switch problem.
I investigated receive-timeout and start-timeout. Since I'm never able to reproduce the problem "on demand", I emulated the problem by using iptables to block the data stream packets.
I found that receive-timeout isn't in play in a situation where not one data channel packet has been sent. However, start-timeout is in play.
There seem to be a couple of select()-like situations that use the start-timeout. If you hit start-timeout prior to the "Connected as" message in udp-receiver, udp-receiver will exit with an exit code that can be captured by a script for a re-try. However, although the selectWithConsole() call in dispatchMessage() returns 0 on a select timeout, the caller (netReceiverMain()) doesn't test that return value when the transfer times out. It turns out selectWithConsole() is where I hit the timeout for the "no multicast data packets transferred" problem.
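To illustrate the pattern described above, here is a minimal sketch of a select() wait whose timeout result is actually checked by the caller. This is not udpcast code: wait_readable() and receive_status() are hypothetical names, and the exit status values 1 and 2 are arbitrary assumptions chosen for the example.

```c
#include <stdio.h>
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical helper (not part of udpcast): wait up to timeout_sec
 * seconds for fd to become readable.
 * Returns 1 if readable, 0 on timeout, -1 on error. */
static int wait_readable(int fd, int timeout_sec)
{
    fd_set read_set;
    struct timeval tv;

    FD_ZERO(&read_set);
    FD_SET(fd, &read_set);
    tv.tv_sec = timeout_sec;
    tv.tv_usec = 0;

    int ret = select(fd + 1, &read_set, NULL, NULL, &tv);
    if (ret > 0)
        return 1;
    return ret; /* 0 = timed out, -1 = select() failed */
}

/* Hypothetical caller: turn the wait result into a process exit
 * status instead of ignoring it. The values 1 and 2 are assumptions. */
static int receive_status(int fd, int timeout_sec)
{
    int ret = wait_readable(fd, timeout_sec);

    if (ret == 0) {
        fprintf(stderr, "timed out waiting for data\n");
        return 1;
    }
    if (ret < 0) {
        perror("select");
        return 2;
    }
    return 0; /* data is available to read */
}
```

A program built this way could return receive_status() from main(), so that a timeout reaches the calling script as a non-zero exit code.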
After digging further, I realized this is likely because there are threads involved and this makes it more complicated to handle status.
I'm rusty with my C but I came up with a workaround to get us going. I am not suggesting that this is a good solution, but it does solve it for me and could be used as an illustration of my problem. Maybe some of you experts can quickly come up with the correct solution.
Basically, I exit with 100 if zero bytes were transferred. Then I can test that easily in our scripts and re-try if needed in some sort of loop.
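The idea can be sketched roughly as follows. This is not the attached patch, only an illustration under assumptions: the exit value 100 is the one mentioned above, while EXIT_NO_DATA, exit_code_for_transfer(), run_once() and run_with_retry() are hypothetical names, and the retry wrapper is written in C here only to keep the example in one language (a shell loop testing the exit code would do the same job).

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define EXIT_NO_DATA 100 /* value from the post; the macro name is hypothetical */

/* Pick the exit code from the number of bytes actually received. */
static int exit_code_for_transfer(unsigned long long bytes_received)
{
    return bytes_received == 0 ? EXIT_NO_DATA : 0;
}

/* Run argv as a child process and return its exit status,
 * or -1 if it could not be started or did not exit normally. */
static int run_once(char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127); /* reached only if execvp() failed */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* Re-run the command while it reports "no data", up to max_attempts
 * times. Returns the exit status of the last attempt. */
static int run_with_retry(char *const argv[], int max_attempts)
{
    int code = -1;

    for (int attempt = 1; attempt <= max_attempts; attempt++) {
        code = run_once(argv);
        if (code != EXIT_NO_DATA)
            break;
        fprintf(stderr, "attempt %d of %d: no data received\n",
                attempt, max_attempts);
    }
    return code;
}
```

Using a dedicated exit value keeps the "no data at all" case distinguishable from other failures, so the wrapper retries only that case and passes every other status straight through.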
I realize the correct solution is to exit with an error code if the selectWithConsole() call in dispatchMessage() times out, but it looked hard to deal with when combined with my C-programming rust.
Other suggestions welcome. Attached is my "patch", for illustration only and not as a suggested "fix" for the problem.
Erik