[Udpcast] Timing problem with starting PXE boot and udp-sender

Donald Teed dteed at artistic.ca
Fri Apr 30 15:55:22 CEST 2004

We had tried starting both ways: server/sender first
and client/receiver first.

Anyway, the '--rexmit-hello-interval 3000'
option did fix our problem.  We are using full-duplex
and everything is running fine with the other defaults.

It was never consistently ignoring any one client, so it was hard
to know exactly why the packet wasn't heard.  What we
observe with the rexmit-hello-interval is that out of a pool of 15
clients, it can take a few "phone calls" to reach the
other end.  In the last session I witnessed today, out
of 15, only 6 initially connected as "Ready", and
then another 2, then another 3, and so on until we had
a full allotment.  Combining this with the min-clients switch
worked well to automate the launch reliably.
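
For anyone wanting to reproduce this, a minimal sketch of the combined
invocation might look like the following. The image name is the one from
the example quoted below, and the client count of 15 matches our pool;
both are placeholders. The spelling --min-clients is the one used in this
message and should be checked against the udp-sender manual page for the
installed version. The script only prints the command line.

```shell
#!/bin/sh
# Sketch only: all three values are placeholders to adapt.
IMAGE=fileimage.gz      # image name taken from the quoted example
MIN_CLIENTS=15          # start automatically once this many clients are ready
REXMIT_MS=3000          # interval between repeated HELLO packets, in ms

# Assemble the sender command line. The option name --min-clients is
# assumed from this thread; verify it for your udpcast version.
CMD="udp-sender --rexmit-hello-interval $REXMIT_MS --min-clients $MIN_CLIENTS --file $IMAGE"

# Print it for review; run the printed line by hand once it looks right.
echo "$CMD"
```

With a 3000 ms interval, clients that miss one HELLO get another chance
every three seconds, which matches the way our clients trickled in.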

Thanks for the quick solution...

--Donald Teed

On Thu, 29 Apr 2004, Alain Knaff wrote:

> begin  Thursday 29 April 2004 21:20, Donald Teed quote:
> > If a CONNECT can trigger the rendez-vous, then if I notice a certain
> > number of machines not connecting, I should be able to simply
> > reboot them and have them try this again.  The weird thing was
> > that we tried that, and the same 4 machines did not rendezvous
> > while 11 were standing by ready.
> You know, you neither confirmed nor denied that you usually start up
> the sender after the receivers.
> So let's just suppose you always start up the sender after receivers,
> except for those where first PXE fails:
> In that case, your observed behaviour is consistent with machines
> that NEVER send out that first CONNECT after reboot. If, due to some
> construction limitations, the card is not operational within the first
> 5 seconds after driver activation, the first CONNECT would ALWAYS
> fall within that window. By rebooting the machines, you'd trigger
> another driver removal and re-insertion, which again would make the
> card unavailable during a short time, and the CONNECT would again be
> dropped.
> Interesting things to test (in order to confirm or deny the
> hypothesis):
>  1. Start the sender first
>     - do _all_ machines now fail? If yes, I think that's excellent
>     confirmation that the first CONNECT after reboot never makes
>     it...
>     - do only some of the machines fail (... and always the same ones
>     after a _complete_ restart of the experiment)? If yes, the problem
>     seems to depend not only on the card model, but on each card
>     individually.
>     - do only some of the machines fail, and always different ones
>     after a complete restart of the experiment? If yes, we do have a
>     true mystery ;-)
>  2. Run a tcpdump on the server, and see what packets you get (port
> 9000 and 9001) from which machines.
> >  That was what led me to conclude
> > there was a window of time to rendez-vous and it had elapsed.
> Nope, there is no such window.
> > However on a third session the 4 missed machines were included in
> > a new batch and did get imaged OK.
> good.
> > > > I checked the options and I don't see any that are designed to increase
> > > > how long it will wait to see more machines responding as ready.
> > >
> > > There is  the "--rexmit-hello-interval 3000" option which instructs
> > > the sender to keep on resending its HELLO packets until transmission
> > > is started. The number is the interval, in milliseconds, between two
> > > HELLO packets. This might solve the issue.
> > >
> > > udp-sender --rexmit-hello-interval 3000 --file fileimage.gz
> >
> > OK, cool, that might be useful.
> >
> > There are a few things I need to test.  I can try substituting
> > the switch involved.
> Could help. But from what I've read in the various newsgroups, this
> particular problem (card initialization) has more to do with the cards
> themselves than the switch.
> >  In general the client machines are a
> > little unpredictable since they were carried around daily by
> > University students for 2 or 3 years.
> >
> > --Donald Teed
> Alain
