On Friday 20 June 2003 16:31, Patterson, Michael wrote:
I have installed the 20030607 version of udpcast. Udpcast looks like a
great application, but I am having mixed success with it. I'm attempting
to multicast files within a 100 Mbit ethernet network to the servers of a
500+ node Beowulf. These nodes are connected by three class B subnets.
Three Cisco switches (two are 6513s and one is a 6509) route the traffic
across this network.
Udpcast is working for files of arbitrary size (I've tested up to 100
Mbytes) when multicasting to 80 or fewer nodes. For larger numbers of
nodes (say, 100) the transfer appears to proceed correctly but the
receiving nodes then have difficulty disconnecting from the sending node.
The receiving nodes all eventually disconnect, but they can require several
minutes to do so. I've tried various combinations of options but none have
resolved this problem. Can anyone provide guidance here?
The command lines I am using are as follows:
New connection from 172.18.127.33 (#25) 00000019
New connection from 172.18.127.80 (#61) 00000019
Timeout notAnswered=[25,61] notReady=[25,61] nrAns=7 avg=12407
Timeout notAnswered=[25,61] notReady=[25,61] nrAns=7 nrRead=7 nrPart=9 avg=12407
Timeout notAnswered=[25,61] notReady=[25,61] nrAns=7 nrRead=7 nrPart=9 avg=12407
Timeout notAnswered=[25,61] notReady=[25,61] nrAns=7 nrRead=7 nrPart=9 avg=12407
This last line will repeat dozens to hundreds of times.
This means that, for some reason, hosts 25 (172.18.127.33) and 61
(172.18.127.80) have trouble sending the final acknowledgement message.
If you do the transfer several times, are these always the same
machines (i.e. 172.18.127.80 and 172.18.127.33)?
Does this loss of acks also happen during the transmission, or only at
the end? During the transmission it would be noticeable as Timeout
messages appearing while the transmission temporarily stops.
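
To check whether it is always the same machines, one option is to save
the sender's console output from each run to a text file and tally the
hosts listed in notAnswered. The following is only a rough sketch: it
assumes the log lines look exactly like the excerpt quoted above, and
stalled_hosts and SAMPLE_LOG are names made up for this illustration,
not part of udpcast. It maps slot numbers back to IP addresses, since
the slot numbers may not be the same from one run to the next.

```python
import re
from collections import Counter

# Patterns modelled on the sender output quoted in this message (an assumption;
# other udpcast versions may print these lines differently).
CONN_RE = re.compile(r"New connection from (\d+\.\d+\.\d+\.\d+)\s+\(#(\d+)\)")
NOT_ANSWERED_RE = re.compile(r"notAnswered=\[([\d,\s]*)\]")


def stalled_hosts(log_text):
    """Count how often each receiver appears in a notAnswered list.

    Returns a Counter keyed by IP address, or by "slot N" when no
    "New connection" line was seen for that slot number.
    """
    # Map slot number -> IP address from the "New connection" lines.
    slot_to_ip = {int(slot): ip for ip, slot in CONN_RE.findall(log_text)}

    counts = Counter()
    # Searching the whole text (not line by line) also copes with wrapped logs.
    for slot_list in NOT_ANSWERED_RE.findall(log_text):
        for field in slot_list.split(","):
            field = field.strip()
            if field:
                slot = int(field)
                counts[slot_to_ip.get(slot, f"slot {slot}")] += 1
    return counts


# Sample input copied from the excerpt above.
SAMPLE_LOG = """\
New connection from 172.18.127.33 (#25) 00000019
New connection from 172.18.127.80 (#61) 00000019
Timeout notAnswered=[25,61] notReady=[25,61] nrAns=7 avg=12407
Timeout notAnswered=[25,61] notReady=[25,61] nrAns=7 nrRead=7 nrPart=9 avg=12407
"""

if __name__ == "__main__":
    for host, n in stalled_hosts(SAMPLE_LOG).most_common():
        print(f"{host}: listed as not answered in {n} Timeout message(s)")
```

Running this over the logs of several transfers and comparing the
resulting host lists should show whether the problem follows particular
machines or moves around.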