Hello,
I have installed the 20030607 version of udpcast. Udpcast looks like a great application, but I am having mixed success with it. I'm attempting to multicast files within a 100 Mbit ethernet network to the servers of a 500+ node Beowulf. These nodes are connected by three class B subnets. Three Cisco switches (two are 6513s and one is a 6509) route the traffic across this network.
Udpcast is working for files of arbitrary size (I've tested up to 100 Mbytes) when multicasting to 80 or fewer nodes. For larger numbers of nodes (say, 100) the transfer appears to proceed correctly, but the receiving nodes then have difficulty disconnecting from the sending node. The receiving nodes all eventually disconnect, but they can require several minutes to do so. I've tried various combinations of options, but none have resolved this problem. Can anyone provide guidance here?
The command lines I am using are as follows:
sender:
udp-sender --portbase ${PORT} --min-clients ${NODES} --nokbd -b 1024 --max-bitrate 10m --ttl 1 --full-duplex --mcast-addr ${IP_CAST} --file ${SOURCE_FILE} --log /tmp/udp.log --autostart ${P}
receiver:
udp-receiver --portbase ${PORT} --nokbd --ttl 1 --file ${DEST_FILE} &> /dev/null
A sample output from the sender is listed below.
Thanks,
-Mike
...........................................................
MC=x 0xbffff950 0xbffff970
Udp-sender 2003-06-11
UDP sender for /tmp/x.mat at 172.18.124.7 on eth0
Broadcasting control to 172.18.127.255
New connection from 172.18.127.99 (#0) 00000019
New connection from 172.18.127.96 (#1) 00000019
New connection from 172.18.127.92 (#2) 00000019
New connection from 172.18.127.95 (#3) 00000019
New connection from 172.18.127.100 (#4) 00000019
New connection from 172.18.127.90 (#5) 00000019
New connection from 172.18.127.97 (#6) 00000019
New connection from 172.18.127.93 (#7) 00000019
New connection from 172.18.127.89 (#8) 00000019
New connection from 172.18.127.87 (#9) 00000019
New connection from 172.18.127.88 (#10) 00000019
New connection from 172.18.127.94 (#11) 00000019
New connection from 172.18.127.64 (#12) 00000019
New connection from 172.18.127.67 (#13) 00000019
New connection from 172.18.127.91 (#14) 00000019
New connection from 172.18.127.70 (#15) 00000019
New connection from 172.18.127.65 (#16) 00000019
New connection from 172.18.127.56 (#17) 00000019
New connection from 172.18.127.45 (#18) 00000019
New connection from 172.18.127.51 (#19) 00000019
New connection from 172.18.127.59 (#20) 00000019
New connection from 172.18.127.12 (#21) 00000019
New connection from 172.18.127.39 (#22) 00000019
New connection from 172.18.127.50 (#23) 00000019
New connection from 172.18.127.13 (#24) 00000019
New connection from 172.18.127.33 (#25) 00000019
New connection from 172.18.127.74 (#26) 00000019
New connection from 172.18.127.63 (#27) 00000019
New connection from 172.18.127.85 (#28) 00000019
New connection from 172.18.127.72 (#29) 00000019
New connection from 172.18.127.38 (#30) 00000019
New connection from 172.18.127.36 (#31) 00000019
New connection from 172.18.127.98 (#32) 00000019
New connection from 172.18.127.52 (#33) 00000019
New connection from 172.18.127.77 (#34) 00000019
New connection from 172.18.127.66 (#35) 00000019
New connection from 172.18.127.48 (#36) 00000019
New connection from 172.18.127.73 (#37) 00000019
New connection from 172.18.127.58 (#38) 00000019
New connection from 172.18.127.41 (#39) 00000019
New connection from 172.18.127.34 (#40) 00000019
New connection from 172.18.127.40 (#41) 00000019
New connection from 172.18.127.27 (#42) 00000019
New connection from 172.18.127.46 (#43) 00000019
New connection from 172.18.127.102 (#44) 00000019
New connection from 172.18.127.18 (#45) 00000019
New connection from 172.18.127.78 (#46) 00000019
New connection from 172.18.127.37 (#47) 00000019
New connection from 172.18.127.28 (#48) 00000019
New connection from 172.18.127.55 (#49) 00000019
New connection from 172.18.127.22 (#50) 00000019
New connection from 172.18.127.44 (#51) 00000019
New connection from 172.18.127.17 (#52) 00000019
New connection from 172.18.127.57 (#53) 00000019
New connection from 172.18.127.20 (#54) 00000019
New connection from 172.18.127.47 (#55) 00000019
New connection from 172.18.127.68 (#56) 00000019
New connection from 172.18.127.54 (#57) 00000019
New connection from 172.18.127.62 (#58) 00000019
New connection from 172.18.127.42 (#59) 00000019
New connection from 172.18.127.26 (#60) 00000019
New connection from 172.18.127.80 (#61) 00000019
New connection from 172.18.127.71 (#62) 00000019
New connection from 172.18.127.83 (#63) 00000019
New connection from 172.18.127.25 (#64) 00000019
New connection from 172.18.127.81 (#65) 00000019
New connection from 172.18.127.35 (#66) 00000019
New connection from 172.18.127.29 (#67) 00000019
New connection from 172.18.127.23 (#68) 00000019
New connection from 172.18.127.43 (#69) 00000019
New connection from 172.18.127.49 (#70) 00000019
New connection from 172.18.127.21 (#71) 00000019
New connection from 172.18.127.19 (#72) 00000019
New connection from 172.18.127.15 (#73) 00000019
New connection from 172.18.127.53 (#74) 00000019
New connection from 172.18.127.86 (#75) 00000019
New connection from 172.18.127.30 (#76) 00000019
New connection from 172.18.127.60 (#77) 00000019
New connection from 172.18.127.32 (#78) 00000019
New connection from 172.18.127.24 (#79) 00000019
New connection from 172.18.127.82 (#80) 00000019
New connection from 172.18.127.31 (#81) 00000019
New connection from 172.18.127.79 (#82) 00000019
New connection from 172.18.127.75 (#83) 00000019
New connection from 172.18.127.14 (#84) 00000019
New connection from 172.18.127.84 (#85) 00000019
New connection from 172.18.127.16 (#86) 00000019
Starting transfer: 00000019
bytes= 8 000 184 re-xmits=000000 ( 0.0%) slice=0112 8 000 184 - 60 00000 ( 0.0%) slice=0112 8 000 184 - 80
Disconnecting #8 (172.18.127.89)
Disconnecting #64 (172.18.127.25)
Disconnecting #21 (172.18.127.12)
Disconnecting #18 (172.18.127.45)
Disconnecting #4 (172.18.127.100)
Disconnecting #22 (172.18.127.39)
Disconnecting #85 (172.18.127.84)
Disconnecting #33 (172.18.127.52)
Disconnecting #36 (172.18.127.48)
Disconnecting #66 (172.18.127.35)
Disconnecting #47 (172.18.127.37)
Disconnecting #58 (172.18.127.62)
Disconnecting #39 (172.18.127.41)
Disconnecting #73 (172.18.127.15)
Disconnecting #59 (172.18.127.42)
Disconnecting #31 (172.18.127.36)
Disconnecting #76 (172.18.127.30)
Disconnecting #70 (172.18.127.49)
Disconnecting #48 (172.18.127.28)
Disconnecting #51 (172.18.127.44)
Disconnecting #1 (172.18.127.96)
Disconnecting #50 (172.18.127.22)
Disconnecting #0 (172.18.127.99)
Disconnecting #17 (172.18.127.56)
Disconnecting #13 (172.18.127.67)
Disconnecting #12 (172.18.127.64)
Disconnecting #55 (172.18.127.47)
Disconnecting #23 (172.18.127.50)
Disconnecting #11 (172.18.127.94)
Disconnecting #24 (172.18.127.13)
Disconnecting #77 (172.18.127.60)
Disconnecting #20 (172.18.127.59)
Disconnecting #29 (172.18.127.72)
Disconnecting #6 (172.18.127.97)
Disconnecting #71 (172.18.127.21)
Disconnecting #40 (172.18.127.34)
Disconnecting #32 (172.18.127.98)
Disconnecting #46 (172.18.127.78)
Disconnecting #7 (172.18.127.93)
Disconnecting #2 (172.18.127.92)
Disconnecting #72 (172.18.127.19)
Disconnecting #53 (172.18.127.57)
Disconnecting #41 (172.18.127.40)
Disconnecting #30 (172.18.127.38)
Disconnecting #3 (172.18.127.95)
Disconnecting #49 (172.18.127.55)
Disconnecting #79 (172.18.127.24)
Disconnecting #27 (172.18.127.63)
Disconnecting #5 (172.18.127.90)
Disconnecting #42 (172.18.127.27)
Disconnecting #54 (172.18.127.20)
Disconnecting #65 (172.18.127.81)
Disconnecting #9 (172.18.127.87)
Disconnecting #56 (172.18.127.68)
Disconnecting #52 (172.18.127.17)
Disconnecting #14 (172.18.127.91)
Disconnecting #78 (172.18.127.32)
Disconnecting #81 (172.18.127.31)
Disconnecting #62 (172.18.127.71)
Disconnecting #28 (172.18.127.85)
Disconnecting #15 (172.18.127.70)
Disconnecting #67 (172.18.127.29)
Disconnecting #74 (172.18.127.53)
Disconnecting #45 (172.18.127.18)
Disconnecting #10 (172.18.127.88)
Disconnecting #68 (172.18.127.23)
Disconnecting #43 (172.18.127.46)
Disconnecting #86 (172.18.127.16)
Disconnecting #69 (172.18.127.43)
Disconnecting #63 (172.18.127.83)
Disconnecting #16 (172.18.127.65)
Disconnecting #44 (172.18.127.102)
Disconnecting #34 (172.18.127.77)
Disconnecting #60 (172.18.127.26)
Disconnecting #35 (172.18.127.66)
Disconnecting #38 (172.18.127.58)
Disconnecting #82 (172.18.127.79)
Disconnecting #84 (172.18.127.14)
Timeout notAnswered=[25,61] notReady=[25,61] nrAns=7 nrRead=7 nrPart=9 avg=12407
Timeout notAnswered=[25,61] notReady=[25,61] nrAns=7 nrRead=7 nrPart=9 avg=12407
Timeout notAnswered=[25,61] notReady=[25,61] nrAns=7 nrRead=7 nrPart=9 avg=12407
Timeout notAnswered=[25,61] notReady=[25,61] nrAns=7 nrRead=7 nrPart=9 avg=12407
This last line will repeat dozens to hundreds of times.
On Friday 20 June 2003 16:31, Patterson, Michael wrote:
[...]
New connection from 172.18.127.33 (#25) 00000019
[...]
New connection from 172.18.127.80 (#61) 00000019
[...]
Timeout notAnswered=[25,61] notReady=[25,61] nrAns=7 nrRead=7 nrPart=9 avg=12407
Timeout notAnswered=[25,61] notReady=[25,61] nrAns=7 nrRead=7 nrPart=9 avg=12407
Timeout notAnswered=[25,61] notReady=[25,61] nrAns=7 nrRead=7 nrPart=9 avg=12407
Timeout notAnswered=[25,61] notReady=[25,61] nrAns=7 nrRead=7 nrPart=9 avg=12407
This last line will repeat dozens to hundreds of times.
This means that, for some reason, hosts 25 (172.18.127.33) and 61 (172.18.127.80) have trouble sending the final acknowledgment message.
If you do the transfer several times, are these always the same machines (i.e. 172.18.127.80 and 172.18.127.33)?
Does such loss of acks also happen during the transmission, or only at the end? (During the transmission it would be noticeable by Timeout messages appearing while the transfer temporarily stops.)
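To check whether it is always the same machines, the slot numbers in the Timeout lines can be mapped back to IP addresses from the "New connection" lines and counted over several runs. Below is a minimal Python sketch of that idea. It is not part of udpcast: the function names are made up for illustration, and it assumes the sender output has exactly the format shown above, with notAnswered given as a comma-separated list of slot numbers.

```python
import re
from collections import Counter

# Patterns assume the sender output format quoted in this thread.
CONNECT_RE = re.compile(r"New connection from (\d+\.\d+\.\d+\.\d+)\s+\(#(\d+)\)")
TIMEOUT_RE = re.compile(r"Timeout notAnswered=\[([\d,]*)\]")

# Short excerpt of the output above, used as sample input.
SAMPLE = """\
New connection from 172.18.127.33 (#25) 00000019
New connection from 172.18.127.80 (#61) 00000019
Timeout notAnswered=[25,61] notReady=[25,61] nrAns=7 nrRead=7 nrPart=9 avg=12407
"""

def stalled_hosts(log_text):
    """Return the set of IPs whose slot appears in any notAnswered list."""
    # Map slot number -> IP address from the connection lines.
    slot_to_ip = {int(slot): ip for ip, slot in CONNECT_RE.findall(log_text)}
    stalled = set()
    for slots in TIMEOUT_RE.findall(log_text):
        for slot in filter(None, slots.split(",")):
            # Fall back to a placeholder if the slot was never seen connecting.
            stalled.add(slot_to_ip.get(int(slot), "unknown slot #" + slot))
    return stalled

def count_across_runs(logs):
    """Count, per IP, in how many runs that host failed to answer."""
    counts = Counter()
    for log_text in logs:
        counts.update(stalled_hosts(log_text))
    return counts
```

Feeding the saved sender output of each run to count_across_runs() gives a per-host tally. Hosts that show up in most runs point to a problem with those particular machines or their switch ports, while a different set each time suggests general packet loss.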
Alain