Hi all,
I have been using udpcast to image machines successfully for a while now. We are now expanding our infrastructure with more machines and extra networks, which I would like to image from the same server.
This partly works, except that imaging does not complete correctly on the new network (card/interface).
The receiver seems to get almost all of the data, but udp-receiver seems to 'hang' at the end. The most frustrating part is that udp-sender says "Transfer complete, disconnecting".
When I try it in unicast and sniff the network, I see the following on a _successful_ image to networkcard X:
16:06:59.620836 IP 192.168.16.3.9033 > 192.168.17.140.9032: UDP, length: 1472
16:06:59.620841 IP 192.168.16.3.9033 > 192.168.17.140.9032: UDP, length: 1472
16:06:59.620846 IP 192.168.16.3.9033 > 192.168.17.140.9032: UDP, length: 1472
16:06:59.620851 IP 192.168.16.3.9033 > 192.168.17.140.9032: UDP, length: 1472
16:06:59.620856 IP 192.168.16.3.9033 > 192.168.17.140.9032: UDP, length: 144
16:06:59.621146 IP 192.168.17.140.9032 > 192.168.16.3.9033: UDP, length: 8
16:06:59.621198 IP 192.168.16.3.9033 > 192.168.17.140.9032: UDP, length: 144
16:06:59.621397 IP 192.168.17.140.9032 > 192.168.16.3.9033: UDP, length: 8
16:07:00.654842 IP 192.168.16.3.9033 > 192.168.19.255.9032: UDP, length: 28
16:07:01.146772 IP 192.168.17.140.9032 > 192.168.16.3.9033: UDP, length: 4
When I do the same (failing) image to networkcard Y:
15:27:13.413078 IP 192.168.144.2.9047 > 192.168.144.200.9046: UDP, length: 1472
15:27:13.413084 IP 192.168.144.2.9047 > 192.168.144.200.9046: UDP, length: 1472
15:27:13.413089 IP 192.168.144.2.9047 > 192.168.144.200.9046: UDP, length: 1472
15:27:13.413095 IP 192.168.144.2.9047 > 192.168.144.200.9046: UDP, length: 1472
15:27:13.481157 IP 192.168.144.2.9047 > 192.168.144.200.9046: UDP, length: 144
And then nothing. This is the last part of the imaging process: on networkcard X it runs through to completion, while on networkcard Y it just hangs.
It seems to me the last few packets are missing and that's causing the hang.
Has anyone experienced something like this before, or does anyone have any pointers to what might cause it? I suspect a switch or similar device is blocking the last packets, e.g. because of a rate-limiting setting.
Kind regards, - Ramon Bastiaans.
Ramon Bastiaans wrote:
This seems to work a little, except for the fact that it does not complete correctly on the new network(card/interface).
What exactly is "new"? New make of network card? New switch? More than one switch between sender and receiver? Maybe even a router? Is it possible for you to test each change one by one, to try to identify which particular change introduces the problem?
The receiver seems to get almost all of the data, but udp-receiver seems to 'hang' at the end. The most frustrating part is that udp-sender says "Transfer complete, disconnecting".
It looks like the "return" traffic (from the receiver back to the sender) is not working correctly. It's somewhat bizarre, because apparently it did work at the beginning of the transfer (or else the transfer could not have taken place at all, unless you used asynchronous mode)
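The missing return traffic can be checked mechanically from the two traces posted at the start of the thread. The following is only a minimal sketch: the regular expression assumes tcpdump output in exactly the format quoted above, and the function name `count_by_direction` is made up for this illustration.

```python
import re

# Matches tcpdump lines of the form quoted in this thread, e.g.
# "16:06:59.620836 IP 192.168.16.3.9033 > 192.168.17.140.9032: UDP, length: 1472"
# (assumption: other tcpdump versions may format lines differently).
LINE_RE = re.compile(
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d+) IP "
    r"(?P<src>\d+\.\d+\.\d+\.\d+)\.(?P<sport>\d+) > "
    r"(?P<dst>\d+\.\d+\.\d+\.\d+)\.(?P<dport>\d+): UDP, length:? (?P<length>\d+)"
)


def count_by_direction(trace, sender_ip):
    """Count packets sent by sender_ip versus packets addressed to it."""
    counts = {"from_sender": 0, "to_sender": 0}
    for match in LINE_RE.finditer(trace):
        if match.group("src") == sender_ip:
            counts["from_sender"] += 1
        elif match.group("dst") == sender_ip:
            counts["to_sender"] += 1
    return counts


# Trace of the successful transfer on networkcard X (copied from the thread).
GOOD_TRACE = """
16:06:59.620836 IP 192.168.16.3.9033 > 192.168.17.140.9032: UDP, length: 1472
16:06:59.620841 IP 192.168.16.3.9033 > 192.168.17.140.9032: UDP, length: 1472
16:06:59.620846 IP 192.168.16.3.9033 > 192.168.17.140.9032: UDP, length: 1472
16:06:59.620851 IP 192.168.16.3.9033 > 192.168.17.140.9032: UDP, length: 1472
16:06:59.620856 IP 192.168.16.3.9033 > 192.168.17.140.9032: UDP, length: 144
16:06:59.621146 IP 192.168.17.140.9032 > 192.168.16.3.9033: UDP, length: 8
16:06:59.621198 IP 192.168.16.3.9033 > 192.168.17.140.9032: UDP, length: 144
16:06:59.621397 IP 192.168.17.140.9032 > 192.168.16.3.9033: UDP, length: 8
16:07:00.654842 IP 192.168.16.3.9033 > 192.168.19.255.9032: UDP, length: 28
16:07:01.146772 IP 192.168.17.140.9032 > 192.168.16.3.9033: UDP, length: 4
"""

# Trace of the failing transfer on networkcard Y (copied from the thread).
BAD_TRACE = """
15:27:13.413078 IP 192.168.144.2.9047 > 192.168.144.200.9046: UDP, length: 1472
15:27:13.413084 IP 192.168.144.2.9047 > 192.168.144.200.9046: UDP, length: 1472
15:27:13.413089 IP 192.168.144.2.9047 > 192.168.144.200.9046: UDP, length: 1472
15:27:13.413095 IP 192.168.144.2.9047 > 192.168.144.200.9046: UDP, length: 1472
15:27:13.481157 IP 192.168.144.2.9047 > 192.168.144.200.9046: UDP, length: 144
"""

print(count_by_direction(GOOD_TRACE, "192.168.16.3"))
print(count_by_direction(BAD_TRACE, "192.168.144.2"))
```

On these excerpts the failing trace contains no packet addressed to the sender at all, whereas the successful one does. Note also that the successful trace includes one packet to 192.168.19.255, a broadcast address, which has no counterpart in the failing excerpt.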
[...]
Anyone experienced something like this before and/or has any pointers to what might cause this? I suspect a switch or similar device to block the last packets. I.e. a rate limiting setting or something.
Kind regards,
- Ramon Bastiaans.
From the IP addresses, I assume you are using unicast mode (one single receiver).
It would help if you would tell us more about your network. What devices (switches, routers, etc.) are sitting between your sender and your receiver? What netmasks are involved? Where was the tcpdump observed (on sender? on receiver? on an unrelated box? If so, how was that box connected?) Are you reasonably sure that the trace is complete? (Many switches send unicast traffic only to the port where that machine is connected, unless you use a specifically configured monitoring port. That means that if you used an unrelated box to observe, the trace might be incomplete, depending on how the switch(es) has(ve) been set up)
Alain
Alain Knaff wrote:
What exactly is "new"? New make of network card? New switch? More than one switch between sender and receiver? Maybe even a router? Is it possible for you to test each change one by one, to try to identify which particular change introduces the problem?
An extra network card in the server, with one switch connected to it and four machines connected to that switch (all for testing). There are no routers in between.
It looks like the "return" traffic (from the receiver back to the sender) is not working correctly. It's somewhat bizarre, because apparently it did work at the beginning of the transfer (or else the transfer could not have taken place at all, unless you used asynchronous mode)
Indeed weird: the problem only seems to arise at the end. I am not running in asynchronous mode.
[...]
From the IP addresses, I assume you are using unicast mode (one single receiver).
It would help if you would tell us more about your network. What devices (switches, routers, etc.) are sitting between your sender and your receiver? What netmasks are involved? Where was the tcpdump observed (on sender? on receiver? on an unrelated box? If so, how was that box connected?) Are you reasonably sure that the trace is complete? (Many switches send unicast traffic only to the port where that machine is connected, unless you use a specifically configured monitoring port. That means that if you used an unrelated box to observe, the trace might be incomplete, depending on how the switch(es) has(ve) been set up)
Alain
Yes, I tried this particular case in unicast mode, to eliminate any possible multicast issues that might arise. I would figure a unicast setup should work, even if the switch has problems with multicasting.
The network on networkcard X is a /22 network (255.255.252.0), the network on networkcard Y is a /21 network (255.255.248.0).
The tcpdump was done on the sending side/image server, so I should be able to see the return packets in the tcpdump (I see return packets during transferring earlier on).
The setup pretty much looks like this right now:
server networkcard Y ->| switch |-> machine 1
                                |-> machine 2
                                |-> machine 3
                                |-> machine 4
Kind regards, - Ramon.
Hi,
I could be wrong, but the problem may be that you just have to wait.
Is the hard drive on the target machine still active (LED on) when it apparently hangs? If so it might still be working fine.
In our environment, the last part of the write is an empty part of the disk, which compresses very well. Our 512 MB RAM notebooks can hold a lot of compressed and zeroed data in memory, and it works out to about 5 GB of data still to be written to disk. For us, it is normal to see a delay of between 5 and 10 minutes between the disconnect and the completion of writing to disk. With a slower disk on the target and lots of memory, the gap between disconnect time and write completion could be even larger.
If network card X and Y are on different target client machines, then it might have nothing to do with the network cards.
If the same machine is involved with these 2 network cards, then it might be a hardware or driver issue.
--Donald Teed
On Tue, 3 Jan 2006, Ramon Bastiaans wrote:
[...]
Kind regards,
- Ramon Bastiaans.
-- There are really only three types of people:
Those who make things happen, those who watch things happen, and those who say, "What happened?"
ing. R. Bastiaans HPC - Systems Programmer
SARA - Computing and Networking Services Kruislaan 415 PO Box 194613 1098 SJ Amsterdam 1090 GP Amsterdam
Udpcast mailing list Udpcast@udpcast.linux.lu https://lll.lgl.lu/mailman/listinfo/udpcast
I figured out what caused it.
After replacing the switch and doing some more tcpdumping, I saw that one of the last packets sent by the server went to the broadcast address of the network. However, this particular broadcast address looked odd to me.
It turns out that networkcard Y had the wrong broadcast address configured. Because of this, the receiver missed those last packets.
So it was a matter of misconfiguration on the server. It is strange, however, that short/small transfers did work and big transfers did not.
Anyway, it's solved now.
Cheers, - Ramon.
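For anyone hitting the same symptom, the broadcast address an interface should have follows directly from its IP address and netmask, so it can be compared against what is actually configured. A minimal sketch using Python's standard `ipaddress` module, with the server addresses from the traces and the /22 and /21 netmasks given earlier in the thread; the helper names `expected_broadcast` and `broadcast_matches` are made up for this illustration.

```python
import ipaddress


def expected_broadcast(address, prefix_len):
    """Return the broadcast address implied by an IP address and prefix length."""
    iface = ipaddress.ip_interface(f"{address}/{prefix_len}")
    return str(iface.network.broadcast_address)


def broadcast_matches(address, prefix_len, configured_broadcast):
    """Check a configured broadcast address against the one the netmask implies."""
    return expected_broadcast(address, prefix_len) == configured_broadcast


# Server address on networkcard X, /22 network (255.255.252.0).
print(expected_broadcast("192.168.16.3", 22))

# Server address on networkcard Y, /21 network (255.255.248.0).
print(expected_broadcast("192.168.144.2", 21))

# The successful trace shows a packet to 192.168.19.255, which is consistent
# with the /22 netmask on networkcard X.
print(broadcast_matches("192.168.16.3", 22, "192.168.19.255"))
```

For networkcard Y this gives 192.168.151.255, so any other broadcast value configured on that interface would point to the kind of misconfiguration described above. The value that was actually misconfigured is not stated in the thread.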