On Thu, 23 Sep 2004, Ramon Bastiaans wrote:
> I was wondering if anyone knows or could tell what the bottleneck is for udpcast multicast speeds?
> [...]
> Is it the UDP protocol, or the multicast technique, or could it still be a hardware issue?
> Any opinions on the subject are appreciated, perhaps some of the authors of udpcast could give some insight?
Disclaimer: I'm not an author of udpcast, but I have experience with multicasting large amounts of data in clusters. Furthermore, I wrote a reliable multicast protocol many years ago and, more recently, a tool similar to udpcast that works in a technically different way (Dolly [1]).
There are a number of possible bottlenecks in such a scenario: First, there are the trivial bottlenecks like disk speed and network throughput. With Gigabit Ethernet (a theoretical maximum of roughly 125 MByte/s) the network will almost certainly not be the bottleneck. Second, there are the less obvious bottlenecks like CPU, memory, the PCI bus, or protocol complexity.
Personally, I think that when using IP multicast (as udpcast does), the complexity of the whole protocol might be a limiting factor, because a single sender has to coordinate many receivers. However, I don't have any data to substantiate this claim. The core problem is that the sender has to send the data at the speed of the slowest receiver. The slowest receiver is not necessarily known in advance, and it may also change during the transmission, so adapting the transmission rate correctly is not an easy task.
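To illustrate the point, here is a minimal sketch (not udpcast's actual code; the function name, the ack bookkeeping and the fixed window size are my own invention) of why the slowest receiver governs progress in a window-based reliable multicast:

    def window_limit(acked, window_size):
        # The sender may only advance to window_size blocks beyond the
        # slowest receiver's last acknowledged block, so a single slow
        # node throttles the whole transfer.
        return min(acked.values()) + window_size

    acks = {"node1": 120, "node2": 98, "node3": 115}
    print(window_limit(acks, 32))   # -> 130: node2 limits progress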
Thus, for our own cloning tool Dolly -- I'm sorry for the shameless plug on this mailing list -- we use TCP to transfer large data files (like whole partitions or disks) to many nodes in a cluster. Since TCP works only between a single sender and a single receiver, it can adapt much better to the maximal transmission throughput as well as to changing conditions. To link all the participating nodes together, we simply form a virtual ring of TCP connections. The data is then pipelined around this ring, with each node storing and forwarding at the same time (see the sketch below). It sounds counterintuitive, but it works remarkably well -- in fact, better than any IP-multicast-based approach I have heard of so far.
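Roughly, each node in the ring does something like the following (my own simplified illustration, not Dolly's actual code; the block size and socket handling are assumptions):

    CHUNK = 1 << 20  # 1 MiB blocks (an assumed size)

    def ring_node(upstream, downstream, out_path):
        # 'upstream' and 'downstream' are connected TCP sockets to the
        # predecessor and successor in the ring. Read blocks from the
        # predecessor, store them locally, and pass them on. While this
        # node forwards block N, its successor can already be writing
        # block N-1, so the whole ring works as a pipeline.
        with open(out_path, "wb") as out:
            while True:
                block = upstream.recv(CHUNK)
                if not block:        # predecessor closed: image complete
                    break
                out.write(block)
                downstream.sendall(block)
        downstream.close()

Because every link in the ring is an ordinary point-to-point TCP connection, each hop gets TCP's congestion and flow control for free, which is exactly the rate adaptation that is hard to do over multicast.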
For example, in a cluster of 16 nodes with 1-GHz Pentium III processors interconnected by Gigabit Ethernet, we measured up to approximately 60 MByte/s throughput with Dolly (without actually accessing the disks, for benchmarking purposes and to eliminate that trivial bottleneck). With udpcast we got about 45 MByte/s (also without accessing the disks) after tweaking the parameters (sometimes udpcast simply stopped transmission).
Please note that I'm not saying udpcast is bad. It just has different application areas. Udpcast is much better (or even the only solution) if the network is not switched, is asymmetric, or is even unidirectional. For a tightly interconnected, switched high-speed network in a cluster, Dolly usually achieves better throughput. Incidentally, that is why Dolly is used as the cloning tool for the 128-node Xibalba cluster at ETH Zurich [2].
In short, finding the bottleneck in such a scenario is more complex than it might seem at first. If you are interested, you will find some research papers at [3].
- Felix
[1] http://www.cs.inf.ethz.ch/CoPs/patagonia/#dolly
[2] http://www.xibalba.inf.ethz.ch/
[3] http://www.cs.inf.ethz.ch/CoPs/patagonia/#relmat