We found a way to implement udpcast with compression effectively, and a way around the "pipeline full" problem! Here's what we did, and the results!
Most of our hard drives are only around 1/3 to 2/3 full, so it seemed a waste of time to send all those empty blocks across the wire when multicasting. So I seriously looked into compression, opting for a fast compressor rather than an efficient one: when the sender hits the empty blocks full of zeroes, they compress down to almost nothing and very little data goes across the wire. I tried gzip, but it wasn't fast enough, and ended up using lzop for compression. For compression to be worthwhile, the sender's CPU power and the compression speed in combination have to be high enough that it can compress nearly as fast as the hard drive can read. The receiver can probably get away with a somewhat slower CPU, as decompression takes less crunching.
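BTW, if you want to check whether your own CPU can keep up before trying a full multicast, a quick benchmark along these lines should tell you (just a sketch; the 1GB sample size and compression levels are arbitrary):

    # best case: how fast the compressors chew through pure zeroes
    time dd if=/dev/zero bs=1M count=1024 | lzop -1 > /dev/null
    time dd if=/dev/zero bs=1M count=1024 | gzip -1 > /dev/null

    # worst case: real data read straight off the drive
    time dd if=/dev/hda bs=1M count=1024 | lzop -1 > /dev/null

    # raw read speed of the drive, for comparison
    time dd if=/dev/hda bs=1M count=1024 > /dev/null

If lzop on real data comes close to the raw read speed, compression should pretty much always be a win.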
For testing's sake, we plugged two P4 1.8GHz machines w/ fast 80GB drives (7200rpm Ultra ATA-100 Seagates) into a 100Mbps switch and piped the transmission through lzop, using the fastest compression setting possible. The problem we ran into was that when the sender started hitting the unused blocks, compressing them and putting them on the wire, the receiver couldn't write them to the hard drive fast enough! That produced the "pipeline full" message, and then the connection ended up timing out. We ended up changing the sender timeout in the source code from the default (0.2 seconds) to 60 seconds. Probably overkill, but thanks to a well written program (thanks Alain!) the sender just keeps retrying until the receiver catches up, has room in its buffer again, and asks for more. We wanted to make sure that no matter what, none of the clients were dropped, hence the excessive 60-second timeout. The "pipeline full" msg still pops up many times, but the clients aren't dropped, and the task finishes successfully!
So, here are the command lines we used on the sender and the receivers, piping through compression:

On the sender:
    udp-sender -p "lzop -1" -f /dev/hda
On the receivers:
    udp-receiver -p "lzop -d" -f /dev/hda
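(For the uncompressed comparison runs below, just drop the pipe option:

    udp-sender -f /dev/hda
    udp-receiver -f /dev/hda
)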
----
and here are the results of copying an 80GB HDD to an 80GB HDD (approximately 28GB of actual data on the drive) between two P4 1.8GHz machines across a 100Mbps switch (compressed versus uncompressed):
93 minutes w/ compression (approx. 50GB sent across the wire) - avg. 860 MBytes/min
114 minutes w/ out compr. (80GB sent across wire, of course;^) - avg. 700 MBytes/min
----
same hardware setup, but just copying a 4GB partition w/ approx 1.2GB of data on it:
3 min, 30 seconds w/ compression - avg. 1142 MBytes/min!
6 min, 7 seconds w/ out compression - avg. 660 MBytes/min
When piping through compression, the Mbps figure shown onscreen on the receiver's end is misleading, because it only reflects the compressed data being received, not what's being written to the hard drive. This is especially noticeable when the sender hits unused blocks: the Mbps drops drastically even though large sections of the hard drive are being copied every second! So it isn't an accurate measure of performance when using compression, which is why I went with overall time measurements and pulled per-minute averages from those numbers (e.g. 80,000MB / 93 min = approx. 860MB/min for the compressed 80GB copy).
Some observations we made from the above tests. Obviously, the more zeroes on the drive, the faster it goes. With 50GB sent across the wire in the compressed test on the 80GB drive, we suspect the unused blocks still held a fair amount of data from previously deleted files; otherwise the transfer would have been much smaller (and much faster!). On the 1.8GHz P4s the compression ran slightly slower than the hard drive could read used data blocks (probably around 80% of the read speed), so speaking only of speed, compression wouldn't be worthwhile with these P4s on a completely full drive. But in most situations, where hard drives have some empty space, it is a great improvement in overall speed! A nice side effect is less traffic on the LAN. I would also assume that with a P4 or equivalent in excess of 2.2GHz, the compression should be able to completely keep up with the read transfer rate of the hard drive. During these tests we also noticed that while the unused blocks were being transferred, the hard drive on the receiving end frequently couldn't write fast enough, so the buffer would fill up and make the sender wait before sending more data.
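Something we haven't tried yet, but which follows from the above (just a sketch, assuming the partition is mounted at /mnt/hda1; the zerofill filename is arbitrary): zeroing out the free space before imaging, so the unused blocks really are all zeroes:

    mount /dev/hda1 /mnt/hda1
    dd if=/dev/zero of=/mnt/hda1/zerofill bs=1M    # runs until the disk fills up, then errors out - that's expected
    rm /mnt/hda1/zerofill
    umount /mnt/hda1

That should make the unused blocks compress down to almost nothing on the wire, even on drives with lots of deleted-file residue.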
In summary: computers with sufficient processing power, combined with fast compression on a 100Mbps network, perform very well when multicasting a typical hard drive! Our image master can only copy drives at approx. 500MB/min, and that's with the added hassle of shuffling the drives in and out of the computers with trays, etc. The udpcast method is faster, and the hard drives can stay in the computers! A very useful program; our thanks to the programmer!
Regards,
Daniel Petersen
PS. udpcast is working beautifully when transferring directly from hard drive to hard drive, but we're having some trouble when trying to dump to a file that will be larger than 2GB. We get the message "File size limit exceeded" when it hits the 2GB mark on the receivers, despite the fact that we're putting the file on a Red Hat box w/ an ext3 file system, which handles large files just fine. We think it might be udpcast bailing out as a fail-safe feature. Alain, or those who might be "in the know": have you ever seen this and/or know how to solve the problem?
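In the meantime, a workaround we may try (just a sketch; we're assuming udp-receiver writes to stdout and udp-sender reads from stdin when no -f is given): leave the stream compressed, and pipe it through split so no single file ever crosses the 2GB mark:

    # on the archive box: store the still-compressed stream in sub-2GB chunks
    udp-receiver | split -b 1024m - hda-image.lzo.

    # later, to rebroadcast it (receivers then use -p "lzop -d" -f /dev/hda as usual):
    cat hda-image.lzo.* | udp-sender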