Problem with recovery partition data

Donald Teed dteed at artistic.ca
Fri Apr 16 04:27:21 CEST 2004


I was hoping to have a bootable USB floppy and such
with something like tomsrtbt, but my first attempt
didn't work (I don't have access to the hardware
where I normally work).

We did a test that confirmed data was being changed.

udpcast received an image from a master disk onto the
DHCP server; we named that image "first".  That image was
then sent to a fresh disk, and that disk was in turn used
to send another image back up to the DHCP server ("second").

In summary, the first image had made one trip through
the process, and the second one had made three transfers
since the original (up, down, back up).

When cmp -l was used to compare the compressed image files,
they were the same except for about 7 bytes near the beginning.
I didn't let it finish, but I waited about 15 minutes
and no further differences were reported.  So that
confirms what I suspected: that the partition table
was getting corrupted.
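
For anyone repeating this comparison, here is a minimal Python
sketch of the same idea as cmp -l.  It assumes two uncompressed
images (offsets inside lzop-compressed files do not map directly
onto disk offsets) and a classic MBR layout in sector 0.  The
names differing_bytes and classify_offset are made up for
illustration; they are not part of udpcast.

```python
# Classic MBR layout of sector 0 (assumed; other layouts differ).
MBR_REGIONS = [
    (0, 446, "boot code"),
    (446, 510, "partition table"),
    (510, 512, "boot signature"),
]


def classify_offset(offset):
    """Name the MBR region that a 0-based byte offset falls into."""
    for start, end, name in MBR_REGIONS:
        if start <= offset < end:
            return name
    return "beyond sector 0"


def differing_bytes(path_a, path_b, chunk_size=1 << 20):
    """Yield (offset, byte_a, byte_b) for every differing byte.

    Offsets are 0-based (cmp -l prints 1-based byte numbers).
    Comparison stops at the end of the shorter file.
    """
    offset = 0
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a = fa.read(chunk_size)
            b = fb.read(chunk_size)
            n = min(len(a), len(b))
            if n == 0:
                return
            if a[:n] != b[:n]:
                for i in range(n):
                    if a[i] != b[i]:
                        yield offset + i, a[i], b[i]
            offset += n
```

A difference that classify_offset reports as "partition table"
lies in bytes 446 to 509 of the first sector; anything else
points somewhere other than the table itself.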

I'm still working on the theory that the choice of tg3 versus
bcm5700 ethernet module has something to do with this,
and that it is a type of error specific to UDP that would not
show up in something using TCP.

I'm hoping that the newer kernel will contain some bug
fixes for the tg3 module.  I'll switch to Slackware
so that I'll have more control over using kernels
straight from the man.

I see on Dell's site that even Ghost 7.5 has been a problem
with Broadcom's DOS ethernet drivers.

--Donald Teed

On Tue, 13 Apr 2004, Alain Knaff wrote:

> begin  Tuesday 13 April 2004 22:17, Donald Teed quote:
> > I read someone saying they had excellent performance with lzop, so
> > I switched to an initrd with lzop as the default at the same
> > time I picked up your latest updates on Monday.  So I guess
> > it can't be data corruption on that level.  I did another test
> > this afternoon - checked that the original did work with F11,
> > sent it, restored to the same machine and hard drive and F11
> > did not work.  This run was with the acpid stopped and KDE
> > sound server not starting on the server machine.
> >
> > Given that a dd style of imaging should be working, where do you
> > suggest I start looking for something that could be more robust?
> 
> The first thing I'd check is to do a compare (using the cmp command)
> between an original (working) hard disk, and a udpcasted disk.
> 
> In order to do this, build the two disks into the same machine, one on
> the primary controller, one on the secondary, boot from a Knoppix CD
> (or from a udpcast CD...) and run a compare:
> 
>  cmp -l /dev/hda /dev/hdc
> 
> This not only shows whether the disks are equal, but also where
> exactly the differences lie, if there are any.
> 
> It is important that between the udpcast and the compare operation, no
> boot was attempted from these disks (as it might have changed some
> bytes). Indeed, in Windows, even trivial activity such as moving some
> window changes the registry, which is enough to make the disk image
> different ;-)
> 
> You may verify, after the compare, whether F11 is broken on the
> udpcasted disk.
> 
> Maybe what's going on here is that the disk's serial number (can be
> displayed using hdparm -i /dev/hda) is stored somewhere in the
> partition table, maybe in some scrambled form. The F11 functionality
> would only work if the physical serial number and the one in the
> partition table agree?
> 
> It might be interesting to check what happens if you play back an
> image to the disk where it originally came from:
> 
> Disk A ===> Disk B ===> Disk A
> 
> If that works, then I think it might be a serial number issue. (Maybe
> overwrite Disk A with something else between the two transfers, or
> manually switch off the F11 functionality)
> 
> Another theory is that somehow the disk geometry as it appears to
> Linux is not the complete disk, and that one or more sectors near the
> end of the disk do not get copied, because for some reason the kernel
> (i.e. operating system binary) is mistaken about the exact size of the
> disk? If those sectors contain information that is critical for the
> F11 functionality, this might explain stuff.
> 
> It's rather obvious that this is unrelated to UDP level corruption, or
> else lzop would have complained very loudly.
> 
> It must be something that goes on locally in the machine.
> 
> Maybe the manufacturer of the machine also has some info on how
> the F11 functionality works, unless he considers that a trade secret,
> of course ;-)
> 
> Regards,
> 
> Alain
> 
> 
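
The quoted theory that one or more sectors near the end of the
disk are not copied can be checked with simple arithmetic: compare
the size of the uncompressed image with the size the drive reports.
The sketch below is only an illustration under assumptions: the
function name missing_sectors is hypothetical, the sector size is
assumed to be 512 bytes, and the sector count is assumed to come
from something like the LBAsects value that hdparm -i prints.

```python
SECTOR_SIZE = 512  # assumed sector size in bytes


def missing_sectors(image_bytes, disk_sectors, sector_size=SECTOR_SIZE):
    """Return how many whole or partial sectors the image falls short by.

    image_bytes  -- size of the uncompressed image in bytes
    disk_sectors -- sector count reported by the drive itself
    """
    expected = disk_sectors * sector_size
    if image_bytes > expected:
        raise ValueError("image is larger than the reported disk size")
    short = expected - image_bytes
    # Round up so that a partially missing sector still counts as one.
    return -(-short // sector_size)
```

A result of 0 means the image covers the whole disk as the drive
reports it; any other value is the number of trailing sectors that
never made it into the image.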


