Feb 23 2009

transferring changes in files

Well, I recently stumbled on the fact that rsync will transfer a changed file completely. So what if that file is really large (compared to the bandwidth) and only a few things have changed?
Wouldn’t we like to find out what was modified and just transmit the changes? This is the scheme I thought of; dunno, maybe it’s a good idea:

Split each of the two files in half and calculate the hash (as in cryptographic hash function) of each of those 4 parts. Skip the pairs of parts with a common hash and go on with the differing ones. Here comes the great (is it?) step: REPEAT (yeah, recursion AND modeling it as a tree. That’s good, right?)
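To make the idea a bit more concrete, here is a minimal sketch in Python. It is only an illustration under some loud assumptions: both versions are held in memory as byte strings of the same length (a change that shifts the rest of the file would defeat this scheme), SHA-256 stands in for "a cryptographic hash", and the names `diff_ranges` and `min_size` are made up for this example. It also hashes each part once before splitting it, which amounts to the same tree.

```python
import hashlib


def diff_ranges(old, new, lo=0, hi=None, min_size=4):
    """Return a list of (start, end) byte ranges where `new` differs from `old`.

    Assumption: len(old) == len(new). Insertions or deletions that shift
    the remaining bytes are not handled by this halving scheme.
    """
    if hi is None:
        if len(old) != len(new):
            raise ValueError("sketch only handles equal-length inputs")
        hi = len(new)
    if lo >= hi:
        return []
    # Compare digests instead of the data itself: over a network, only
    # the digests would have to travel between the two machines.
    old_digest = hashlib.sha256(old[lo:hi]).digest()
    new_digest = hashlib.sha256(new[lo:hi]).digest()
    if old_digest == new_digest:
        return []  # common hash: skip this part entirely
    # Small enough: stop splitting and mark the whole part as changed.
    if hi - lo <= min_size:
        return [(lo, hi)]
    # REPEAT: split in half and recurse into both halves.
    mid = (lo + hi) // 2
    return (diff_ranges(old, new, lo, mid, min_size)
            + diff_ranges(old, new, mid, hi, min_size))


if __name__ == "__main__":
    old = b"a" * 64
    new = old[:10] + b"b" + old[11:]
    print(diff_ranges(old, new))
```

Only the ranges that come back would need to be transmitted. Neighbouring ranges are not merged here, to keep the sketch short.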

Of course we’d have to do some calculations on how much splitting and hashing is going to happen, depending on available bandwidth and file size.
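A rough back-of-the-envelope version of that calculation could look like the sketch below. Everything in it is an assumption made for illustration: the function name `estimate_cost`, the default leaf size, hash size, bandwidth and latency, and the simplification that every change fits inside a single leaf-sized part. With those assumptions, each level of the tree costs at most two hashes per changed region plus one network round trip.

```python
import math


def estimate_cost(file_size, changed_regions, leaf_size=4096,
                  hash_size=32, bandwidth=125_000, latency=0.05):
    """Return (bytes_sent, seconds) as a rough upper bound.

    All sizes are in bytes, bandwidth in bytes per second, latency in
    seconds per round trip. The defaults are hypothetical values.
    """
    if file_size <= leaf_size:
        depth = 0  # nothing to split, just send the data
    else:
        # Number of halvings until a part is no larger than leaf_size.
        depth = math.ceil(math.log2(file_size / leaf_size))
    # Per level: each changed region sits in one part, and the two
    # halves of that part get hashed and compared.
    hash_bytes = 2 * changed_regions * depth * hash_size
    # At the bottom, each changed region costs one leaf of real data.
    data_bytes = changed_regions * leaf_size
    total = hash_bytes + data_bytes
    # One round trip per tree level, plus the raw transfer time.
    seconds = total / bandwidth + depth * latency
    return total, seconds


if __name__ == "__main__":
    # 1 GiB file with 3 small changes
    print(estimate_cost(2**30, 3))
```

With these made-up numbers the round trips take up most of the time, not the hashes themselves, so in practice one would probably want to split into more than two parts per step.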

Maybe I will write some PoC code in Ruby later, or maybe YOU have an idea regarding this problem? Also, I’m wondering whether there’s already a better solution for it. I didn’t really take the time to do any research on this, so every comment is appreciated ;)


Wow… I just read the Wikipedia article on rsync and feel kinda weird. There’s already an algorithm which does pretty much what I just wrote. This can’t be a coincidence; I might have read something about it before, forgotten it and mistaken it for my own idea.