Skip to content

A simple text file comparison utility for extremely large text files that fail under diff

License

Notifications You must be signed in to change notification settings

gtoal/bigfile-diff-compare

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

6 Commits
 
 
 
 
 
 
 
 

Repository files navigation

bigfile-diff-compare

A simple text file comparison utility for extremely large text files that fail under diff

I was moving large numbers of files from drive to drive and due to boring problems not worth repeating here, a trivial rsync was not cutting it, so at some point I had to bite the bullet and get a file listing of the from and to directories, and compare the listings with 'diff'. Turned out that linux 'diff' was failing silently due to memory issues, and none of the alternatives that I read about at stackexchange etc were working that well either. So... I wrote my own file comparison program. The input text files are memory mapped, and the internal data is a pointer per line which is taken off the heap in two large calloc calls. It works and it's fast enough. The comparison logic isn't super clever but it's good enough for all the real-world problems I've used it for so far. I expect this to be used when there are relatively small differences in extremely large files.

Perhaps its biggest infelicity is that it will synchronise on a blank line. If that just isn't acceptable to you I think the fix would be around the call to "Match". I added a new program, 'smallcompare', later, which uses a different algorithm to produce better differences, but it is too slow for using on really large files. I just include it here in case it's useful but honestly you might as well use 'diff' for anything that smallcompare would be suitable for.

The code is a bit crude and it's sort of based on my recollection of Hamish Dewar's far more elegant "compare" program from Edinburgh University back in the 60's.

Graham Toal 202308290145

About

A simple text file comparison utility for extremely large text files that fail under diff

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages