Memory usage and performance optimizations #7

Open
wants to merge 5 commits into
base: master


MichaelEischer

The first commit of this series solves the problem that long RCS histories of large files (nearly 30k revisions resulting in a 4 MB file) require a tremendous amount of memory (200 GB of RAM were not enough...). The solution is to keep only a hash digest for revisions that will no longer be used for diffing. This way, commit coalescing is still possible using the hash, but it requires far less memory.
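The idea can be sketched roughly as follows. This is only an illustration, not the code from the commit: the `Revision` class, its method names, and the choice of SHA-1 are assumptions made for the example.

```ruby
require 'digest/sha1'

# Hypothetical stand-in for a revision record (not the script's actual class).
class Revision
  attr_reader :id, :text, :digest

  def initialize(id, lines)
    @id = id
    @text = lines   # full text, needed while later diffs still refer to it
    @digest = nil
  end

  # Drop the full text and keep only a fixed-size digest.
  # SHA-1 is an assumption; any stable hash would serve the same purpose.
  def compact!
    @digest ||= Digest::SHA1.hexdigest(@text.join)
    @text = nil
    self
  end

  # Value used to compare revisions for coalescing, with or without the text.
  def fingerprint
    @digest || Digest::SHA1.hexdigest(@text.join)
  end
end

old = Revision.new('1.1', ["hello\n", "world\n"])
cur = Revision.new('1.2', ["hello\n", "world\n"])
old.compact!                               # text released, digest kept
same = old.fingerprint == cur.fingerprint  # comparison still works
```

After `compact!`, a revision costs a few dozen bytes regardless of file size, which is where the savings for a 30k-revision history would come from.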

The next three changes avoid some unnecessary string and array copies.

This is complemented by applying the diff using a linear scan to avoid many small array allocations. This change might be problematic, as it introduces a new assumption: that the line numbers in a diff are always increasing.
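A minimal sketch of a single-pass application, under the stated assumption that edits arrive with increasing line numbers. The `apply_edits` function and the pre-parsed `[op, line, payload]` edit format are hypothetical and chosen for the example; they mirror the RCS `dN M` (delete M lines starting at line N) and `aN M` (add M lines after line N) commands, whose line numbers refer to the old text.

```ruby
# edits: array of [op, line, payload] with 1-based line numbers into old_lines.
#   [:d, n, count] deletes count lines starting at line n
#   [:a, n, lines] inserts lines after line n (n = 0 inserts at the top)
def apply_edits(old_lines, edits)
  result = []
  pos = 0 # number of old lines already consumed
  edits.each do |op, line, payload|
    # Both ops first copy the untouched old lines up to the edit position.
    stop = (op == :d ? line - 1 : line)
    raise ArgumentError, "unknown op #{op.inspect}" unless %i[a d].include?(op)
    raise ArgumentError, 'edit line numbers must be increasing' if stop < pos

    while pos < stop
      result << old_lines[pos]
      pos += 1
    end

    if op == :d
      pos += payload          # skip the deleted lines
    else
      result.concat(payload)  # append the inserted lines
    end
  end
  # Copy whatever remains after the last edit.
  while pos < old_lines.length
    result << old_lines[pos]
    pos += 1
  end
  result
end

old = %w[a b c d e]
new_lines = apply_edits(old, [[:d, 2, 1], [:a, 3, %w[x y]], [:d, 5, 1]])
# new_lines is ["a", "c", "x", "y", "d"]
```

The sketch rejects out-of-order edits instead of silently producing wrong output, which is one way to guard the new assumption.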

This has the potential to drastically reduce the memory usage for large
files with many revisions.

The text of a commit is no longer needed once its child commit has been
processed. The memory usage optimization does not work for branches, as
these can't be processed reasonably by rcs-fast-export anyway.

replace will already copy the array contents on its own.

flatten will just ignore those empty arrays.

This avoids the creation of intermediate arrays and speeds up the whole
conversion by approx. 30%.
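
The two notes about `replace` and `flatten` rest on standard Ruby `Array` behavior, which this small standalone example illustrates. The variable names are made up for the example and do not come from the script.

```ruby
source = %w[a b c]
target = []

# Array#replace copies the elements into the receiver, so calling
# source.dup first would only create an extra throwaway array.
target.replace(source)
target << 'd'   # modifying target leaves source untouched
# source is ["a", "b", "c"], target is ["a", "b", "c", "d"]

# Array#flatten contributes nothing for empty nested arrays, so they
# do not need to be filtered out beforehand.
chunks = [%w[a b], [], %w[c], []]
flat = chunks.flatten
# flat is ["a", "b", "c"]
```

Note that `replace` makes a shallow copy: the two arrays are independent, but they still share the same string objects.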