
Excessive memory usage when the number of trees is large (e.g. 500 trees) #31

Closed
ogrisel opened this issue Nov 1, 2018 · 20 comments
Labels: bug (Something isn't working) · perf (Computational performance issue or improvement)

@ogrisel (Owner) commented Nov 1, 2018

As investigated in #30, when the number of trees is large, htop reports VIRT memory that exceeds the physical memory of the host, which causes the machine to hang and slows down the computation.

What is weird is that the RES memory seems to stay constant.

ogrisel changed the title from "Excessive memory issues when the number of trees is large (e.g. 500 trees)" to "Excessive memory usage when the number of trees is large (e.g. 500 trees)" on Nov 1, 2018
@ogrisel (Owner) commented Nov 1, 2018

This problem can be reproduced even on very small training sets (e.g. 1e4 samples), as long as n_trees > 500.

ogrisel added the labels bug (Something isn't working) and perf (Computational performance issue or improvement) on Nov 1, 2018
@NicolasHug (Collaborator):

This should hopefully be fixed by #21 right? At least partially

@ogrisel (Owner) commented Nov 1, 2018

> This should hopefully be fixed by #21 right? At least partially

Memory use is still much higher with pygbm. Maybe we could try to reuse the grower instance over and over again with some kind of grower.reset(gradients, hessians) method.

@NicolasHug (Collaborator):

> with some kind of a .reset(gradients, hessians) method.

That's exactly what LGBM is doing, BTW. I think it would make sense, yes: 500 trees = 500 array allocations of size n_samples, so it takes a toll.
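For scale, a rough back-of-the-envelope of what those repeated allocations amount to (a sketch with assumed numbers: float32 values and one gradient plus one hessian array per tree):

    n_samples = int(1e6)
    n_trees = 500
    bytes_per_value = 4                                # assuming float32 gradients/hessians
    bytes_per_tree = 2 * n_samples * bytes_per_value   # one gradients + one hessians array
    print(bytes_per_tree * n_trees / 1e9)              # ~4 GB allocated (and hopefully freed) over a fit

That is harmless if every old array is freed promptly, which is exactly the question below.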

But that means somehow the old grower instances aren't properly garbage collected?

@ogrisel (Owner) commented Nov 1, 2018

> But that means somehow the old grower instances aren't properly garbage collected?

That is what I suspect. Maybe numba is keeping some internal references to jitclass instances?

@ogrisel (Owner) commented Nov 1, 2018

I tried with toy examples outside of pygbm and everything looks fine (no leak).
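For reference, a toy test along these lines (a sketch only, not the exact script used; the Holder class is made up for illustration) allocates a fresh jitclass instance holding a large array at each iteration and watches the process RSS:

    import numpy as np
    import psutil
    from numba import jitclass, float32

    spec = [('values', float32[::1])]

    @jitclass(spec)
    class Holder:
        def __init__(self, values):
            self.values = values

    p = psutil.Process()
    for i in range(500):
        h = Holder(np.ones(10 ** 6, dtype=np.float32))  # new jitclass instance each iteration
        if i % 100 == 0:
            print(f'iteration {i}: RSS = {p.memory_info().rss / 1e6:.1f} MB')

If the instances are collected properly, the RSS stays roughly flat.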

@NicolasHug (Collaborator):

So that would suggest that the problem doesn't come from some garbage collector issue?

I tried a quick hack:

    # TreeGrower.reset
    def reset(self, all_gradients, all_hessians, shrinkage=1.):
        self.splitting_context.reset(all_gradients, all_hessians)
        self.shrinkage = shrinkage
        del self.splittable_nodes  # just to make sure
        del self.finalized_leaves
        self.splittable_nodes = []
        self.finalized_leaves = []
        self.total_find_split_time = 0.  # time spent finding the best splits
        self.total_apply_split_time = 0.  # time spent splitting nodes
        self._intilialize_root()
        self.n_nodes = 1

    # SplittingContext.reset
    def reset(self, all_gradients, all_hessians):
        self.all_gradients[:] = all_gradients
        self.all_hessians[:] = all_hessians
        self.sum_gradients = self.all_gradients.sum()
        self.sum_hessians = self.all_hessians.sum()
        self.ordered_gradients[:] = all_gradients
        self.ordered_hessians[:] = all_hessians

but I'm still observing the huge VIRT usage (about 12GB while I only have 8) on the benchmark from #30 after ~100 trees.

Note that this doesn't reuse the space allocated for the histograms so maybe it comes from there, but I doubt it. For a given node the histograms take up 4 * 28 * 255 * 3 bytes (int32, 28 features, 255 bins, 3 entries) and after 100 trees with 255 leaf nodes (i.e. 254 non-leaf nodes for which we have histograms) this is only 4 * 28 * 255 * 3 * 100 * 254 ~ 2GB.
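For the record, the estimate above spelled out:

    bytes_per_node = 4 * 28 * 255 * 3    # 32-bit entries, 28 features, 255 bins, 3 fields per bin
    n_trees = 100
    nodes_with_histograms = 254          # 255 leaves -> 254 non-leaf nodes
    print(bytes_per_node * n_trees * nodes_with_histograms / 1e9)  # ~2.18 GB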

@ogrisel (Owner) commented Nov 3, 2018

https://mg.pov.lt/objgraph/ might be helpful at finding the cause of the memory growth.
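For reference, a minimal way to use it in this context (a sketch; show_growth prints the object types whose instance counts grew since the previous call):

    import objgraph

    # Put this inside the main boosting loop: it prints the Python-level types
    # whose instance counts increased since the previous iteration.
    objgraph.show_growth(limit=10)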

@ogrisel (Owner) commented Nov 3, 2018

I tried a bit with objgraph, and the only new Python-level object between two iterations of gradient boosting is a TreePredictor instance. I have also patched the code to not create those instances and I still observe the large memory increase...

@NicolasHug (Collaborator):

Yeah but that makes sense right? I am 90% sure that if everything were correctly garbage collected, we wouldn't have any leak. And the TreePredictor objects are super small anyway. So if we assume that CPython's GC is working, that means there's something going on with the numba interaction.

Putting this inside the main loop of GradientBoostingMachine:

            print(objgraph.at(id(grower)))
            print(objgraph.at(id(grower.splitting_context)))

will give me

[0/5]
<pygbm.grower.TreeGrower object at 0x7fb0b2f40160>
None
[1/5] 20 leaf nodes, max depth 6 in 8.425s
<pygbm.grower.TreeGrower object at 0x7fb0b268ad68>
None
[2/5] 20 leaf nodes, max depth 12 in 4.230s
<pygbm.grower.TreeGrower object at 0x7fb0b26a6208>
None
[3/5] 20 leaf nodes, max depth 8 in 2.832s
<pygbm.grower.TreeGrower object at 0x7fb0b1b9fa90>
None

So this suggests that the SplittingContext isn't "tracked by the GC", at least according to objgraph, even though I'm not sure what that technically means.

objgraph.at(addr)

    Return an object at a given memory address.

    The reverse of id(obj):

    >>> at(id(obj)) is obj
    True

    Note that this function does not work on objects that are not tracked by the GC (e.g. ints or strings).

@ogrisel (Owner) commented Nov 3, 2018

Interesting. This is really weird. We should try to come up with a minimal reproduction case by pruning the code of pygbm while still reproducing the memory "leak", until it's no longer specific to pygbm concepts and is reduced to basic numpy + numba primitives.

@NicolasHug (Collaborator) commented Nov 3, 2018

Fun stuff incoming.

I patched GradientBoostingMachine.fit() so that it returns a list of the memory usage at each iteration:

        l = []
        p = psutil.Process()
        while True:  # <-- main training loop
            use = p.memory_info().rss
            print(f"{use / 1e6} MB")
            l.append(use)
            should_stop = self._stopping_criterion(
            ....

        return self, l

and then I plot this list for various values of n_samples:

from sklearn.datasets import make_classification
from pygbm import GradientBoostingMachine
import matplotlib.pyplot as plt

sizes = [1e1, 1e2, 1e3, 1e4, 1e5]
n_trees = 1000
n_leaf_nodes = 20

fig, ax = plt.subplots(1)

for n_samples in sizes:
    n_samples = int(n_samples)

    X, y = make_classification(n_samples=n_samples, random_state=0)

    est = GradientBoostingMachine(learning_rate=1, max_iter=n_trees,
                                  max_leaf_nodes=n_leaf_nodes,
                                  random_state=0, scoring=None,
                                  verbose=1, validation_split=None)

    _, l = est.fit(X, y)
    ax.plot([x / 1e6 for x in l], '-', label=f'n_samples={n_samples}')

ax.set_ylabel('MB used')
ax.set_xlabel('iteration')
plt.legend()

plt.show()

(changes are on my leak branch)

[plot: RSS (MB) per boosting iteration, one curve per n_samples value]

A few observations:

  • the GradientBoostingMachine instances aren't even GCed between the iterations, even if I force a del est
  • there's no leak for 10 samples (it's not clear from the plot but if you remove the other n_samples curves you'll see it)
  • the leak is pretty constant between each iteration, and then there's a clear 'elbow' which is kind of funny. The leak is less strong after the elbow, but still there.
  • the leak doesn't seem to depend on the training set size: all the slopes are approximately equal. So that would suggest that the arrays of size n_samples e.g. all_gradients in SplittingContext are properly GCed.
  • the curve for 1e6 samples is much more chaotic than the others.
  • there's some fun "wiggling" at the beginning of each curve, before the elbow.

@NicolasHug (Collaborator):

Oh and this is the plot for 1e7 samples

[plot: RSS (MB) per boosting iteration for n_samples=1e7]

@ogrisel (Owner) commented Nov 4, 2018

> the GradientBoostingMachine instances aren't even GCed between the iterations, even if I force a del est

This could be caused by a cyclic reference. You can break cyclic references by forcing an import gc; gc.collect().

Note that I had already tried adding this inside the main loop of GradientBoostingMachine.fit to see if it would fix the original problem, but it did not change the memory usage (it just slowed things down).

@ogrisel (Owner) commented Nov 4, 2018

> the leak is pretty constant between each iteration, and then there's a clear 'elbow' which is kind of funny. The leak is less strong after the elbow, but still there.

You should try with a smaller learning rate, e.g. 0.01. I noticed that with a large learning rate and many trees, the boosting algorithm can degenerate into producing small decision stumps (only 2 leaves). Maybe the additive gradient descent process has converged: the gain would be too small for larger trees to bring any improvement. I have not checked.

@ogrisel (Owner) commented Nov 5, 2018

The only data structures that could cause this growth are the histograms stored on each grower node:

Assuming 500 approximately balanced trees (each with 255 nodes with stored histograms), that would amount to 22 GB if they are not collected:

>>> 500 * 255 * 28 * 256 * 8 * 3 / 1e9
21.93408

@ogrisel (Owner) commented Nov 5, 2018

I think I found the culprit: it seems to be the histogram attribute on the SplitInfo jitclass. I am reworking the code to get rid of it. It also makes the code simpler.
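For intuition, here is a toy plain-Python illustration of why that attribute matters (a stand-in class and dtype, not the actual pygbm jitclass): an otherwise tiny split-info object that keeps a reference to its histogram pins the whole array in memory.

    import numpy as np

    class SplitInfoLike:
        # Toy stand-in: a tiny object that pins a large histogram array.
        def __init__(self, gain, histogram):
            self.gain = gain
            self.histogram = histogram   # keeping this reference keeps the array alive

    histogram = np.zeros((28, 256, 3), dtype=np.uint32)
    info = SplitInfoLike(gain=0.5, histogram=histogram)
    del histogram                        # the array is NOT freed: info still references it
    print(info.histogram.nbytes)         # 86016 bytes held for every retained split info

Multiply that by the number of nodes and trees and you get the kind of growth observed above.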

@NicolasHug (Collaborator):

> * 8

Shouldn't this be * 4? The histogram entries are all 32 bits.

But yeah, that would make sense: the leak in my plots is approximately 1.7 MB per iteration, which is close to the histogram size: 20 * 28 * 256 * 4 * 3 / 1e6 = 1.72032.

Can't wait for the fix!! :)

@ogrisel (Owner) commented Nov 5, 2018

Yes you are right: 4 bytes instead of 8.

I had a preliminary fix but forgot to commit it...

@ogrisel (Owner) commented Nov 5, 2018

@NicolasHug you can have a look at #36.

ogrisel closed this as completed on Nov 5, 2018