
Excessive memory usage when the number of trees is large (e.g. 500 trees) #31

Closed
ogrisel opened this issue Nov 1, 2018 · 20 comments
Labels: bug (Something isn't working) · perf (Computational performance issue or improvement)

@ogrisel (Owner) commented Nov 1, 2018

As investigated in #30, when the number of trees is large, htop reports VIRT memory that exceeds the physical memory of the host, which causes the machine to hang and slows down the computation.

What is weird is that the RES memory seems to stay constant.

ogrisel changed the title from "Excessive memory issues when the number of trees is large (e.g. 500 trees)" to "Excessive memory usage when the number of trees is large (e.g. 500 trees)" on Nov 1, 2018
@ogrisel (Owner) commented Nov 1, 2018

This problem can be reproduced even on very small training sets (e.g. 1e4 samples), as long as n_trees > 500.

ogrisel added the labels bug (Something isn't working) and perf (Computational performance issue or improvement) on Nov 1, 2018
@NicolasHug (Collaborator):

This should hopefully be fixed by #21 right? At least partially

@ogrisel (Owner) commented Nov 1, 2018

> This should hopefully be fixed by #21 right? At least partially

Memory use is still much higher with pygbm. Maybe we could try to reuse the grower instance over and over again with some kind of grower.reset(gradients, hessians) method.

@NicolasHug (Collaborator):

> with some kind of a .reset(gradients, hessians) method.

That's exactly what LGBM is doing, BTW. I think it would make sense, yes: 500 trees = 500 array allocations of size n_samples, so it takes a toll.
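For scale, a rough back-of-the-envelope of what those repeated allocations amount to (a sketch with assumed numbers: float32 values and one gradient plus one hessian array per tree):

    n_samples = int(1e6)
    n_trees = 500
    bytes_per_value = 4                                # assuming float32 gradients/hessians
    bytes_per_tree = 2 * n_samples * bytes_per_value   # one gradients + one hessians array
    print(bytes_per_tree * n_trees / 1e9)              # ~4 GB allocated (and hopefully freed) over a fit

That is harmless if every old array is freed promptly, which is exactly the question below.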

But that means somehow the old grower instances aren't properly garbage collected?

@ogrisel (Owner) commented Nov 1, 2018

> But that means somehow the old grower instances aren't properly garbage collected?

That is what I suspect. Maybe numba is keeping some internal references to jitclass instances?

@ogrisel (Owner) commented Nov 1, 2018

I tried with toy examples outside of pygbm and everything looks fine (no leak).
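For reference, a toy test along these lines (a sketch only, not the exact script used; the Holder class is made up for illustration) allocates a fresh jitclass instance holding a large array at each iteration and watches the process RSS:

    import numpy as np
    import psutil
    from numba import jitclass, float32

    spec = [('values', float32[::1])]

    @jitclass(spec)
    class Holder:
        def __init__(self, values):
            self.values = values

    p = psutil.Process()
    for i in range(500):
        h = Holder(np.ones(10 ** 6, dtype=np.float32))  # new jitclass instance each iteration
        if i % 100 == 0:
            print(f'iteration {i}: RSS = {p.memory_info().rss / 1e6:.1f} MB')

If the instances are collected properly, the RSS stays roughly flat.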

@NicolasHug (Collaborator):

So that would suggest that the problem doesn't come from some garbage collector issue?

I tried a quick hack:

    # TreeGrower.reset
    def reset(self, all_gradients, all_hessians, shrinkage=1.):
        self.splitting_context.reset(all_gradients, all_hessians)
        self.shrinkage = shrinkage
        del self.splittable_nodes  # just to make sure
        del self.finalized_leaves
        self.splittable_nodes = []
        self.finalized_leaves = []
        self.total_find_split_time = 0.  # time spent finding the best splits
        self.total_apply_split_time = 0.  # time spent splitting nodes
        self._intilialize_root()
        self.n_nodes = 1

    # SplittingContext.reset
    def reset(self, all_gradients, all_hessians):
        self.all_gradients[:] = all_gradients
        self.all_hessians[:] = all_hessians
        self.sum_gradients = self.all_gradients.sum()
        self.sum_hessians = self.all_hessians.sum()
        self.ordered_gradients[:] = all_gradients
        self.ordered_hessians[:] = all_hessians

but I'm still observing the huge VIRT usage (about 12GB while I only have 8) on the benchmark from #30 after ~100 trees.

Note that this doesn't reuse the space allocated for the histograms so maybe it comes from there, but I doubt it. For a given node the histograms take up 4 * 28 * 255 * 3 bytes (int32, 28 features, 255 bins, 3 entries) and after 100 trees with 255 leaf nodes (i.e. 254 non-leaf nodes for which we have histograms) this is only 4 * 28 * 255 * 3 * 100 * 254 ~ 2GB.
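For the record, the estimate above spelled out:

    bytes_per_node = 4 * 28 * 255 * 3    # 32-bit entries, 28 features, 255 bins, 3 fields per bin
    n_trees = 100
    nodes_with_histograms = 254          # 255 leaves -> 254 non-leaf nodes
    print(bytes_per_node * n_trees * nodes_with_histograms / 1e9)  # ~2.18 GB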

@ogrisel (Owner) commented Nov 3, 2018

https://mg.pov.lt/objgraph/ might be helpful at finding the cause of the memory growth.
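For reference, a minimal way to use it in this context (a sketch; show_growth prints the object types whose instance counts grew since the previous call):

    import objgraph

    # Put this inside the main boosting loop: it prints the Python-level types
    # whose instance counts increased since the previous iteration.
    objgraph.show_growth(limit=10)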

@ogrisel (Owner) commented Nov 3, 2018

I tried a bit with objgraph, and the only new Python-level object between two iterations of gradient boosting is a TreePredictor instance. I have also patched the code to not create those instances and I still observe the large memory increase...

@NicolasHug (Collaborator):

Yeah but that makes sense right? I am 90% sure that if everything were correctly garbage collected, we wouldn't have any leak. And the TreePredictor objects are super small anyway. So if we assume that CPython's GC is working, that means there's something going on with the numba interaction.

Putting this inside the main loop of GradientBoostingMachine:

            print(objgraph.at(id(grower)))
            print(objgraph.at(id(grower.splitting_context)))

will give me

[0/5]
<pygbm.grower.TreeGrower object at 0x7fb0b2f40160>
None
[1/5] 20 leaf nodes, max depth 6 in 8.425s
<pygbm.grower.TreeGrower object at 0x7fb0b268ad68>
None
[2/5] 20 leaf nodes, max depth 12 in 4.230s
<pygbm.grower.TreeGrower object at 0x7fb0b26a6208>
None
[3/5] 20 leaf nodes, max depth 8 in 2.832s
<pygbm.grower.TreeGrower object at 0x7fb0b1b9fa90>
None

So this suggests that the SplittingContext isn't "tracked by the GC", at least according to objgraph, even though I'm not sure what that technically means.

objgraph.at(addr)

    Return an object at a given memory address.

    The reverse of id(obj):

    >>> at(id(obj)) is obj
    True

    Note that this function does not work on objects that are not tracked by the GC (e.g. ints or strings).

@ogrisel (Owner) commented Nov 3, 2018

Interesting. This is really weird. We should try to come up with a minimal reproduction case by pruning the code of pygbm while still reproducing the memory "leak", until it's no longer specific to pygbm concepts and is reduced to basic numpy + numba primitives.

@NicolasHug (Collaborator) commented Nov 3, 2018

Fun stuff incoming.

I patched GradientBoostingMachine.fit() so that it returns a list of the memory usage at each iteration:

        l = []
        p = psutil.Process()
        while True:  # <-- main training loop
            use = p.memory_info().rss
            print(f"{use / 1e6} MB")
            l.append(use)
            should_stop = self._stopping_criterion(
            ....

        return self, l

and then I plot this list for various values of n_samples:

from sklearn.datasets import make_classification
from pygbm import GradientBoostingMachine
import matplotlib.pyplot as plt

sizes = [1e1, 1e2, 1e3, 1e4, 1e5]
n_trees = 1000
n_leaf_nodes = 20

fig, ax = plt.subplots(1)

for n_samples in sizes:
    n_samples = int(n_samples)

    X, y = make_classification(n_samples=n_samples, random_state=0)

    est = GradientBoostingMachine(learning_rate=1, max_iter=n_trees,
                                  max_leaf_nodes=n_leaf_nodes,
                                  random_state=0, scoring=None,
                                  verbose=1, validation_split=None)

    _, l = est.fit(X, y)
    ax.plot([x / 1e6 for x in l], '-', label=f'n_samples={n_samples}')

ax.set_ylabel('MB used')
ax.set_xlabel('iteration')
plt.legend()

plt.show()

(changes are on my leak branch)

[plot: RSS (MB) per boosting iteration, one curve per n_samples value]

A few observations:

  • the GradientBoostingMachine instances aren't even GCed between the iterations, even if I force a del est
  • there's no leak for 10 samples (it's not clear from the plot but if you remove the other n_samples curves you'll see it)
  • the leak is pretty constant between each iteration, and then there's a clear 'elbow' which is kind of funny. The leak is less strong after the elbow, but still there.
  • the leak doesn't seem to depend on the training set size: all the slopes are approximately equal. So that would suggest that the arrays of size n_samples e.g. all_gradients in SplittingContext are properly GCed.
  • the curve for 1e6 samples is much more chaotic than the others.
  • there's some fun "wiggling" at the beginning of each curve, before the elbow.

@NicolasHug (Collaborator):

Oh and this is the plot for 1e7 samples

[plot: RSS (MB) per boosting iteration for n_samples=1e7]

@ogrisel (Owner) commented Nov 4, 2018

> the GradientBoostingMachine instances aren't even GCed between the iterations, even if I force a del est

This could be caused by a cyclic reference. You can break cyclic references by forcing an import gc; gc.collect().

Note that I had already tried adding this inside the main loop of GradientBoostingMachine.fit to see if it would fix the original problem, but it did not change the memory usage (it just slowed things down).

@ogrisel (Owner) commented Nov 4, 2018

> the leak is pretty constant between each iteration, and then there's a clear 'elbow' which is kind of funny. The leak is less strong after the elbow, but still there.

You should try with a smaller learning rate, e.g. 0.01. I noticed that with a large learning rate and many trees, the boosting algorithm can degenerate into producing small decision stumps (only 2 leaves). Maybe the additive gradient descent process has converged: the gain would be too small for larger trees to bring any improvement. I have not checked.

@ogrisel (Owner) commented Nov 5, 2018

The only data structures that could cause this growth are the histograms stored on each grower node:

Assuming 500 approximately balanced trees (each with 255 nodes with stored histograms), that would amount to 22 GB if they are not collected:

>>> 500 * 255 * 28 * 256 * 8 * 3 / 1e9
21.93408

@ogrisel (Owner) commented Nov 5, 2018

I think I found the culprit: it seems to be the histogram attribute on the SplitInfo jitclass. I am reworking the code to get rid of it. It also makes the code simpler.
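For intuition, here is a toy plain-Python illustration of why that attribute matters (a stand-in class and dtype, not the actual pygbm jitclass): an otherwise tiny split-info object that keeps a reference to its histogram pins the whole array in memory.

    import numpy as np

    class SplitInfoLike:
        # Toy stand-in: a tiny object that pins a large histogram array.
        def __init__(self, gain, histogram):
            self.gain = gain
            self.histogram = histogram   # keeping this reference keeps the array alive

    histogram = np.zeros((28, 256, 3), dtype=np.uint32)
    info = SplitInfoLike(gain=0.5, histogram=histogram)
    del histogram                        # the array is NOT freed: info still references it
    print(info.histogram.nbytes)         # 86016 bytes held for every retained split info

Multiply that by the number of nodes and trees and you get the kind of growth observed above.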

@NicolasHug (Collaborator):

> * 8

Shouldn't this be * 4? The histogram entries are all 32 bits.

But yeah, that would make sense: the leak in my plots is approximately 1.7 MB per iteration, which is close to the histogram size: 20 * 28 * 256 * 4 * 3 / 1e6 = 1.72032.

Can't wait for the fix!! :)

@ogrisel (Owner) commented Nov 5, 2018

Yes you are right: 4 bytes instead of 8.

I had a preliminary fix but forgot to commit it...

@ogrisel (Owner) commented Nov 5, 2018

@NicolasHug you can have a look at #36.

ogrisel closed this as completed on Nov 5, 2018