Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add initializer support method for object shape optimization #44

Conversation

nevinera
Copy link
Contributor

@nevinera nevinera commented Dec 21, 2023

Ruby 3.2 introduced a performance optimization that relies on observing and tracking "object shape" - see here for an explanation. Memery is already better than every other memoization pattern anyone actually uses, since there's only one dynamically created instance variable for all of the memoized values, but people worry about the potential impact of heavy memoization on their applications (and I do memoize all the things).

So here's an extra method setup_memery_cache! that can optionally be called in an initializer and just sets the instance variable up - really people could just call clear_memery_cache! and have the same effect, but that seems less clear than having something actually intended for use in this way.

Probably suffices to fix #43, though I'll let @jrochkind make that call?

Usage

class MySuperPerformanceCriticalClass
  include Memery

  def initialize
    setup_memery_cache!
  end

  memoize def calculate
    do_other_stuff
  end
end

This method isn't strictly necessary even for its _own goal_, since
calling `clear_memery_cache!` in the initializer would work just as
well. But a well-named method will be less confusing.

If there were a way to do this _automatically_ it would be better, but I
couldn't come up with anything that wasn't extremely dirty, and every
trick I thought of had edges one could run into when I thought further.
Better to be safe, and leave it opt-in and simple.
And give explanation and links about object-shape optimization
@AlexWayfer
Copy link
Contributor

Hello!

I have a fork alt_memery, and I'm interested in these details, but with a bit research I've found no significant difference with setup, maybe even in worth direction.

Can you help me to discover do we (Memery gems) really need for this? Write some specific benchmarks, or improve mines one:

# frozen_string_literal: true

require 'bundler/setup'
Bundler.setup :system

require 'pry-byebug'

require 'benchmark'
require 'benchmark/ips'
require 'benchmark/memory'

require_relative 'lib/memery'


puts '```ruby'
puts File.read(__FILE__).gsub("\t", '  ')
puts '```'
puts
puts '### Output'
puts
puts '```'


class ClassWithoutSetup
  include Memery

  memoize def foo
    42 ** 42
  end
end

class ClassWithSetup
  include Memery

  def initialize
    setup_memery_cache!
  end

  memoize def foo
    42 ** 42
  end
end


def without_setup
  ClassWithoutSetup.new.foo
end

def with_setup
  ClassWithSetup.new.foo
end


def test
  exit if p(pp([
    without_setup,
    with_setup
  ]).uniq.size) > 1
end

test

Benchmark.ips do |x|
  x.report('without_setup') { without_setup }
  x.report('with_setup') { with_setup }

  x.compare!
end

Benchmark.memory do |x|
  x.report('without_setup') { 100.times { without_setup } }
  x.report('with_setup') { 100.times { with_setup } }

  x.compare!
end


puts '```'

Output

[150130937545296572356771972164254457814047970568738777235893533016064,
 150130937545296572356771972164254457814047970568738777235893533016064]
1
ruby 3.3.0 (2023-12-25 revision 5124f9ac75) [x86_64-linux]
Warming up --------------------------------------
       without_setup    41.625k i/100ms
          with_setup    41.508k i/100ms
Calculating -------------------------------------
       without_setup    426.943k (± 2.2%) i/s -      2.164M in   5.072283s
          with_setup    404.474k (± 4.2%) i/s -      2.034M in   5.038128s

Comparison:
       without_setup:   426943.1 i/s
          with_setup:   404474.2 i/s - same-ish: difference falls within error

Calculating -------------------------------------
       without_setup    95.200k memsize (   752.000  retained)
                         1.100k objects (     6.000  retained)
                         0.000  strings (     0.000  retained)
          with_setup    95.200k memsize (   432.000  retained)
                         1.100k objects (     4.000  retained)
                         0.000  strings (     0.000  retained)

Comparison:
       without_setup:      95200 allocated
          with_setup:      95200 allocated - same

@nevinera
Copy link
Contributor Author

nevinera commented Dec 30, 2023

@AlexWayfer - it'll take some work to set up a meaningful benchmark; the simple cases you are testing won't get any benefit, because they won't trigger the "complex object shape" cases in the JIT, as there are no other instance variables in place being first set in arbitrary orders. The problem cases come from the large numbers of object-shapes you get when a complex class can have many different instance variables first defined in arbitrary order, producing combinatorially many object shapes, and causing the JIT to classify the class as "complex shaped". Memery only barely contributes to that problem, since it adds only one instance variable - traditional @foo ||= style memoization was a much larger issue.

But I'll spend a bit of time trying to rig up test cases that can benchmark the difference - the awkward bit is that there will only be a difference in the case that Memery's one instance variable is the one that pushes a class over the boundary into "shape too complex".

This is a niche feature, and far more people think they need it than actually do.

Edit: The other reason the benchmark isn't showing a difference is because it's actually benchmarking the wrong thing - the memoized method is never an instance-variable lookup, which is what the object-shape optimization is intended to speed up (well, that and instance-variable writing). It's essentially an improvement for the inline-cache on getters and setters, because hash lookups are slow. This post has a pretty good explanation I think.

@nevinera
Copy link
Contributor Author

Well, I'm increasingly confused. I can't even get 3.3.0 to mark a class as "too complex", as described here currently (my current attempt is there), so I'm going to move this to draft until I've got something showing value here.

@nevinera nevinera marked this pull request as draft December 30, 2023 05:29
@jrochkind
Copy link

jrochkind commented Dec 30, 2023

I feel like in some conversation somewhere on the internet, I heard someone say that 3.3.0 had additional improvements that made the original problem less of a problem... but perhaps I misunderstood, that did make me frustrated by how much attention the original problem was getting! The whole thing is confusing to me, I really don't like a need for unintuitive optimizations, so it would certainly be nice if the need has gone away.

It woudl be interesting, @nevinera, if you had an easier time reproducing on 3.2.0, if you felt like spending time on it.

That memery only adds one instance variable would, in any case, make it a lot less of a problem, I guess? It could be worth pointing out in the README that memery only uses one iVar.

@nevinera
Copy link
Contributor Author

I couldn't find anything like that in the patch notes, but it wouldn't surprise me. I'll try and repro on 3.2 this evening.

I do have a readme update to that effect included already - if the actual change turns out to be unnecessary, I'll reduce this pr to just that bit :-)

@nevinera
Copy link
Contributor Author

Ah, so I reproduced the original benchmark here (from this post), and it shows a substantial difference on 3.2 and no significant difference on 3.3. It looks like there's no real value from this setup method anymore, though I'm still going to track down the change that produced that impact :-)

I'll prune this down to just an appropriate README change.

@AlexWayfer
Copy link
Contributor

Ah, so I reproduced the original benchmark here (from this post), and it shows a substantial difference on 3.2 and no significant difference on 3.3.

Sorry, but I see no Memery usage here. Could you provide more precise benchmark for the library in which you're trying contribute to?

@nevinera
Copy link
Contributor Author

nevinera commented Dec 30, 2023

Ah, so I reproduced the original benchmark here (from this post), and it shows a substantial difference on 3.2 and no significant difference on 3.3.

Sorry, but I see no Memery usage here. Could you provide more precise benchmark for the library in which you're trying contribute to?

I don't think you're following the conversation fully - that benchmark is showing that there is no substantial performance difference for getters/setters based on complex object shapes. The relevance to Memery is not that it affects Memery's memoization performance, it's that Memery performs memoization, which because of its lazily-defined nature, causes the object-shape of classes it's used in to diversify (though not wildly, since it only adds one ivar).

But since that benchmark shows that ruby 3.3 fixed the problem this PR was attempting to address, there's no point in still addressing it (not that there was a huge point in the first place). I'm surprised not to have seen more noise about the fix, given how much concern there was about the issue when it got noticed in 3.2..

This whole ticket was not about Memery's performance, it was about helping people that use Memery avoid it having an impact on other performance aspects of their classes. Read that first post I linked in the quoted bit in full for the details, it was a pretty interesting issue :-)

@nevinera
Copy link
Contributor Author

It's hard to be confident without a (very slow) compiled git-bisect, but I'm about 70% confident this PR is what fixed the problem. There is still a reason to prefer to avoid complex shapes, since they inflate the shape tree unnecessarily, but the single ivar used by memery is never an issue that will produce a truly observable impact.

@AlexWayfer
Copy link
Contributor

I don't think you're following the conversation fully - that benchmark is showing that there is no substantial performance difference for getters/setters based on complex object shapes. The relevance to Memery is not that it affects Memery's memoization performance, it's that Memery performs memoization, which because of its lazily-defined nature, causes the object-shape of classes it's used in to diversify (though not wildly, since it only adds one ivar).

Okay, I guess you're right, I'm still not following the conversation fully, sorry.

For example, I can understand that:

The relevance to Memery is not that it affects Memery's memoization performance

But not this:

it's that Memery performs memoization, which because of its lazily-defined nature, causes the object-shape of classes it's used in to diversify (though not wildly, since it only adds one ivar).

Okay, it's up to you, I just wanted to be… kind of informed, and, either there is a language barrier, or my stupidity. Sorry again. Don't want to bother you.


This whole ticket was not about Memery's performance, it was about helping people that use Memery avoid it having an impact on other performance aspects of their classes.

I believe, if it's not relevant to Memery's performance, then it's not worth to mention it in the Memery's README.

For example: JRuby can give you some advantages in parallel calculations relatively to MRI. Is it truth? Yes. Is it relevant to Memery? I don't think so.

If somebody wants to debug, investigate, be informed in such complex details — OK, they'll be. But… in a simple library? Meh.


I've re-read your message. It's not about memoization performance, but… Memery uses lazy-define memoization (instance variables), what causes… not a single class? I mean, proto-class. And you're mentioning, that single ivar doesn't affect it a much, but still want to add some documentation to README? For what? Just to duplicate blog posts?

@nevinera
Copy link
Contributor Author

nevinera commented Dec 30, 2023

The entire topic is very abstract, I'm afraid I thought you already knew about it, or I'd have explained more - sorry about that >.< There was a talk by Aaron Patterson at last rubyconf about yjit that spent quite a bit of time on this, if you have time - it's here. I'll do my best, but his explanation will give you more :-)

Okay, so - the object shape graph was added in ruby 3.2, and gave us a lot of the performance benefits we got in 3.2 and 3.3. Roughly speaking, every time an instance variable is added/defined on an instance, it marks a transition from the current object shape to a new object shape - if we start with nothing, then define @foo = 1, we'll have object shape 1; then if we define @bar = 2 we get to object shape 2, and then @baz = 3 makes object shape 3 (which I'll call foo/bar/baz). Doesn't matter what the class is, it's shape is foo/bar/baz.

Now, most of the point of that whole thing is that we know a bunch about an object when we look up its object-shape, so being able to look up the object shapes quickly (and generate them quickly) is important; if we defined a class like:

class Thing
   attr_accessor :foo, :bar, :baz
end

and then let stuff start using that class, that class can end up with a solid 15 different shapes, because other callsites might set those attributes in arbitrary order. We'll have the shape foo/bar/baz, but also bar/baz/foo, foo/baz/bar, foo/bar (baz might never get set), etc. And.. in a normal rails app, some of the central models can get like 3-400 instance variables on them easily, usually dozens of them "memoization" ivars.

So what's going on in the 3.2 benchmark there is a demonstration that, if you push things just a little, to the point that a class has a substantial number of distinct instance variables getting set, the class gets "too complex", and it affects the performance of looking up the getters and setters on the class (since it needs to do that using a hash table instead of an inline lookup array cache each time). And people got very worried about that, because it was a demonstration that they could get major performance benefits by wildly changing the ruby style they all loved, their traditional memoization idioms, etc. There were a few blog posts making the rounds telling everyone to declare all their instance variables in the initializer, things like that.

Those posts were what prompted this initial PR - it was not very valuable even on ruby 3.2, since Memery only used one ivar, but people kept bringing it up in various contexts and eventually I thought "I'll just add something for those people to use". But I think now the answer is "I need to write a newer blog post that points out that the prior ones are no longer accurate", and we can just update the README so people stop being worried.

I believe, if it's not relevant to Memery's performance, then it's not worth to mention it in the Memery's README.

I disagree, because I've had three different groups challenge the use of Memery on the basis of that one blog post about object shape already. It says something quite complicated, but the takeaway at the end was basically "care about memoizing, it can be bad for performance now", and explaining why it's not can take.. well. Scroll up :-)

@nevinera
Copy link
Contributor Author

Closing in favor of #45, which just updates the readme

@nevinera nevinera closed this Dec 30, 2023
@nevinera nevinera deleted the add-initializer-support-method-for-object-shape-optimization branch December 30, 2023 21:32
@jrochkind
Copy link

But I think now the answer is "I need to write a newer blog post that points out that the prior ones are no longer accurate",

@nevinera Please!

There was an awful lot of discussion recommending a performance implication that, if it ever mattered, doesn't seem to anymore. A bit frustrating.

I agree that because of all that, it's worth a sentence in the README here to avoid people the trouble you just went to. Also mentioning that memery uses only a single iVar, which is already great.

@nevinera
Copy link
Contributor Author

nevinera commented Dec 30, 2023

There was an awful lot of discussion recommending a performance implication that, if it ever mattered, doesn't seem to anymore.

Well - I suspect it matters slightly, in the sense that the total size of the object-shape tree (and the local size as well) affects how long the lookup takes. Apparently it's using an RB tree now, but for a real system we're probably talking about tens of millions of object shapes.. But I'd want a much more thorough set of benchmarks than what I just did before I'd call that concluded :-)

On the other hand, Memery/Memoist/etc are actually losing out on some of the performance gains here - we got a lot in 3.3/yjit from not having to do hash lookups to find instance variables on objects, but Memery uses a hash to store the memoized values.. But I've hardly ever worked with real-world application logic that had its performance problems at that layer, so eh.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support (opt-in?) pattern that complies with recent ruby object shape performance optimization?
3 participants