-
-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize non-atomic memory allocation #14679
base: master
Are you sure you want to change the base?
Conversation
A logical follow-up to this PR would be to expose a non-clearing variant of |
end | ||
end | ||
|
||
pre_initialize_aggregate(type, struct_type, type_ptr) | ||
end | ||
|
||
def pre_initialize_aggregate(type, struct_type, ptr) | ||
memset ptr, int8(0), size_t(struct_type.size) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.pre_initialize
must clear the memory because it is intended for use with uninitialized memory, regardless of where that memory came from. If the change in #allocate_aggregate
is still needed then it should not call #pre_initialize_aggregate
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line was removed because pre_initialize_aggregate
is also used in allocate_aggregate
.
The memset
specific to the .pre_initialize
primitive was moved here:
memset ptr, int8(0), size_t(struct_type.size) |
So everything should still work the same.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then the naming #pre_initialize_aggregate
isn't accurate anymore, because it now does less than the corresponding primitive. I suggest renaming it to something more descriptive
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But maybe this is also fine and the .pre_initialize
primitive itself shouldn't clear the memory, so the caller can save every bit of redundant memset
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's fine, but there should be a comment that states that pre_initialize_aggregate
expects the memory to have already been set to zero, or have a clear
argument, so the intent becomes clear.
Nice speedup! Though, I'm not sure about naming. The C I'd prefer to expose something more explicit, for example just |
The Also, I don't really see why
The The main invariant of calloc (the memory is cleared) is implemented - however that may be. |
Closes #14677
Closes #14678
This PR adds two new well-known functions used in the compiler:
__crystal_calloc64
__crystal_calloc_atomic64
These functions are analogues to
__crystal_malloc64
and__crystal_malloc_atomic64
, but they guarantee that any memory allocated using them is cleared.This can be used as an optimization as crystal mustn't clear this memory as required with memory allocated using the
malloc
versions.If the
__crystal_calloc*
functions cannot be found, the old behaviour is used.Additionally, two new
GC
methodscalloc
andcalloc_atomic
have been added with the same behaviour.The description of the
GC
methodmalloc
(which clears memory in bdwgc and doesn't clear memory with no GC) has been updated to reflech that it does not always clear any memory. Unless the underlying GC is changed, this is not a breaking change.In the case of bdwgc, only non-atomic memory allocations got faster.
Code:
Results:
As can be seen from these results, large memory allocations profit a lot while small memory allocations only see a small improvement.
Also, it may be interesting to see how often LLVM can remove the memset completely.
More advanced benchmarks must still be done.