-
Notifications
You must be signed in to change notification settings - Fork 157
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Share empty Text values #493
Conversation
This needs a regression test: please generate empty texts by various means and check unsafe pointer equality. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A lot of these checks could be moved to before the main loop of their respective functions.
src/Data/Text/Internal.hs
Outdated
@@ -252,6 +252,6 @@ pack xs = runST $ do | |||
go (off + d) cs | |||
len <- go 0 xs | |||
arr <- A.unsafeFreeze marr | |||
return (Text arr 0 len) | |||
return (text arr 0 len) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be better to add a case pack [] = empty
, skipping the A.new
altogether.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in 8a60cdf
Thanks for the comments! I haven't had a lot of time to work on this recently, but I'll get back to it soon, and push sone commits to address your points |
I've pushed some changes. There's still a bit to do to optimise things and update docs/etc. I've added a I've added a bunch of regression tests in their own file but do let me know if you'd like me to move them somewhere else. |
Also do you have any tips for getting stable benchmark results. I'm aware of compiling with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's some great work!
Any opinions about where these should live? I'm thinking either
Data.Text.Unsafe
orData.Text.Internal
.
I would say Data.Text.Internal
.
I don't have an answer regarding the benchmarking issue, but yeah I also find that the results tend to be quite volatile.
Pass |
@TeofilC we are heading to a release, so it's high time to wrap up once you get a spare moment. |
Thanks for the heads up @Bodigrim . I'll try to dedicate some time today |
I've pushed another round of changes. I tried to put the non-empty variants into |
Done in 22675d7 |
Can you do the global Rank2Types to RankNTypes renaming in a separate PR? |
Sure that makes sense. Split it out here: #502 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ready to go
text.cabal
Outdated
@@ -170,6 +170,7 @@ library | |||
Data.Text.Internal.Read | |||
Data.Text.Internal.Search | |||
Data.Text.Internal.StrictBuilder | |||
Data.Text.Internal.Transformation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's stop pollution of the global namespace. Please make this internal module only.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in c3fe10f
Is there any performance change according to benchmarks? |
I'm going to revert 51dd99c. |
My last comment got it wrong there was something slowing the lazy filter down but it wasnt that. The correct thing to do is f51caa3. We need to be careful to not check for emptiness twice, but we also need to be careful not to check for emptiness inside the inner loop. With this change the core before and after this branch is basically identical. I'm rerunning the benchmark now, but my impression is that that was the only regression, and it's now fixed. And additionally toLower/toUpper improved significantly (strange) |
To be honest, I'm not quite sold on this approach. It is a big amount of changes, and still the result is fairly fragile and requires a commitment to maintain the invisible invariant in every future change. If it is really important to have only one heap object, representing an empty text, then the systematic way to achieve it is to redefine data Text = Empty | NEText Array Int Int together with a pattern synonym for backwards compatibility. I suggest we do not rush with this change in the current release cycle. @TeofilC could you elaborate a bit more on your use case? What's the reason that your application ends up with a multitude of empty texts? |
I'm sympathetic to that. I I initially thought this would be a relatively small change but it has really grown into something quite large. And as you say this is all a bit fragile and hard to enforce. I think having an explicit empty constructor would be quite good and that would create symmetry with the Lazy case. Though I thought GHC was much better at optimising single constructor data types, so maybe it would be quite expensive. To be honest, I think my use case could be satisfied without this change. We have a lot of empty text because we parse a lot of data that has that. We could avoid this by improving our parsing code. Investigating that made me think about the general case and I thought other people would benefit from the memory savings if this was optimised at the library level. So I won't be disappointed if this doesn't work out. In any case it sounds reasonable not to rush this |
FWIW I think this library already had attempts at enforcing an empty text invariant to avoid leaking backing arrays. What this MR does is enforce that more strictly and improve the benefits by sharing heap objects as well. In my mind, this is only a "best effort" invariant. It's something we try to enforce but can't guarantee. I think that makes it a bit less onerous but gets much of the benefit. I'd also be happy to split up this MR if that helps |
Here's the benchmark results I promised. My system is still not as quiet as I'd like. benchmark results
|
Thanks, @TeofilC, let's return to this in a couple of weeks, once we are done with the upcoming release. |
@TeofilC Thinking more about it, I think I'm fine with this PR as long as we get performance right. It seems I cannot reproduce your results with GHC 9.6.1, e. g., case conversions are either equal or slower:
Could you please double-check? I'm running |
Thanks for taking another look. I'm hoping to come back to this soon |
Thanks for your patience folks. I've finally gotten around to looking at this again. It turns out that the performance regressions of toLower etc were simply caused by differences in eta expansion with respect to HEAD. Applying the same eta-expansion to these functions fixed the regressions. I've also rebased the branch and squashed the commits. Let me know if you'd like anything changed on this branch. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lookg good, sorry that it took me so long to review.
@TeofilC could you please rebase one last time?
Try to use the same heap object to represent all empty Text values. There are already attempts to enforce something like this through the `text` smart constructor, and in various functions by special casing the empty case. This patch expands on these attempts and adds some tests to ensure that empty Text values produced by this library are represented by the same heap object. Despite these efforts, we cannot guarantee that this will be the case in all situations and users of the library shouldn't rely on this behaviour for the correctness of their code. Resolves haskell#492.
No worries about the delay at all, thanks for taking a look @Bodigrim ! It should be rebased now |
(I assume that @Lysxia is happy with this patch, given the earlier approval and no material changes since then) |
Thanks @TeofilC! |
Sadly this PR broke |
This MR ensures that empty Text values are shared.
This required:
null empty
and try to use null more.Resolves #492