Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Thread-safety for union types #15085

Open
straight-shoota opened this issue Oct 15, 2024 · 7 comments
Open

Thread-safety for union types #15085

straight-shoota opened this issue Oct 15, 2024 · 7 comments
Labels

Comments

@straight-shoota
Copy link
Member

Since the introduction of the multi-threading preview (#8112), we have been aware that the representation of union types is not thread-safe.

Quoting from the introductory blog post:

In Crystal, unions between value-types and reference-types are represented as a tuple of a type id and the value itself. A union of Int32 | AClass is guaranteed to not have a nil value. But due to interleaving, the representation of a nil can appear, and a null pointer exception (a segfault in this case) will happen.
[...]
For now you will need to be aware that shared unions that can appear in class variables, instance variables, constants or closured variables are not safe (but will be).

The following example program demonstrates the issue. Parallel access from multiple threads can result in the class variable's type id being set by the first thread (to Foo), but the value is set by the second thread (the integer 0). Reading this combination results in a null pointer in a reference type that is not nilable.
Hence the program consistently runs into invalid memory access (within a couple hundred iterations or so). Note: It doesn't matter if the code is compiled with -Dpreview_mt or without. It uses the non-public Thread API for explicitness.

class Foo
  class_property foo : Int64 | Foo = 0

  INSTANCE = Foo.new

  @value = 123
end

Thread.new do
  loop do
    Foo.foo = Foo::INSTANCE
  end
end

Thread.new do
  loop do
    Foo.foo = 0
  end
end

loop do
  p Foo.foo # Segfault: Invalid memory access (signal 11) at address 0x4
end

It should be clear that crashing the process is unaceptable. The runtime must ensure sound behaviour of union types in a multi-threaded environment.

This topic as also been mentioned in MT schedulers impact (forum.crystal-lang.org). There it has been suggested that doing an atomic compare-and-swap could work for some unions, but not all (especially not for larger structs, @RX14). Another suggested approach is a spin-lock utilizing unused bits of the type id (@funny-falcon).

@yxhuvud
Copy link
Contributor

yxhuvud commented Oct 15, 2024

It should be clear that crashing the process is unaceptable. The runtime must ensure sound behaviour of union types in a multi-threaded environment.

Why is it unacceptable? This is a case of shared memory access and I'd not expect that to be multi thread safe by default, as the performance overhead would be insane. If someone wants that, why is not not reasonable that they handle that themselves?

Also note that this is not in any way unique to union types - this is just as much a problem for big structs that is not part of a union type.

That said - it can perhaps be reasonable to make certain kinds of variables thread safe by default. Not in any way limited to union types. Class properties in particular seems particularly suited to be protected against multi threaded usage due to their global nature.

@straight-shoota
Copy link
Member Author

straight-shoota commented Oct 15, 2024

This is issue is particularly relevant because it's about corrupted metadata in the type system. The type id indicating a different type than the data it describes is at a different level of severity than "just" inconsistent data.

Crystal's type system has a strong promise of safety, and this should hold up under all conditions.

A common idiom in Crystal is assigning the value of an ivar or the result of a method call to a local variable so that the type system can reason about it without outside influences (for example if foo = @foo.as?(Bar) - now foo is restricted to Bar). The very reason this exists is to ensure sound typing in a multi-threaded environment. One requirement for this to work is that reading the instance variable is a sound operation (in terms of the type system). This must be guaranteed for all types.
If we could not uphold this, we'd need to start questioning the understaning of type safety in Crystal.

This is a case of shared memory access and I'd not expect that to be multi thread safe by default, as the performance overhead would be insane. If someone wants that, why is not not reasonable that they handle that themselves?

Safety comes first. Crystal usually tries to make the default behaviour as safe as possible to avoid unexpected footguns.
More performant alternatives with less safety gurantees are usually opt-in, not opt-out.

@ysbaddaden
Copy link
Contributor

ysbaddaden commented Oct 15, 2024

@yxhuvud we're not talking about "obvious" cases like mutating an Array, I'd remember to protect accesses (usually) but updating one value? That's usually thread safe... until it's not because Crystal decided to create an union? Nasty. That could go unnoticed and lead to modify whatever in a program 😨

Performance hit should be taken with a grain of salt (and measured). Globals or even closured data may not be mutated that much that the impact would lead to terrible performance. Static analysis could detect safe cases and skip some safety checks. Heck, look at this article the impact of always checking a generational number every time it dereferences a pointer was only 10% (before static analysis and manual opt-in to skip checks) 😲

@HertzDevil
Copy link
Contributor

HertzDevil commented Oct 16, 2024

we're not talking about "obvious" cases like mutating an Array, I'd remember to protect accesses (usually) but updating one value? That's usually thread safe... until it's not because Crystal decided to create an union? Nasty. That could go unnoticed and lead to modify whatever in a program

The "usually" part is hard to justify when any sufficiently large aggregate will have the same issue; after all, an Int64 | Foo is really just an {Int32, Int64[1]} at the ABI level. IMO both circumstances are equally obvious here; restricting thread-level parallelism to spawn does not magically relieve the programmer of the responsibility to take care of concurrent accesses, whether they manifest via the type system or not.

That said, MT safety looks justifiable for every use of class_getter / class_property in the standard library, most of them due to #14905. This also goes for types that fit into a single register and aren't polymorphic, like Log.progname : String. So I wouldn't mind if class_* is MT-safe by default, as long as the same concurrent control isn't baked into the ABI level, i.e. this will still require manual synchronization by the programmer:

class Foo
  @@foo : Int64 | Foo = 0

  INSTANCE = Foo.new

  @value = 123

  Thread.new do
    loop do
      @@foo = INSTANCE
    end
  end

  Thread.new do
    loop do
      @@foo = 0
    end
  end

  loop do
    p @@foo # this is expected to break
  end
end

the intent being that if a faster alternative is available, such as grouping multiple accesses into a critical section, then the programmer should be able to express precisely that without the hidden abstraction costs.

@ysbaddaden
Copy link
Contributor

We do lots of expensive bound checks and lots more checks everywhere to avoid undefined behavior as C is plagued with, but union types, which are core to the crystal language, and whose codegen is an internal implementation detail of the compiler, are left to all the developers to take care of 😞

Getters are one thing, but what about a closured union that becomes shared across threads?

We can explore ways to make it safe, ways to opt out of safety (then the developer becomes responsible for the safety) and explore ways to skip the checks, then we can decide how to proceed.

@yxhuvud
Copy link
Contributor

yxhuvud commented Oct 16, 2024

Again - this has nothing to do with union types - this is just as problematic when accessing big structs. If they can change while reading the value then you are out of luck.

Consider the original example but changing foo property to instead be an instance of type

struct Foo
  property foo : Int64 | Foo = 0
end

Would that be type safe under the suggested scheme? I don't see how short of implementing a borrow checker.

Getters are one thing, but what about a closured union that becomes shared across threads?

The only way to reach sanity in the case of shared closures is to a: make sure it is properly initialized before accessing it from other threads and b: Don't change the value. If we can figure out a way to enforce properties like that then that could be a way to achieve safety.

edit: oh, and c: Make sure the value actually is there, in case it refer to a local variable. Unless there is something that make sure it doesn't go out of scope (like structured concurrency) there really isn't anything guaranteeing the originating scope hasn't returned.

@crysbot
Copy link

crysbot commented Oct 18, 2024

This issue has been mentioned on Crystal Forum. There might be relevant details there:

https://forum.crystal-lang.org/t/charting-the-route-to-multi-threading-support/7320/1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Status: Backlog
Development

No branches or pull requests

5 participants