Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How to distribute THREAD_MODEL=posix builds? #326

Closed
abrown opened this issue Sep 15, 2022 · 46 comments
Closed

How to distribute THREAD_MODEL=posix builds? #326

abrown opened this issue Sep 15, 2022 · 46 comments

Comments

@abrown
Copy link
Collaborator

abrown commented Sep 15, 2022

When attempting to use wasi-sdk to compile a program that might use threads (or shared memories, etc.), the program will fail to link because the libc used has not been compiled with all of the right features (i.e., --shared-memory which relies on --features='atomics,bulk-memory'):

$ cat << EOF > test.c
#include <stdio.h>
int main() {
  printf("...");
  return 0;
}
EOF

$ /opt/wasi-sdk/wasi-sdk-16.0/bin/clang -O3 --target=wasm32-wasi \
    --sysroot=/opt/wasi-sdk/wasi-sdk-16.0/share/wasi-sysroot \
    -mthread-model posix -Xlinker --features='atomics,bulk-memory' -Xlinker --shared-memory \
    test.c -o test.wasm -v
...
 "/opt/wasi-sdk/wasi-sdk-16.0/bin/wasm-ld" -m wasm32 -L/opt/wasi-sdk/wasi-sdk-16.0/share/wasi-sysroot/lib/wasm32-wasi /opt/wasi-sdk/wasi-sdk-16.0/share/wasi-sysroot/lib/wasm32-wasi/crt1-command.o --features=atomics,bulk-memory --shared-memory /tmp/test-86de01.o -lc /opt/wasi-sdk/wasi-sdk-16.0/lib/clang/14.0.4/lib/wasi/libclang_rt.builtins-wasm32.a -o test.wasm
/opt/wasi-sdk/wasi-sdk-16.0/bin/wasm-ld: /lib64/libtinfo.so.6: no version information available (required by /opt/wasi-sdk/wasi-sdk-16.0/bin/wasm-ld)
wasm-ld: error: --shared-memory is disallowed by errno.o because it was not compiled with 'atomics' or 'bulk-memory' features.
clang-14: error: linker command failed with exit code 1 (use -v to see invocation)

There should be some way to package up the output of a make THREAD_MODEL=posix build in the released binaries and helpfully link in the right things when, e.g., -pthreads or -mthread-model=posix are used. Any thoughts on how to move forward on this?

@abrown
Copy link
Collaborator Author

abrown commented Sep 15, 2022

cc: @mingqiusun, @penzn

@abrown abrown changed the title How to package How to distribute THREAD_MODEL=posix builds? Sep 15, 2022
@sbc100
Copy link
Member

sbc100 commented Sep 15, 2022

I would suggest that, at least initially you just build a completely separate sysroot. e.g. wasi-sysroot-mt?

In he long run we could come up with some kind of scheme where clang you select the library subdirectory within the sysroot differently, just like it does for architecture. e.g. -L<sysroot>/lib/wasm32-wasi/ would become something like -L<sysroot>/lib/wasm32-wasi/mt/. That why we could have single sysroot (and indeed a single architecture within that sysroot) that contains both a the mutlithreaded and singlethreaded libraries.

But I think two different sysroots should for a the time being.

@abrown
Copy link
Collaborator Author

abrown commented Sep 19, 2022

Here's what we discussed today: wasi-libc would get distributed with a new subdirectory mt containing a whole new sysroot, the output of make THREAD_MODEL=posix. Then, we would need to change clang to append mt to the sysroot path when it detects flags that require a reentrant build (e.g., -pthread). @sunfishcode, @sbc100, does that sound right? (And maybe it's just a restatement of what @sbc100 just said above?)

@sbc100
Copy link
Member

sbc100 commented Sep 19, 2022

I think we discussed not a separate sysroot just a new directory inside the existing sysroot (e.g. lib/was32-unknown-unknown/mt)? This will require a clang patch.

The completely separate sysroot is what we could use to try this out without first landing the clang patch.

@abrown
Copy link
Collaborator Author

abrown commented Sep 19, 2022

Oh, ok, then I was misunderstanding what was said. So the contents of mt would instead contain the threads-compatible versions of all of the lib artifacts (e.g., crt1.o, libc.a, libm.a, libwasi-emulated-*.a, etc.)?

@sbc100
Copy link
Member

sbc100 commented Sep 19, 2022

Basically yes, although crt1.o tends to be handled separately ... it might be special handling. Take a look a the current clang driver code for now it picks the lib directly along with how it picks crt1.o.

@penzn
Copy link
Collaborator

penzn commented Sep 23, 2022

I think we discussed not a separate sysroot just a new directory inside the existing sysroot (e.g. lib/was32-unknown-unknown/mt)?

It can also be treated as a 'feature' of the target, and maybe triple can become wasm32-unknown-unknown-mt or wasm32-unknown-mt. The upside of that, in my opinion, is that it better reflects that 'mt' and 'non-mt' object files are more or less incompatible (mixing them leads to link errors, in all or some cases).

@sbc100
Copy link
Member

sbc100 commented Sep 23, 2022

Sure, if you think we can shoehorn it into the tipple that would great!

@sbc100
Copy link
Member

sbc100 commented Sep 23, 2022

That might be a bit tricky tough... what happens if you build with -triple=wasm32-unknown + -pthread .. should that fail because you didn't specify a triple that supports threading?

@sbc100
Copy link
Member

sbc100 commented Sep 23, 2022

I guess when the triple gets expanded from what the user specified on the command line and the missing components are added we could inject the -mt suffiix if the -pthread flag is present.

@penzn
Copy link
Collaborator

penzn commented Sep 23, 2022

I guess when the triple gets expanded from what the user specified on the command line and the missing components are added we could inject the -mt suffiix if the -pthread flag is present.

Probably, I'll look at the driver and see what would be easier.

@alexcrichton
Copy link
Collaborator

One thing that may be worth pointing out here from a Rust-perspective is that Rust can probably only really support this if the string is a brand new target. Rust additionally has precompiled artifacts (the Rust standard library) and I don't think that infrastructurally it's really possible to change the Rust compiler and distribution infrastructure for just the wasm target to have two sysroots in the similar manner that's being proposed for C here.

Personally I have always hoped to avoid having a threaded and non-threaded target for all wasm targets but at the current point in time I think it's the best solution at least for Rust. I'm sure, though, that whatever naming scheme is used for a C target (if that's settled on) will want to be mirrored in Rust as well.

@sunfishcode
Copy link
Member

To add to what Alex said here, adding a new triple ends up having a lot of administrative overhead.

I'd like to second the above ideas of having threading-enabled artifacts to be available in the same sysroot, under the same target triple, and teaching the clang and rust drivers how to find them, so that we don't need a whole separate target.

On a tangential note, I expect I'll have a hard time remembering the name "mt" stands for; could we name the artifacts something with "thread" in the name?

@alexcrichton
Copy link
Collaborator

To clarify, though, as far as I know that's not a viable strategy for Rust (using the same target with separate sysroots). Implementing that for Rust would involve a fair amount of change to the release process, building, etc, that's all wasm-specific and doesn't have any precedent. I would be quite skeptical that unless someone is already signed up to do the work that this would never get done because it's too different from what's already there. Put another way, if a new target isn't made I don't think that Rust will, in any near-to-mid timeframe, gain support for precompiled artifacts with multithreading.

Broadly speaking I do personally care about Rust having good multithreading support for wasm, but not to the point that I can justify refactoring Rust's own distribution mechanism to accomodate this use case. I very much agree that adding new targets is a very big hammer but that's what I mean by I don't see a better solution at this time. Rust has an intermediate solution, though, that C doesn't where it's easier, at least on the nightly Rust channel, to compile the standard library from source with atomics/threads enabled.

@sunfishcode
Copy link
Member

I'm not advocating for separate sysroots. I'm advocating for a single sysroot that has muliple libc.a's, etc., at different paths within the sysroot, and having the Rust driver know how to select the ones it needs. I haven't looked into it in detail, but I'm imagining this is something that doens't require major refactoring.

@penzn
Copy link
Collaborator

penzn commented Sep 27, 2022

Thanks for clarification, @alexcrichton! In my opinion, adding a new target is not that big of a deal. I personally lean towards that, but it will ultimately depend on how easy it is to implement that.

To recap the three options so far were:

  • New target
  • A separate sysroot within the same target
  • Additional libraries within the single sysroot (and single target)

We can probably treat the middle one as the least useful, given the Rust situation. @alexcrichton would Rust be able to pick different versions of standard library based on threads being enabled or disabled?

(Edit) As an aside, the way wasm target handles features is a different from what native toolchain does, there you don't have to recompile entire libc with different flags to allow it to be used in multi-threaded builds.

@alexcrichton
Copy link
Collaborator

I'm not aware of any precedent in Rust targets to enable picking different libraries based on features. Rust is also a much different ecosystem for distributing the standard library than C, and Rust further does not have any precedent to draw on for distributing "two targets in one" which is effectively what this is. The first might be easy-ish to solve with a "oh let's just add a few if cases here and there" but the second I believe is significantly more involved since it deals with the distribution infrastructure of the Rust compiler which is quite nontrivial.

The closest precedent I can think of is the "soft float" and "hard float" targets that Rust has. I believe there's arm-unknown-linux-gnueabihf and arm-unknown-linux-gnueabi which only differ in how floats are implemented (the former with hardware support and the latter with software emulation). That is naturally two different targets though which is ideally what is trying to be avoided (which I agree would be best to avoid).

I haven't looked into it in detail, but I'm imagining this is something that doens't require major refactoring.

In general Rust is not nearly as "loose" as C compilers with compiler options and target-specific internals. Most of Rust internals are shaped along the lines of "basically all targets look like this" so the are relatively few target-specific options unless every other target can be configured with the same options (e.g. target features are naturally target-specific but every target has some set of options). AFAIK there's generally not much appetite in rustc development for significant configurations for target-specific options unless there's significant motivation to do such. Given the nascent stage of multithreading for wasm (and wasm in general to an extent) I don't predict that it would be easy to land lots of wasm-specific bits inside of rustc at this time.

@penzn
Copy link
Collaborator

penzn commented Sep 27, 2022

To me this makes sense. Even from C compiler's point of view threaded and non-threaded builds would be different targets, you can't really mix them in one link line.

This brings me expanding on the aside above, that the way wasm toolchain handles these features is a different from what native toolchain does, there you don't have to recompile libc or user objects with different flags to allow them to be used in multi-threaded builds. You only need to do it to the extent required to support multi-threading, but in wasm case linking would fail if not all of the objects have been built with thread support (as an extreme example, even the ones not accessing memory at all). I feel this is a bit too restrictive, but don't have any better idea about how to do this yet. Though from this perspective it is probably better to keep two separate targets for C as well.

@alexcrichton
Copy link
Collaborator

In terms of target features vs native toolchaisn I think it depends a bit on the feature. For example the precompiled wasi-libc doesn't use simd, but you're allowed to use simd in your own object files and have that all link together. Personally I view threads as a significant underlying change since, at least for the Rust standard library, all the internals need to change. The Mutex<T> type, for example, does not use atomics on normal wasm builds but does use atomics on threaded builds. Other examples are that memory allocation doesn't take a lock in normal builds while it does take a lock in threaded builds.

Overall I personally see that so much is different for a threaded build that it sort of justifies the existence of a separate target because the standard libraries are so significantly different internally.

@sunfishcode
Copy link
Member

I was picturing something like RUSTFLAGS="-C target-feature=threads, such that Rust code could do #[cfg(target_feature = "threads")] to let implementation code be conditional on threading support, and cargo will automatically recompile user code and libraries as needed so that everything is compiled the same way. From there, it's just a matter of having rust pick the right libc.a from the sysroot, if we give it multiple libc.a's to pick from.

If we do a new target, then it's #[cfg(target_os = "wasi-threads")] or whatever instead of #[cfg(target_feature = "threads")],

@sunfishcode
Copy link
Member

and we can make that work, but it's not clear to me what the advantage is.

@sbc100
Copy link
Member

sbc100 commented Sep 28, 2022

For option (3) above: I assume that its up to the rust driver to decide how to construct the -L path to pass to the linker, similar to how the clang driver does it. Something like -L<sysroot>/lib/<target>.

Wouldn't option (3) be mostly about adding a little complexity to the code that creates that linker flag? I imagine the complexity needed in the rust driver to handle this would be pretty much identical to that needed in the clang driver.

@alexcrichton
Copy link
Collaborator

@sunfishcode most of what you describe already works today. You can compile with -Ctarget-feature=+atomics and it sets target_feature = "atomics". That's how the standard library implements locks on wasm. You mention "it's just a matter of ..." and that's precisely what I'm trying to explain is the significantly difficult part of implementing this in Rust. Again no other target in Rust has any precedent for selecting a different precompiled libstd based on anything other than the target string. While I'm sure this is technically possible to add to Rust my prediction is that this would be a significant undertaking which has a surprising amount of complexity. To reiterate again I'm not sure if upstream rustc would even be willing to consider extensive changes for just one target (wasm).

A new target does not imply a new target_os. For example the two arm-*-gnueabi{,hf} targets I mentioned have the exact same rustc --print cfg output. A hypothetical wasm32-wasi-atomics target would look exactly like wasm32-wasi to compiled code except that it would bake in that atomics is enabled (as you mentioned). Basically I don't think adding a new target is as big of an impact on the code itself as I think you may be thinking (although it's still of course a big impact on the "conceptual complexity" as it's a whole separate target).

To reiterate again as well though, the complexity is not in building a -L flag or updating the rustc driver to pick a different sysroot. C and Rust are different in this regard where the standard rust-lang/rust repository is responsible for building and distributing sysroots of targets. AFAIK no one other than the Rust repository is doing that. This means that the complexity is not only in building -L arguments but additionally building the entire standard library and the sysroot itself. It's fine to have a bunch of custom logic in just this one wasi-libc repository but for rust-lang/rust it would be a large wad of custom logic for just wasi that's almost entirely untested that others must maintain. I don't think that upstream Rust would have much of an appetite for such a change.

In any case though I mostly wanted to bring up how avoidance of a new target string will cause significant complexities for Rust. Whether or not wasi-libc wants to take that into account is outside of my wheelhouse and there's no reason that C and Rust could have a different story. Personally I'd like them to be the same but that's outside my wheelhouse.

@paravoid
Copy link

paravoid commented Nov 18, 2022

This is somewhat tangential, but perhaps relevant to this discussion:

Debian, and by extension Ubuntu, has been shipping wasi-libc (as well as libstd-rust-dev-wasm32) for some time now. I am a Debian Developer, but not the maintainer of these packages, nor have I been involved at all in packaging these two, or any other relevant packages like LLVM/Clang. However, I have been working recently on adding compiler-rt packages for wasm32/wasm64, as well as libc++ for wasm32-wasi, all generated by the llvm-toolchain-NN source packages (which are shipped in Debian, Ubuntu, and apt.llvm.org). See Debian Salsa MR#97 and Debian Salsa MR#103). I believe that this will alleviate the need for using a sysroot on these platforms, and, more broadly, the WASI SDK as a whole. In other words, once these land, it will be possible to just run clang -target wasm32-wasi foo.c and clang++ --target=wasm32-wasi foo.cpp without needing to download anything from the web, or specify any other arguments. (As an aside: there are some aspects that I'm uncertain about, like how to name these packages, as noted in a comment on the second MR above; if there is a place to discuss these further like a realtime chat or something, I'd be happy to collaborate. Otherwise, feel free to respond there, or hit me up at @debian.org, or over IRC on libera/OFTC, if you have any thoughts or feedback!)

Given the above, I'm interested in what the future holds for WASI, including how we could handle these new pthread builds.

  • From a distribution standpoint, having entirely different sysroots as suggested at some point above, would be problematic as things are right now. Debian has made the decision (so far) to not ship a sysroot, but rather ship wasi-libc under /usr, specifically /usr/include/wasm32-wasi.
  • A separate target would probably work. But, it means we would have to replicate everything, and ship multiple variants of every package, with and without multi-threading support. So while I haven't thought this through yet, I share the sentiments expressed above about the overhead of such a solution.
  • "Different paths within the same sysroot" + "patching clang to look there" option seems to be relatively straightforward, and would probably work as-is without the need to do anything too special.

Finally, this may be a very naive question but perhaps worth asking: in a future where a (p)threaded ecosystem exists and is stable, is there value in shipping the single-threaded one? In other words, given it's early days for WASI in Debian, would it make sense for Debian to just switch the "wasi-libc" package (and libc++ etc.) to a THREAD_MODEL=posix build, and not ship the "single" builds at all?

This is all pretty new to me, so excuse my ignorance! Would love to hear your feedback and thoughts -- thanks!

@sbc100
Copy link
Member

sbc100 commented Nov 18, 2022

On the general approach of avoiding a sysroot:

One big different between the normal "multi-lib" approach of Debian/Ubuntu, and wasi-sdk, is that wasi not linux. Its more like a true cross-compile to different OS/platform. Treating it as just another multi-lib might not make sense. Is there any precedence for non-linux targets in the "multi-lib" setup? For example, is that how you package the android SDK? (actually android is linux, so that isn't a good example, but still).

One issue that might come up is that, normally in a multi-lib environment the /usr/include directory is added to the end of the include path. I think it might confuse your users if #include <sys/inotify.h> ends up include the linux version of this header when compiling for wasi. (indeed /include is currently added the end of the include path by clang when targeting wasm32).

@sbc100
Copy link
Member

sbc100 commented Nov 18, 2022

Regarding the ongoing need to support both a threaded and non-threaded version the SDK, I think that will likely continue for several years. There are performance advantages to building specifically for single-thread wasm, even when you know you target supports threads.

Regarding "shipping multiple variants of every package", surely that will be true whether you use a separate target or a sub directory with a target? Any package that contains a library file will need to have a separate package for you, right? The only unnecessary duplication that I can see is that libc headers, which in theory can be shared.

If you are really don't want to ship multiple versions of the libc headers (becaused they happen to be the same for threaded and non-threaded SDKs) we could maybe have clang's default search path contains several entries from most to least specific. e.g.

 /usr/include/wasm32-wasi-threads
 /usr/include/wasm32-wasi
 /usr/include

That way if you want to have a single set of shared headers you could install them in /usr/include/wasm32-wasi and if you want to have specific headers for the threads-enabled library you can install them in /usr/include/wasm32-wasi-threads.

However all of this assumes the header trees will be identical between flavors. This maybe true today, but I'm not sure we would want to guarantee this. As a hypothetical example, I can imagine for the non-threaded version of the SDK we may not want to include pthread.h at all... giving users a compile failure if they try to include it.

@paravoid
Copy link

(This is a bit off-topic to the current issue, but it's super helpful to me and the insight is invaluable! Let me know if I should move this somewhere else, and where!)

One big different between the normal "multi-lib" approach of Debian/Ubuntu, and wasi-sdk, is that wasi not linux. Its more like a true cross-compile to different OS/platform. Treating it as just another multi-lib might not make sense. Is there any precedence for non-linux targets in the "multi-lib" setup? For example, is that how you package the android SDK? (actually android is linux, so that isn't a good example, but still).

The decision to place wasi-libc in /usr/include/wasm32-wasi was made by the wasi-libc maintainer when it was first introduced, seemingly in ~March 2020, so I can't really speak to it. To respond directly to your question, from what I can tell from a few Android packages, they are installed in /usr/include/android, and (e.g.) /usr/lib/x86_64-linux-gnu/android/. A good WASI-equivalent example however may actually be mingw, which seems to be placed under /usr/x86_64-w64-mingw32/include, /usr/x86_64-w64-mingw32/lib etc. So you are raising a good point!

One issue that might come up is that, normally in a multi-lib environment the /usr/include directory is added to the end of the include path. I think it might confuse your users if #include <sys/inotify.h> ends up include the linux version of this header when compiling for wasi. (indeed /include is currently added the end of the include path by clang when targeting wasm32).

This is actually right on the spot! So the LLVM WebAssembly driver really assumes a --sysroot argument is present, and would include paths such as (absolute) /include/, and actually do so even on the wasm32-unknown-unknown target. So, I had to patch that driver (in a kind of a hacky way) as you can see here: https://salsa.debian.org/pkg-llvm-team/llvm-toolchain/-/blob/ec6fe52cd2ee1b009266edc58d7f522ec480ce0e/debian/patches/wasm/wasm-sysroot-usr.diff. Once we settle on where things should be placed, I could work with upstream on better, upstreamable patches.

I went back and forth on whether /usr/include should be in the include path or not, and if you look closely you'll see that this was actually changed between the MR's revisions. As you (so instictively!) imagined, including it results into some unintended consequences and weird behaviors where e.g. the host's glibc may be accidentally be included. On the flip side, including it also means that a bunch of other stuff could potentially work out of the box, without having to ship a separate -dev package for every library out there (imagine things like e.g. liblzma-dev or something).

@sbc100
Copy link
Member

sbc100 commented Nov 21, 2022

Yes, I think we are getting a bit off topic here. Perhaps we should open an issue in the upstream llvm repo about if/when to include /include on the include path?

On the specific issue of how to package the pthread flavor of wasi-sdk, I think we have most agreed at this point to make it a separate triple. This would mean that if you installed both flavors they would both contain (most likely) the same headers. What do you think of the idea of having clang search
/usr/include/wasm32-wasi-threads followed by the less specific /usr/include/wasm32-wasi, then the common headers to be stored in the latter directory?

@infinity0
Copy link

infinity0 commented Nov 22, 2022

The decision to place wasi-libc in /usr/include/wasm32-wasi was made by the wasi-libc maintainer when it was first introduced

wasi-libc Debian maintainer here, I did this as it felt natural and to avoid conflict with regular /usr/include. Currently wasi-libc's own Makefile defines:

245:SYSROOT_LIB := $(SYSROOT)/lib/$(MULTIARCH_TRIPLE)
246:SYSROOT_INC = $(SYSROOT)/include
247:SYSROOT_SHARE = $(SYSROOT)/share/$(MULTIARCH_TRIPLE)

Generally by analogy with the FHS for /lib and /share I think it makes sense to look in $(SYSROOT)/include/$(MULTIARCH_TRIPLE) for cross-compilers; $(SYSROOT)/include is reserved for the host architecture.

MinGW's usage of $(SYSROOT)/$(MULTIARCH_TRIPLE)/include I would imagine is a hack around legacy build scripts and tools that don't support cross-compiling in as general of a way as LLVM does, the hack to be used by setting sysroot = $(ACTUAL_SYSROOT)/$(MULTIARCH_TRIPLE). As further evidence of this, note that the packages in Debian that use this path are nearly all -dev packages and they also use $(SYSROOT)/$(MULTIARCH_TRIPLE)/lib; but the actual mingw runtime packages use $(SYSROOT)/lib/$(MULTIARCH_TRIPLE).

If we ever have an entire OS written in wasm32-wasi, then sure the host compiler can look for wasm32-wasi include files in $(SYSROOT)/include.

@sbc100
Copy link
Member

sbc100 commented Nov 22, 2022

$(SYSROOT)/include is reserved for the host architecture.

Where does this assertion come from? As far as I cant tell this is not currently the case. When cross compiling both gcc and clang seems add $(SYSROOT)/include as well as $(SYSROOT)/include/$(MULTIARCH_TRIPLE) to the include path.

$ arm-linux-gnueabihf-g++ ~/test/hello.c -v  -c
...
ignoring nonexistent directory "/usr/local/include/arm-linux-gnueabihf"
ignoring nonexistent directory "/usr/lib/gcc-cross/arm-linux-gnueabihf/12/include-fixed"
ignoring nonexistent directory "/usr/include/arm-linux-gnueabihf"
#include "..." search starts here:
#include <...> search starts here:
 /usr/lib/gcc-cross/arm-linux-gnueabihf/12/../../../../arm-linux-gnueabihf/include/c++/12
 /usr/lib/gcc-cross/arm-linux-gnueabihf/12/../../../../arm-linux-gnueabihf/include/c++/12/arm-linux-gnueabihf/.
 /usr/lib/gcc-cross/arm-linux-gnueabihf/12/../../../../arm-linux-gnueabihf/include/c++/12/backward
 /usr/lib/gcc-cross/arm-linux-gnueabihf/12/include
 /usr/lib/gcc-cross/arm-linux-gnueabihf/12/../../../../arm-linux-gnueabihf/include
 /usr/include
$ clang -target arm-linux ~/test/hello.c -v  -c
ignoring nonexistent directory "/include"
#include "..." search starts here:
#include <...> search starts here:
 /usr/lib/llvm-14/lib/clang/14.0.6/include
 /usr/local/include
 /usr/include
End of search list.

(Note the /use/include at the end in both cases).

So it seems that the /include and /usr/include are considered to be fallbacks to be searched after the arch-specific directories. I guess that might make sense when you are cross compiling to another architecure, but maybe not when you are cross compiling to an entirely different OS I guess?

I think your clang patch has the right idea. It adds <sysroot>/include to the path, but only when an explicit sysroot is given, not when the actualy host system root is used? Maybe we should upstream that?

@infinity0
Copy link

infinity0 commented Nov 22, 2022

$(SYSROOT)/include is reserved for the host architecture.

Where does this assertion come from?

I meant this prescriptively, though I should probably have been more precise and said "triplet". The existing practise of "falling back" to $(SYSROOT)/include that goes back several decades, in the context of today's multiarch paths, should be re-interpreted as "reusing host-triplet files to avoid duplication, assuming they are compatible with the target-triplet". So it makes sense for arm-linux-gnueabihf-g++ to look in /usr/include on a x86_64-linux-gnu platform, but it does not make sense for wasm32-wasi-clang to do the same.

IMO the behaviour should not be different between giving an explicit or an implicit sysroot, because this results in non-uniform and surprising behaviour and makes things awkward when e.g. scripting and passing through default values, which sometimes might need to be done explicitly. Another flag like --sysroot-is-target (or --sysroot-no-multiarch, whatever) could be used to force the path to use $(SYSROOT)/include directly.

To summarise, my order of preference for the behaviour is this:

  1. When cross-compiling to an incompatible triplet, don't use $(SYSROOT)/include at all. (e.g. linux -> wasi, glibc -> musl, linux -> windows, etc, etc)
    • the user can explicitly opt into searching $(SYSROOT)/include using a flag that is orthogonal to --sysroot.
  2. Always fall back to $(SYSROOT)/include after $(SYSROOT)/include/$(MULTIARCH_TRIPLET).
  3. Treat the sysroot differently if --sysroot is given vs not given. (Bad bad bad).

BTW GCC apparently has separate --sysroot and -isysroot options, IMO this is really ugly, but if clang's #1 priority is to follow whatever GCC does then this would have to be the approach.

@sbc100
Copy link
Member

sbc100 commented Nov 22, 2022

I guess I don't quite see why we shouldn't assume that "sysroot-is-target" when --sysroot is passed? Thats basically what your clang patch does today doesn't it?

I'm also not sure how easy it will be for clang to decide in host triplet is "compatible", perhaps just comparing some parts of the host and target triplets?

I don't have particularly strong opinions, but it would seem a little redundant for sysroot builders (e.g. NDK, emscripten, wasi-sdk) to have <sysroot>/include not used at all and for them to have to ship all their headers in the triplet subdirectory. I guess we already do this for libraries so maybe we just accept this?

@sunfishcode
Copy link
Member

I like the idea of just accepting this. It's advantageous for WASI to be able to "fit in" within existing cross-compilation filesystem layouts.

@infinity0
Copy link

I guess I don't quite see why we shouldn't assume that "sysroot-is-target" when --sysroot is passed?

I'm saying these two things are orthogonal and in general one should preserve orthogonality in flag behaviours. If you assume "sysroot-is-target", this should be done regardless of whether --sysroot was passed explicitly or a default value was chosen for it. This general principle of programming makes it easier to compose lower-level tools into higher-level tools, as the behaviours are more predictable.

Take your example about sysroot builders, yes if they only focus on wasm32-wasi today it make seem "redundant" but if they extend their functionality or someone composes them with another tool, then it's not redundant and in fact it would be less messy.

Thats basically what your clang patch does today doesn't it?

I think you're confusing me with @paravoid , we are both Debian developers but I'm giving an alternative opinion here. I didn't write any patches for clang, but I do maintain rustc in Debian as well (where the question about include paths is moot).

@sbc100
Copy link
Member

sbc100 commented Nov 22, 2022

Regarding the duplication if headers, assuming we stick with our decision to treat wasm32-wasi-pthreads as a separate target, how should we handle the header redundancy? Should we:

a. Ship the headers twice, once under each target subdirectory
b. Ship the headers once in wasm32-wasi and have clang search or headers under the specific target (wasm32-wasi-pthreads) first and fall back to wasm32-wasi?

(Any other options?)

I imagine there is some precedent for (a) ? e.g. /usr/include/x86_64-linux-gnu/ and /usr/include/i386-linux-gnu/ contains a lot of identical files?

@sunfishcode
Copy link
Member

The wasi-sdk headers appear to be about 9 MiB installed. That's not free, but if it's significantly simpler to duplicate the headers than to figure out some kind of sharing configuration, I propose that we just duplicate it.

@sbc100
Copy link
Member

sbc100 commented Dec 17, 2022

Can we close this now that #331 has landed?

@abrown
Copy link
Collaborator Author

abrown commented Dec 19, 2022

Will #331 be shipped in the next version of wasi-sdk? I'm not too sure what the release process is but the original intent of this issue is that future versions of wasi-libc would have both targets. I think we can close this once that is figured out.

@sbc100
Copy link
Member

sbc100 commented Dec 19, 2022

Yes the next version should have two different target triples in the sysroot I think... although there maybe a little more work to make that happen still.

@sbc100
Copy link
Member

sbc100 commented Dec 19, 2022

the question of how we should distribute is kind of answered though: under a different target triple.

@abrown
Copy link
Collaborator Author

abrown commented Dec 21, 2022

I agree, but I'm concerned with solving the end problem: @sbc100 or @sunfishcode, what else needs to happen for the THREAD_MODEL=posix builds to be released in the next version of wasi-sdk? If it's just a wasi-sdk thing at this point, I'll open an issue there with whatever you suggest and we can close this one.

One other point, though: as I was writing up the proposal to add this new target to Rust, I began to wonder whether wasm32-wasi-pthread is in fact the right name for the target. I wonder if perhaps the Rust target might make more sense as wasm32-wasi-threads and then I foresee some confusion (use wasm32-wasi-threads for Rust, use wasm32-wasi-pthread for C, etc.). Can we quickly bikeshed the name of this target before we release anything?

@sbc100
Copy link
Member

sbc100 commented Dec 21, 2022

I'd be fine with calling the clang triple wasm32-wasi-threads if you prefer .. internally withing clang/llvm I believe we do use threadmodel=posix, but I guess that is an implementation detail.

@sbc100
Copy link
Member

sbc100 commented Dec 21, 2022

It interesting that using pthread in the singular form makes sense to me in the triple but not thread in the singular.

@sbc100
Copy link
Member

sbc100 commented Dec 21, 2022

(earlier comment deleted)

@sunfishcode
Copy link
Member

The current release process isn't very automated. I take the build artifacts from CI and manually upload them to the release. So to add more artifacts here, we just need to (a) make sure CI produces them, and (b) that whoever does the next release knows to upload those artifacts.

@abrown
Copy link
Collaborator Author

abrown commented Jan 19, 2023

Ok, I think the "how" is resolved at this point so we can close this. @yamt has WebAssembly/wasi-sdk#274 open to add wasm32-wasi-threads support to wasi-sdk. With that in place, the right answer for "how to distribute THREAD_MODEL=posix builds will be to use some new-ish version of wasi-sdk and tell the compiler to target wasm32-wasi-threads — all of that can be documented in the wasi-sdk repository.

@abrown abrown closed this as completed Jan 19, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants