Inefficient String Concatenation When Compiling C to WASM/WAT #389

Open
XinyuShe opened this issue Mar 4, 2024 · 2 comments

@XinyuShe

XinyuShe commented Mar 4, 2024

I've encountered an issue while compiling C code to WASM, and subsequently converting it to WAT. The issue pertains to the way string concatenation is handled in the WAT output.
b.zip
Here's a snippet of my C source code:

char src[50] = "Hello, ";
char dest[50] = "World!";
strcat(src, dest);

After compiling this C code to WASM and then converting it to WAT, I expected to find both strings 'Hello, ' and 'World!' in the data section of the WAT file. However, I could only find 'Hello, ' in the data section.

Instead of finding 'World!' as a contiguous string in the data section, I found it concatenated character by character in the function body, like so:

i32.const 87
local.set 44
local.get 4
local.get 44
i32.store8 offset=16
i32.const 111
local.set 45
local.get 4
local.get 45
i32.store8 offset=17
i32.const 114
local.set 46
local.get 4
local.get 46
i32.store8 offset=18
i32.const 108
local.set 47
local.get 4
local.get 47
i32.store8 offset=19
i32.const 100
local.set 48
local.get 4
local.get 48
i32.store8 offset=20
i32.const 33
local.set 49
local.get 4
local.get 49
i32.store8 offset=21
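
For reference, the six `i32.const` operands in the listing above are the ASCII codes of the missing string. The following is a minimal sketch in C that replays those byte stores into a buffer; the names `stored_bytes` and `decode_stores` are hypothetical and used only for illustration.

```c
#include <stddef.h>

/* Operands of the six i32.const instructions in the listing, in order. */
static const unsigned char stored_bytes[] = {87, 111, 114, 108, 100, 33};

/* Hypothetical helper: replays the six byte stores into `out` and
   NUL-terminates it. `out` must have room for at least 7 bytes. */
void decode_stores(char *out) {
    size_t n = sizeof stored_bytes;
    for (size_t i = 0; i < n; i++) {
        out[i] = (char)stored_bytes[i];
    }
    out[n] = '\0';
}
```

Calling `decode_stores` on a 7-byte buffer yields "World!", so the bytes are present in the module as immediates in the function body (stored at offsets 16 through 21 from the address in local 4) rather than as a data segment.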

I'm puzzled by this behavior. Storing the strings in the data section seems to be a more efficient approach than concatenating them character by character in the function body. Is there a specific reason for this implementation? Could this be an optimization issue with the compiler?

@sunfishcode
Member

It's probably a target-independent optimization in upstream LLVM doing this. Are you compiling with -O2? If so, it may be worth trying with -Oz or -Os instead.

@XinyuShe
Author

XinyuShe commented Mar 5, 2024

Hi, thanks for your suggestion! @sunfishcode
I tried -O0, -O1, -O2, -O3, -Os, and -Oz one by one. Only the -O0 output contains the string 'Hello, ', and none of them contains the string 'World!'.
Here is my command:

 clang -O0 --target=wasm32-wasi -o b_o.wasm b.c ; wasm2wat b_o.wasm -o b_o.wat
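
One way to probe this further is to give the string static storage instead of using it to initialize a local array. A compiler may initialize a mostly-zero local array by zero-filling it and then storing the few non-zero bytes one at a time, which would be consistent with the listing above. The sketch below is not taken from the attached b.c: the names `greeting_suffix` and `build_greeting` are hypothetical, and whether the literal ends up as a contiguous data segment still depends on the compiler, the optimization level, and the linker.

```c
#include <string.h>

/* Hypothetical name. A file-scope constant has static storage, so the
   literal is a candidate for the data section rather than per-byte stores. */
const char greeting_suffix[] = "World!";

/* Hypothetical helper: writes "Hello, World!" into `out`, which must
   have room for at least 14 bytes including the terminating NUL. */
void build_greeting(char *out) {
    strcpy(out, "Hello, ");
    strcat(out, greeting_suffix);
}
```

Compiling a file like this with the same command and searching the resulting .wat for `World!` would show whether the form of the initializer is what decides the outcome.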
