Compiled code and handmade WASM interop #24157

Open
turran opened this issue Apr 21, 2025 · 16 comments

@turran

turran commented Apr 21, 2025

Hello all,

After successfully compiling a project, I'd like to optimize it. The program has some x86 asm code which I have ported to wasm by writing the corresponding wast file. So far, so good.

My question is how to interop the compiled project with the handmade .wasm file:

  1. How to include the .wasm file as part of the linking phase?
  2. How to call the .wasm function from C? Should it be enough to declare an extern function in C?
  3. How to access the memory of structs allocated in C from WASM? I understand all allocations are done with a custom allocator into a single memory chunk. Is there a description of what is actually passed as a pointer to a wasm function? In my understanding, it should be just an offset into that memory chunk, is that correct? Can I assume that for void func(uint8* src, uint8* dst, int n), src and dst on the .wasm side are i32 offsets into the imported memory?
  4. When passing a shared memory, its size must be known beforehand. How do I retrieve it to generate the proper (import 'foo' 'bar' (memory 1 SIZE shared))? What should 'foo' and 'bar' be here to properly reference the heap?
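
To make question 3 concrete: on the wasm32 target, C pointers are 32-bit values and are passed to wasm functions as i32 byte offsets into linear memory. The following is a minimal sketch of that view in plain C. The array linear_memory and the function copy_u8_by_offset are hypothetical stand-ins for the module's memory and for func(src, dst, n); they are not emscripten APIs.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the single linear memory (one 64 KiB wasm page). */
#define MEMORY_BYTES 65536u
uint8_t linear_memory[MEMORY_BYTES];

/* How the wasm side would see func(uint8 *src, uint8 *dst, int n):
 * three i32 parameters, where src and dst are byte offsets into memory.
 * The copy itself is only a placeholder for the real kernel. */
void copy_u8_by_offset(uint32_t src, uint32_t dst, int32_t n)
{
    /* Refuse ranges that would fall outside the memory. */
    if (n < 0 || src > MEMORY_BYTES - (uint32_t)n || dst > MEMORY_BYTES - (uint32_t)n)
        return;
    memmove(&linear_memory[dst], &linear_memory[src], (size_t)n);
}
```

A real wasm module would trap on an out-of-bounds access instead of returning silently; the bounds check here only keeps the simulation safe.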

Thanks for helping me understand how emscripten/llvm work at this level.

@sbc100
Collaborator

sbc100 commented Apr 21, 2025

If you want to link your wasm assembly code into an emscripten project then the simplest way to do this would be to write it using the LLVM assembly format and include it in your project as a .s or .S file. See the assembly files that are part of emscripten for examples of how to do this:

./system/lib/libc/emscripten_memset_bulkmem.S
./system/lib/libc/emscripten_memcpy_bulkmem.S
./system/lib/wasm_worker/wasm_worker_initialize.S
./system/lib/libunwind/src/UnwindRegistersSave.S
./system/lib/libunwind/src/UnwindRegistersRestore.S
./system/lib/pthread/emscripten_thread_state.S
./system/lib/compiler-rt/stack_limits.S
./system/lib/compiler-rt/emscripten_tempret.s
./system/lib/compiler-rt/stack_ops.S

Alternatively, if the project in question has C/C++ fallbacks for the x86 assembly then using those would likely be simpler than trying to write wasm assembly by hand.

@turran
Author

turran commented Apr 21, 2025

Thanks @sbc100. Actually the code is written in WAST already which is much easier to code. I don't think writing llvm asm is an option here.

About the fallbacks, yes it does have it but I wanted to optimize it with v128 SIMD ops.

Checking emscripten sources there seems to be some internal logic to pass the heap to the wasm module, like

var b = wasmMemory.buffer;
I guess I can access such memory by importing it from my wasm module and access the shared memory buffer, am I correct?

Disassembling the generated emscripten wasm code, it is not crystal clear how the memory is accessed, but from the .js code that seems to be the way to do it.

Am I on the correct path?

@sbc100
Collaborator

sbc100 commented Apr 21, 2025

Thanks @sbc100. Actually the code is written in WAST already which is much easier to code. I don't think writing llvm asm is an option here.

Are you sure? Can you share the wast file so we can check it out together? I would hope it would be relatively easy to convert from one to the other in most cases.

About the fallbacks, yes it does have it but I wanted to optimize it with v128 SIMD ops.

Another alternative then would be to write it using the wasm simd C intrinsics, but it sounds like you have already written the raw wast so that is likely not attractive to you either.
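
For readers unfamiliar with the intrinsics route, here is a minimal sketch of what it could look like. The kernel add_s16 is a hypothetical example, not one of Orc's functions. It assumes clang/emcc defines __wasm_simd128__ when building with -msimd128 and uses wasm_v128_load, wasm_i16x8_add and wasm_v128_store from wasm_simd128.h; on any other target only the scalar loop is compiled.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__wasm_simd128__)
#include <wasm_simd128.h>
#endif

/* Hypothetical kernel: d[i] = s1[i] + s2[i] for n 16-bit elements. */
void add_s16(int16_t *d, const int16_t *s1, const int16_t *s2, size_t n)
{
    size_t i = 0;
#if defined(__wasm_simd128__)
    /* Process eight 16-bit lanes per 128-bit vector. */
    for (; i + 8 <= n; i += 8) {
        v128_t a = wasm_v128_load(s1 + i);
        v128_t b = wasm_v128_load(s2 + i);
        wasm_v128_store(d + i, wasm_i16x8_add(a, b));
    }
#endif
    /* Scalar tail, and the whole loop on targets without wasm SIMD. */
    for (; i < n; i++)
        d[i] = (int16_t)(s1[i] + s2[i]);
}
```

Because the function is ordinary C, it is linked statically like any other object file and needs no separate module or memory import.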

Checking emscripten sources there seems to be some internal logic to pass the heap to the wasm module, like

emscripten/src/runtime_shared.js

Line 64 in 41a730a

var b = wasmMemory.buffer;
I guess I can access such memory by importing it from my wasm module and access the shared memory buffer, am I correct?
Disassembling the generated emscripten wasm code, it is not crystal clear how the memory is accessed, but from the .js code that seems to be the way to do it.

Am I on the correct path?

It sounds like you are proposing some kind of dynamic linking of two wasm modules, one produced by you directly and one produced by emscripten. While this may be feasible, it's certainly not easy and not the simplest way to solve this kind of problem.

By far the simplest way to solve this (which will also lead to better performance) is to build your code as an object file and have emscripten link it into your program statically (i.e. at static link time). However, to produce an object file you really want to write your assembly in the llvm format. As well as being simple this will likely be the most performant option since it will allow wasm-opt to optimize the whole program as one.

@turran
Author

turran commented Apr 21, 2025

Are you sure? Can you share the wast file so we can check it out together? I would hope it would be relatively easy to convert from one to the other in most cases.

Sure, it is not ready yet. Once it is, I will

Another alternative then would be to write it using the wasm simd C intrinsics, but it sounds like you have already written the raw wast so that is likely not attractive to you either.

That isn't an option either. To give you more context, I'm porting https://gitlab.freedesktop.org/gstreamer/orc/ to WASM by providing a WASM target. Orc is basically a loop optimizer that uses different SIMD instruction sets (mmx, sse, avx, avx512, neon, etc). It works either by generating assembly code to link statically with, or by generating the actual machine code for JIT execution. Currently I'm on the assembly approach, which is easier to code, emitting WAT. Later, once it works, I'll need to emit the actual WASM bytecode. The two approaches pose different challenges. I'm currently trying to understand emscripten/llvm internals to be able to glue Orc in there. In the end, with the static approach, I'll need to link to my new WASM (by running wat2wasm) and provide a way to pass C variables to it. Judging from your comments, it may be more feasible to do the JIT directly and provide the glue myself. Still, the same questions remain, as I don't know how to access the heap/pointers and provide them to the WASM module.
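
As a sketch of what the later "actual WASM bytecode" step involves at the lowest level: a binary module starts with an 8-byte preamble ("\0asm" followed by version 1), and section sizes, counts and indices are written as unsigned LEB128. The helper names emit_wasm_preamble and emit_uleb128 below are hypothetical and are not part of Orc or emscripten.

```c
#include <stddef.h>
#include <stdint.h>

/* Write the 8-byte module preamble: magic "\0asm" followed by version 1.
 * Returns the number of bytes written. */
size_t emit_wasm_preamble(uint8_t *out)
{
    static const uint8_t preamble[8] = {0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00};
    for (size_t i = 0; i < sizeof preamble; i++)
        out[i] = preamble[i];
    return sizeof preamble;
}

/* Encode value as unsigned LEB128: 7 bits per byte, least significant
 * group first, high bit set on every byte except the last.
 * Returns the number of bytes written (at most 5 for a 32-bit value). */
size_t emit_uleb128(uint8_t *out, uint32_t value)
{
    size_t n = 0;
    do {
        uint8_t byte = (uint8_t)(value & 0x7f);
        value >>= 7;
        if (value != 0)
            byte |= 0x80;
        out[n++] = byte;
    } while (value != 0);
    return n;
}
```

For instance, a function body size of 527 bytes would be encoded as the two bytes 0x8f 0x04.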

It sounds like you are proposing some kind of dynamic linking of two wasm modules, one produced by you directly and one produced by emscripten. While this may be feasible, it's certainly not easy and not the simplest way to solve this kind of problem.

Yes, it seems so.

By far the simplest way to solve this (which will also lead to better performance) is to build your code as an object file and have emscripten link it into your program statically (i.e. at static link time). However, to produce an object file you really want to write your assembly in the llvm format. As well as being simple this will likely be the most performant option since it will allow wasm-opt to optimize the whole program as one.

I see, I understand now. I thought that the dynamic linking was against the wasm itself, not the intermediate object.
Any other thoughts or source files I can check?

@sbc100
Collaborator

sbc100 commented Apr 21, 2025

I thought that the dynamic linking was against the wasm itself, not the intermediate object.

I'm afraid I don't quite understand the question. Can you elaborate?

@turran
Author

turran commented Apr 21, 2025

I'm afraid I don't quite understand the question. Can you elaborate?

I apologize, yes. You were referring to the "llvm asm" option as the easiest one, but given that it is not possible, I'm wondering what more complex ways to achieve this are, if any.

@sbc100
Collaborator

sbc100 commented Apr 21, 2025

The more complex way that it sounds like you are proposing would be to somehow try to dynamically link a wasm module that was not built by emscripten with an emscripten-built module. To do this I think you have two main choices:

  1. Build your main module with -sMAIN_MODULE=2 and then make your code look like an emscripten side module (basically just a normal wasm module with a .dylink metadata section).
  2. Build statically but then try to implement some kind of dynamic linking in userspace. This sounds like it would result in a lot of bespoke and fragile code, but it's certainly not impossible.

@turran
Author

turran commented Apr 21, 2025

I see, thanks for the information. I'll check https://github.com/WebAssembly/tool-conventions/blob/main/DynamicLinking.md as it seems it describes the current ABI to load wasm modules. Thanks!

@turran
Author

turran commented Apr 23, 2025

Answering myself after some findings

How to include the .wasm file as part of the linking phase?

See https://emscripten.org/docs/compiling/Dynamic-Linking.html#load-time-dynamic-linking and simply run
emcc -sMAIN_MODULE main.c libsomething.wasm
where libsomething.wasm is the already generated library. If you want to use emscripten to build it, use -sSIDE_MODULE

How to call the .wasm function from C? Should it be enough to declare an extern function in C?

As long as it is defined in the side module (library) it should be found

I'm having issues with the relocatable feature. Running wat2wasm on a file with the dynamic linking annotations explained in https://github.com/WebAssembly/tool-conventions/blob/main/DynamicLinking.md and then calling wasm-objdump gives me the following information

libtest01.wasm:	file format wasm 0x1

Section Details:

Type[2]:
 - type[0] (i32) -> nil
 - type[1] (i32, i32, i32, i32, i32) -> nil
Import[2]:
 - func[0] sig=0 <console.log> <- console.log
 - memory[0] pages: initial=1 <- memory.buffer
Function[1]:
 - func[1] sig=1 <orc_add2_rshift_sub_s16_11_op>
Export[1]:
 - func[1] <orc_add2_rshift_sub_s16_11_op> -> "orc_add2_rshift_sub_s16_11_op"
Code[1]:
 - func[1] size=527 <orc_add2_rshift_sub_s16_11_op>

It seems wat2wasm is not honoring the annotation. Maybe a bug?
But doing a wat2wasm with -r gives me this

libtest01.wasm:	file format wasm 0x1

Section Details:

Type[2]:
 - type[0] (i32) -> nil
 - type[1] (i32, i32, i32, i32, i32) -> nil
Import[2]:
 - func[0] sig=0 <console.log> <- console.log
 - memory[0] pages: initial=1 <- memory.buffer
Function[1]:
 - func[1] sig=1 <orc_add2_rshift_sub_s16_11_op>
Export[1]:
 - func[1] <orc_add2_rshift_sub_s16_11_op> -> "orc_add2_rshift_sub_s16_11_op"
Code[1]:
 - func[1] size=527 <orc_add2_rshift_sub_s16_11_op>
Custom:
 - name: "linking"
  - symbol table [count=2]
   - 0: F <console.log> func=0 [ undefined binding=global vis=default ]
   - 1: F <orc_add2_rshift_sub_s16_11_op> func=1 [ exported no_strip binding=local vis=hidden ]

Compiling the main module with the side module gives me this output

emcc -sMAIN_MODULE test01.c libtest01.wasm -o test
error: undefined symbol: orc_add2_rshift_sub_s16_11_op (referenced by root reference (e.g. compiled C/C++ code))
warning: To disable errors for undefined symbols use `-sERROR_ON_UNDEFINED_SYMBOLS=0`
warning: _orc_add2_rshift_sub_s16_11_op may need to be added to EXPORTED_FUNCTIONS if it arrives from a system library
Error: Aborting compilation due to previous errors
emcc: error: '/home/jl/w/github/gst.wasm/build/gst.wasm_web_wasm32/emsdk/node/18.20.3_64bit/bin/node /home/jl/w/github/gst.wasm/build/gst.wasm_web_wasm32/emsdk/upstream/emscripten/src/compiler.mjs /tmp/tmpjnsw022t.json' failed (returned 1)

To check compatibility with the wat2wasm output, I built a trivial version of the symbol with emcc -sSIDE_MODULE test02.c, which gives me

wasm-objdump -x a.out.wasm 

a.out.wasm:	file format wasm 0x1

Section Details:

Custom:
 - name: "dylink.0"
 - mem_size     : 0
 - mem_p2align  : 0
 - table_size   : 0
 - table_p2align: 0
Type[2]:
 - type[0] () -> nil
 - type[1] (i32, i32, i32, i32, i32) -> nil
Import[4]:
 - global[0] i32 mutable=1 <- env.__stack_pointer
 - global[1] i32 mutable=0 <- env.__memory_base
 - global[2] i32 mutable=0 <- env.__table_base
 - memory[0] pages: initial=0 <- env.memory
Function[3]:
 - func[0] sig=0 <__wasm_call_ctors>
 - func[1] sig=0 <__wasm_apply_data_relocs>
 - func[2] sig=1 <orc_add2_rshift_sub_s16_11_op>
Export[3]:
 - func[0] <__wasm_call_ctors> -> "__wasm_call_ctors"
 - func[1] <__wasm_apply_data_relocs> -> "__wasm_apply_data_relocs"
 - func[2] <orc_add2_rshift_sub_s16_11_op> -> "orc_add2_rshift_sub_s16_11_op"
Code[3]:
 - func[0] size=2 <__wasm_call_ctors>
 - func[1] size=2 <__wasm_apply_data_relocs>
 - func[2] size=69 <orc_add2_rshift_sub_s16_11_op>

This is different from what wat2wasm produces. I'm a bit confused; maybe it is some version compatibility problem? Linking against the new .wasm side module (the one generated with emcc itself) does work.
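
The dumps above differ in their custom sections: the wat2wasm -r output carries a "linking" section, while the emcc side module carries "dylink.0". As a minimal sketch, the check below walks the top-level section framing of a wasm binary (one id byte, an unsigned LEB128 size, then the payload) and reports whether a custom section with a given name is present. The function names are hypothetical and nothing inside the sections is parsed.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Decode an unsigned LEB128 value of at most 5 bytes.
 * Returns the number of bytes consumed, or 0 on truncated input. */
size_t read_uleb32(const uint8_t *p, size_t avail, uint32_t *out)
{
    uint32_t result = 0;
    unsigned shift = 0;
    for (size_t i = 0; i < avail && i < 5; i++) {
        result |= (uint32_t)(p[i] & 0x7f) << shift;
        if ((p[i] & 0x80) == 0) {
            *out = result;
            return i + 1;
        }
        shift += 7;
    }
    return 0;
}

/* Returns 1 if buf holds a wasm module with a custom section called name,
 * 0 if it has no such section, and -1 if the input is malformed. */
int wasm_has_custom_section(const uint8_t *buf, size_t len, const char *name)
{
    static const uint8_t preamble[8] = {0x00, 'a', 's', 'm', 0x01, 0x00, 0x00, 0x00};
    size_t name_len = strlen(name);
    size_t pos = sizeof preamble;

    if (len < sizeof preamble || memcmp(buf, preamble, sizeof preamble) != 0)
        return -1;

    while (pos < len) {
        uint8_t id = buf[pos++];
        uint32_t size, nlen;
        size_t used = read_uleb32(buf + pos, len - pos, &size);
        if (used == 0 || size > len - pos - used)
            return -1;
        pos += used;
        if (id == 0) { /* custom section: payload begins with its name */
            size_t n = read_uleb32(buf + pos, size, &nlen);
            if (n == 0 || nlen > size - n)
                return -1;
            if (nlen == name_len && memcmp(buf + pos + n, name, name_len) == 0)
                return 1;
        }
        pos += size; /* skip the rest of the section */
    }
    return 0;
}
```

Calling it with "linking" and then with "dylink.0" on a file's contents would distinguish a relocatable object from a side module, under the assumption that those section names are the only signal needed.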

@sbc100
Collaborator

sbc100 commented Apr 23, 2025

wat2wasm -r is designed to be able to produce object files that can then be fed into the static linker. It does not produce emscripten dynamic libraries (side modules). (It is also very limited in what it can do and not well maintained/tested).

I don't know of any way to build an emscripten dynamic library other than using emscripten itself (or perhaps using wasm-ld directly).

Even if you did find a way to make a dynamic library from your wat file, remember that dynamic linking comes at a cost, especially with wasm/emscripten. There is a code size cost and a runtime cost when compared to static linking. Unless you really, really need the code to be loaded dynamically, I would not recommend this approach.

@turran
Author

turran commented Apr 24, 2025

wat2wasm -r is designed to be able to produce object files that can then be fed into the static linker. It does not produce emscripten dynamic libraries (side modules). (It is also very limited in what it can do and not well maintained/tested).

And is it possible to feed emcc with an object file (.wasm file) that is generated from wat2wasm -r?
On my tests, I get

error: undefined symbol: orc_add2_rshift_sub_s16_11_op (referenced by root reference (e.g. compiled C/C++ code))
warning: To disable errors for undefined symbols use `-sERROR_ON_UNDEFINED_SYMBOLS=0`
warning: _orc_add2_rshift_sub_s16_11_op may need to be added to EXPORTED_FUNCTIONS if it arrives from a system library
Error: Aborting compilation due to previous errors

which seems to be exported from the custom "linking" section

Seems that my situation is similar to WebAssembly/wabt#1658

@sbc100
Collaborator

sbc100 commented Apr 24, 2025

wat2wasm -r is designed to be able to produce object files that can then be fed into the static linker. It does not produce emscripten dynamic libraries (side modules). (It is also very limited in what it can do and not well maintained/tested).

And is it possible to feed emcc with an object file (.wasm file) that is generated from wat2wasm -r? On my tests, I get

error: undefined symbol: orc_add2_rshift_sub_s16_11_op (referenced by root reference (e.g. compiled C/C++ code))
warning: To disable errors for undefined symbols use `-sERROR_ON_UNDEFINED_SYMBOLS=0`
warning: _orc_add2_rshift_sub_s16_11_op may need to be added to EXPORTED_FUNCTIONS if it arrives from a system library
Error: Aborting compilation due to previous errors

which seems to be exported from the custom "linking" section

Seems that my situation is similar to WebAssembly/wabt#1658

It should work, but it would not be surprising to me if wat2wasm -r has bit-rotted. It's not well maintained or tested. I'm tempted to simply remove the -r feature, unless somebody (perhaps you?) wants to volunteer to maintain it.

@turran
Author

turran commented Apr 24, 2025

It should work, but it would not be surprising to me if wat2wasm -r has bit-rotted. It's not well maintained or tested. I'm tempted to simply remove the -r feature, unless somebody (perhaps you?) wants to volunteer to maintain it.

To be honest, I don't know where to start. Is emscripten mishandling the "linking" custom section, or is wat2wasm not following https://github.com/WebAssembly/tool-conventions/blob/main/Linking.md ...

@sbc100
Collaborator

sbc100 commented Apr 24, 2025

Can you share the object produced by wat2wasm -r (the one that should be defining orc_add2_rshift_sub_s16_11_op)? I can probably tell you what is wrong with it.

As we go down the rabbit hole though I would once again advise you to write your code in llvm assembly format to avoid this issue.

@sbc100
Collaborator

sbc100 commented Apr 24, 2025

In addition to being easily convertible to a valid object file, the LLVM assembly format also has some added advantages over wat, such as support for the C pre-processor and for symbolic names for your static data.

@turran
Author

turran commented Apr 25, 2025

Can you share the object produced by wat2wasm -r (the one that should be defining orc_add2_rshift_sub_s16_11_op)? I can probably tell you what is wrong with it.

I think I've found the issue but can't explain if it is a correct behavior or not. Basically,

For a code like

  (func (export "orc_add2_rshift_sub_s16_11_op")  (param $d1 i32) (param $s1 i32) (param $s2 i32) (param $s3 i32) (param $n i32)

wat2wasm -r generates the following "linking" table

- 1: F <orc_add2_rshift_sub_s16_11_op> func=1 [ exported no_strip binding=local vis=hidden ]

But for the following code (without export)

 (func $orc_add2_rshift_sub_s16_11_op (param $d1 i32) (param $s1 i32) (param $s2 i32) (param $s3 i32) (param $n i32)

The generated object file has

- 1: F <orc_add2_rshift_sub_s16_11_op> func=1 [ binding=global vis=default ]

The only change is the export statement, yet the binding and vis differ. With the second form, it links correctly with Emscripten.
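
On the C side, calling a statically linked wasm function then needs nothing more than an ordinary declaration. On wasm32 both pointers and int lower to i32, which matches the (i32, i32, i32, i32, i32) -> nil type in the dumps. The sketch below uses a hypothetical function name, parameter types and build macro (kernel_s16_op, HAVE_WASM_KERNEL), and its fallback arithmetic is a placeholder rather than Orc's actual kernel.

```c
#include <stdint.h>

/* Declaration seen by C callers. When the object assembled from the
 * handwritten wasm is linked in, it would supply the definition. */
void kernel_s16_op(int16_t *d1, const int16_t *s1, const int16_t *s2,
                   const int16_t *s3, int n);

#ifndef HAVE_WASM_KERNEL /* hypothetical build flag */
/* Portable C fallback so the program still links without the wasm object.
 * Placeholder arithmetic: d1[i] = s1[i] + s2[i] - s3[i]. */
void kernel_s16_op(int16_t *d1, const int16_t *s1, const int16_t *s2,
                   const int16_t *s3, int n)
{
    for (int i = 0; i < n; i++)
        d1[i] = (int16_t)(s1[i] + s2[i] - s3[i]);
}
#endif
```

Callers pass normal C pointers; no explicit memory import or offset arithmetic is needed because the statically linked code shares the program's linear memory.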

As we go down the rabbit hole though I would once again advise you to write your code in llvm assembly format to avoid this issue.
In addition to being easily convertible to a valid object file, the LLVM assembly format also has some added advantages over wat, such as support for the C pre-processor and for symbolic names for your static data.

Yes, and I appreciate your patience and help with this topic. As my particular case requires building a "compiler" myself, I'd like to better understand the alternatives and how things work.
