Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Crash when loading the example plugin on Windows #125

Closed
AregevDev opened this issue Dec 24, 2023 · 29 comments · Fixed by #149
Closed

Crash when loading the example plugin on Windows #125

AregevDev opened this issue Dec 24, 2023 · 29 comments · Fixed by #149

Comments

@AregevDev
Copy link

AregevDev commented Dec 24, 2023

Hello,
I am using the crate setup example from the repo and Neovim crashes when loading the plugin DLL file by requiring the module. I am copying the DLL directly to NeoVim's bin directory.

The nvim process crashes with virtually no way to debug it.
I tried the latest git version of nvim-oxi.
I am on Windows, I added the /force:UNRESOLVED flags to my cargo config:

[target.x86_64-pc-windows-msvc]
rustflags = [
  "-C", "link-arg=/FORCE:UNRESOLVED",
]

[target.aarch64-pc-windows-msvc]
rustflags = [
  "-C", "link-arg=/FORCE:UNRESOLVED",
]
@AregevDev
Copy link
Author

I tested my plugin in WSL and it works as intended.
I noticed the size of the .so in a release build is much bigger than the Windows .dll equivalent, maybe the problem is in the link arguments.

@Kethku
Copy link

Kethku commented Feb 17, 2024

I'm encountering this as well. Crashes immediately upon requiring the module. I'm running on NVIM v0.10.0-dev-2135+g98a4ed0a1 and current git main for the nvim-oxi dependency

@A-Lamia
Copy link

A-Lamia commented Feb 17, 2024

I'm also having the same issue with crashing instantly with 0 errors or warnings on NVIM v0.10.0-dev-2384+g848fc8ede, for fun i was trying to port my config to use nvim-oxi and just gave up after 20 minutes or so, this could entirely be an issue I'm causing as I'm only somewhat familiar with rust but it seems based on this thread there might be an issue ?

@noib3
Copy link
Owner

noib3 commented Feb 17, 2024

I'll try debugging this today. Does it also crash on 0.9?

@Kethku
Copy link

Kethku commented Feb 17, 2024

Yes. It does still crash on 0.9

@noib3
Copy link
Owner

noib3 commented Feb 17, 2024

I've tested a few examples on 0.9, and they all work as expected.

Some functions are currently segfaulting on nightly. I'm working on a PR that'll fix those but I don't think they're related to this. AFAICT this is only an issue on Windows.

Windows has been problematic for a long time. There already are 2 PRs open (#85 and #122) that are trying to make CI pass, but neither has solved the problem yet. Personally, I don't have access to a Windows machine and I'm not familiar with how linking is supposed to work there.

If anyone can figure out the correct enchantments I'd be happy to merge a PR. Until then, this issue will probably remain open.

@noib3 noib3 changed the title NeoVim crashes when loading the example plguin Crash when loading the example plugin on Windows Feb 17, 2024
@AregevDev
Copy link
Author

AregevDev commented Mar 6, 2024

I was able to make it work by building NeoVim from source and linking to libnvim with all of it's dependencies.

My build.rs

fn main() {
    println!("cargo:rustc-link-search=native=PATH_TO_NEOVIM_REPO\\neovim\\build\\lib");
    println!("cargo:rustc-link-search=native=PATH_TO_NEOVIM_REPO\\neovim\\build\\.deps\\usr\\lib");

    println!("cargo:rustc-link-lib=Iphlpapi");
    println!("cargo:rustc-link-lib=User32");
    println!("cargo:rustc-link-lib=DbgHelp");
    println!("cargo:rustc-link-lib=Ole32");
    println!("cargo:rustc-link-lib=Netapi32");
    println!("cargo:rustc-link-lib=Shell32");


    println!("cargo:rustc-link-lib=libcharset");
    println!("cargo:rustc-link-lib=libiconv");
    println!("cargo:rustc-link-lib=libintl");
    println!("cargo:rustc-link-lib=libuv");
    println!("cargo:rustc-link-lib=lua51");
    println!("cargo:rustc-link-lib=luajit");
    println!("cargo:rustc-link-lib=luv");
    println!("cargo:rustc-link-lib=msgpack-c");
    println!("cargo:rustc-link-lib=msgpack-c_import");
    println!("cargo:rustc-link-lib=termkey");
    println!("cargo:rustc-link-lib=tree-sitter");
    println!("cargo:rustc-link-lib=unibilium");
    println!("cargo:rustc-link-lib=vterm");

    println!("cargo:rustc-link-lib=libnvim");
}

I believe the reason nvim-oxi can't interface with the precompiled NeoVim DLLs is because they don't have __declspec(dllexport) on them as they weren't meant to be seen. You have to explicitly add this attribute when dealing with dynamic linking on Windows.

We can have a build.rs file that pulls NeoVim depending on the selected version feature and build it from source (this will require CMake installed) Or we can publish raw Windows binaries for each supported NeoVim version (for the stables ones, 0.8 and 0.9 at least) We can let the user decide as well via a feature.

Keep in mind that this solution may introduce version mismatch bugs (say I compile a 0.9 nvim runtime with my plugin and load it into NeoVim 0.8) We need a way to circumvent that / Warn the user somehow.

Let me know how you want to solve this issue and I'll gladly work on it and submit a PR.

image

@theofabilous
Copy link
Contributor

I believe the reason nvim-oxi can't interface with the precompiled NeoVim DLLs is because they don't have __declspec(dllexport) on them as they weren't meant to be seen. You have to explicitly add this attribute when dealing with dynamic linking on Windows.

I don't think this is the case. AFAIK every non-static function is exported with __declspec(dllexport) via generated header files. I'm not sure if this has always been the case though. You can also check that the windows neovim .exe exports a lot of symbols on windows with MSVC's DUMPBIN cli tool via the /EXPORTS option


I was able to make it work by building NeoVim from source and linking to libnvim with all of it's dependencies.
(...)
We can have a build.rs file that pulls NeoVim depending on the selected version feature and build it from source (this will require CMake installed) Or we can publish raw Windows binaries for each supported NeoVim version (for the stables ones, 0.8 and 0.9 at least) We can let the user decide as well via a feature.

I'm almost entirely sure this wouldn't work. The statically-linked libnvim library will be referencing its own symbols and state which aren't the same as the ones in the actual neovim instance. Trying to use any of the C api functions that indirectly reference internal state (i.e. probably every single api function) probably won't work: e.g editing a buffer with a Buffer handle, like with nvim_buf_set_lines, would fail because the handle indirectly references data in neovim that doesn't correspond to that of libnvim

Assuming the picture shown in your comment (with 42 printed on the command line) is the result of the example in the README, the only reason why it worked is that the example only uses the lua apis, which are loaded dynamically and called by the lua library bundled with neovim, and passing its own lua_State to the functions.

This means, at least, that the crate's symbols are exported correctly (because neovim is able to find the entrypoint function, hence why 42 was successfully printed in your screenshot)

@AregevDev
Copy link
Author

AregevDev commented Mar 7, 2024

You are right, any call to nvim API segfaults with an access violation error. I tried debugging the nvim process in VS.
devenv_WTHsOHMTmd

It seems like nvim.exe links libnvim statically as well. So I can't really access the nvim symbols, can I?
dumpbin.exe indeed shows the nvim_* symbols as exported.

@AregevDev
Copy link
Author

AregevDev commented Mar 7, 2024

I believe the only solution is to load everything dynamically using something like libloading. This is really painful to do since nvim-oxi writes it's bindings by hand.

I was able to make a simple proof of concept using libloading and bindgen to load the lua51 DLL from the neovim directory and call some functions from it. We can write a header that defines exactly whatever symbols we use, generate bindings for them and replace the current ffi approach.

I believe this is a better approach than making the compiler assume that the symbols are there, even on platforms that it works.
Let me know what you think about it.

@AregevDev
Copy link
Author

This requires major refactoring. @noib3, please let me know what you think about it. I believe I can come up with something like that.

@AregevDev
Copy link
Author

AregevDev commented Mar 7, 2024

Some of the header definitions, like the NeoVim APIs, are auto-generated. Should I include them in the project and run bindgen on them or should I just resort to writing the function definitions by hand?

@noib3
Copy link
Owner

noib3 commented Mar 7, 2024

I still don't understand why the current approach doesn't work on Windows.

AFAIK every non-static function is exported with __declspec(dllexport) via generated header files.

I believe so. If I build Neovim from source and open one of the auto-generated header files in /build/include/api I can see that every function is prefixed by __declspec(dllexport), for example:

// IWYU pragma: private, include "nvim/api/autocmd.h"
#define DEFINE_FUNC_ATTRIBUTES
#include "nvim/func_attr.h"
#undef DEFINE_FUNC_ATTRIBUTES
#ifndef DLLEXPORT
#  ifdef MSWIN
#    define DLLEXPORT __declspec(dllexport)
#  else
#    define DLLEXPORT
#  endif
#endif
DLLEXPORT Array nvim_get_autocmds(Dict(get_autocmds) *opts, Error *err) FUNC_API_SINCE(9);
DLLEXPORT Integer nvim_create_autocmd(uint64_t channel_id, Object event, Dict(create_autocmd) *opts, Error *err) FUNC_API_SINCE(9);
DLLEXPORT void nvim_del_autocmd(Integer id, Error *err) FUNC_API_SINCE(9);
DLLEXPORT void nvim_clear_autocmds(Dict(clear_autocmds) *opts, Error *err) FUNC_API_SINCE(9);
DLLEXPORT Integer nvim_create_augroup(uint64_t channel_id, String name, Dict(create_augroup) *opts, Error *err) FUNC_API_SINCE(9);
DLLEXPORT void nvim_del_augroup_by_id(Integer id, Error *err) FUNC_API_SINCE(9);
DLLEXPORT void nvim_del_augroup_by_name(String name, Error *err) FUNC_API_SINCE(9);
DLLEXPORT void nvim_exec_autocmds(Object event, Dict(exec_autocmds) *opts, Error *err) FUNC_API_SINCE(9);
#include "nvim/func_attr.h"

I believe the only solution is to load everything dynamically using something like libloading

I'm not merging this.

@AregevDev
Copy link
Author

AregevDev commented Mar 7, 2024

On Windows, every symbol needs to be known at link time I believe. You don't link straight to a DLL, you need a .lib file with the symbol definitions. Maybe I'll try creating that for lua, nvim and libuv.

I'm not merging this.

I see, unfortunate.

@noib3
Copy link
Owner

noib3 commented Mar 7, 2024

Isn't that what /FORCE:UNRESOLVED is for?

@AregevDev
Copy link
Author

I don't think so? /force:unresolved suppresses linker errors. It doesn't necessarily tell where the symbols are IIUC.

@AregevDev
Copy link
Author

AregevDev commented Mar 7, 2024

I made it! I built NeoVim from source and linked against the .lib definitions. The API example works as expected.
image

@theofabilous
Copy link
Contributor

This requires packaging the .lib with nvim-oxi right? That would also require rebuilding neovim and regenerating it for every new release/build (every day for nightly), for both x86 and aarch64. I'm not sure this is the best solution

@AregevDev
Copy link
Author

AregevDev commented Mar 7, 2024

Not necessarily, the .lib files only contain function definitions to interface with the symbols inside nvim.exe at link time.
After thorough testing, I can confirm that both nvim.lib and lua51.lib work interchangeably in NeoVim nightly and 0.9, everything works as long as you are compiling with the correct version feature.

If storing the .lib files is an issue, I can write a script that generates a nvim.lib using dumpbin.exe. However, the user will need NeoVim installed to build the crate.

I can use https://github.com/SecSamDev/dumpbin inside build.rs

@theofabilous
Copy link
Contributor

theofabilous commented Mar 8, 2024

Not sure if I understand correctly. Do you mean that the same .lib file works with both 0.9 and nightly? I don't think that should normally work since .lib import libraries (as I understand them) map symbol names to file offsets (addresses) and addresses can change when binaries change. I.e. they need to match up exactly with the neovim .exe so I don't think bundling those files is a safe bet.

If storing the .lib files is an issue, I can write a script that generates a nvim.lib using dumpbin.exe. However, the user will need NeoVim installed to build the crate.

Can it be generated from just an .exe or would users need to build from source?

On another note, I think I've found an alternative solution by tweaking rust's linker attributes. CI seems to be passing on windows, I'll look into submitting a PR soon

@AregevDev
Copy link
Author

AregevDev commented Mar 8, 2024

Yeah, I generate those from nvim.exe. The user should have it in PATH or supply an installation directory.

dumpbin.exe is also required. Rust on MSVC requires you to have VS installed. However, I couldn't find a clean way to detect the dumpbin executable on a system. I tried using the registry but it didn't quite work.

@AregevDev
Copy link
Author

On another note, I think I've found an alternative solution by tweaking rust's linker attributes. CI seems to be passing on windows, I'll look into submitting a PR soon

Oh? What flags worked for you?

@AregevDev
Copy link
Author

On another note, I think I've found an alternative solution by tweaking rust's linker attributes. CI seems to be passing on windows, I'll look into submitting a PR soon

@theofabilous Credits where credits due.

I used Rust's built-in #[link] attribute above each extern C. When setting the link-type to raw-dylib, rustc automatically generates an import library. This solved the issue without providing our import libraries. Most importantly, it now relies on the NeoVim executable, making the fix compatible with all versions.

I added a mechanism to locate the NeoVim installation directory, it will look for NeoVim in the PATH and then in an environment variable NVIM_DIR.

These changes are reflected in my PR.

@theofabilous
Copy link
Contributor

Hey, sorry for the delayed response. Just took a look at your PR and yes, my solution was in fact to use #[link(kind = "raw_dylib")]. Thanks for the shoutout haha

However there are some caveats, I think that your PR as it stands wouldn't necessarily correctly link the extern nvim symbols for all windows machines, and raw_dylib is only valid on windows (which is why all of CI is failing, it's a build time error) so requires a cfg_attr . I had some other related tweaks on my local branch, but since the core solution is similar to yours, are you good with me making a PR rebased onto your branch?

@AregevDev
Copy link
Author

I would prefer sticking with my PR, but ok go ahead.

I found an issue with using #[link] attribute. For some reason, NeoVim can't see the plugin entry point. I tried comparing the two plugin DLLs, they have different RVAs for their entry point, which is concerning.

I went down the rabbit hole of looking at the implementation for raw-dylib. It seems like it saves the lib files in a temporary directory only if -C save-temps is set. No matter where I looked, I couldn't find the generated import libraries. I want to see if they are any different.

Can you reproduce it on your machine @theofabilous?

@theofabilous
Copy link
Contributor

If the plugin entry point can't be found by neovim, I don't think it's related to #[link]/raw-dylib because those pertain to the rust library importing symbols from other places, whereas neovim finding the entry function is related to how the rust library exports its symbols. I don't see how using that attribute somewhere would affect neovim's overall ability to find the entry point.

I tried comparing the two plugin DLLs, they have different RVAs for their entry point, which is concerning.

I don't know how DLL's are organized, but this would make sense since using raw-dylib basically inlines an .lib inside the DLL which could offset the next section(s) (if any). If the binaries are different, then it makes sense that the RVAs are different, because some other data before/after it presumably changed too. If its only the RVAs that changed (and nothing else) then yes I agree that is concerning.

If you're looking to investigate why neovim can't find the entry function, I would suggest looking into the output of DUMPBIN with the /EXPORTS option on the output files in target/{debug,release}/.... But no I did not encounter this issue on my windows machine.

It seems like it saves the lib files in a temporary directory only if -C save-temps is set. No matter where I looked, I couldn't find the generated import libraries. I want to see if they are any different.

You can just look at the actual generated DLL. As I mentioned above, raw-dylib essentially sticks the .lib inside the DLL. But again, I'm not sure how the import library is pertinent to issues related to exporting the entry point symbol to neovim.

@theofabilous
Copy link
Contributor

I would prefer sticking with my PR, but ok go ahead.

Ok, thats fine. What would you like me to do? Most of CI (obv not neovim-nightly, see #134) is passing on my branch with the exception of libuv stuff.

@noib3
Copy link
Owner

noib3 commented Mar 9, 2024

@theofabilous could you open a PR? Imo there's no point in blocking this over whose PR gets merged.

@theofabilous
Copy link
Contributor

@noib3 just opened a draft. Note that last I checked, the clippy check fails due to some new lint that was introduced in the nightly toolchain.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants