./lodestar: line 7: 2710 Illegal instruction (core dumped) #7074

Closed
q9f opened this issue Sep 9, 2024 · 27 comments · Fixed by #7164
Labels
meta-bug Issues that identify a bug and require a fix.

Comments

@q9f
Contributor

q9f commented Sep 9, 2024

Describe the bug

My lodestar v1.21.0/ae1f9d5 fails to run after an upgrade.

Sep-09 11:58:43.501[]                 info: Lodestar network=mainnet, version=v1.21.0/ae1f9d5, commit=ae1f9d5d8896cf8e5e1dc42cfeea5fbb4b69e99a
Sep-09 11:58:45.801[]                 info: Connected to LevelDB database path=/home/user/.local/share/lodestar/mainnet/chain-db
Sep-09 11:59:21.984[]                 info: Initializing beacon from a valid db state slot=9920736, epoch=310023, stateRoot=0xace5ae1e69658be542e301a6ef019f0da248da32cce3cfe7406340459e0a7401, isWithinWeakSubjectivityPeriod=true
Sep-09 11:59:24.654[monitoring]       info: Started monitoring service remote=beaconcha.in, machine=ireco, interval=60000
Sep-09 11:59:25.085[chain]            info: Historical state worker started
Sep-09 11:59:25.113[eth1]             info: Eth1 provider urls=http://192.168.1.200:8551,http://192.168.1.201:8551
Sep-09 11:59:25.124[execution]        info: Execution client urls=http://192.168.1.200:8551,http://192.168.1.201:8551
./lodestar: line 7:  2710 Illegal instruction     (core dumped) node --trace-deprecation --max-old-space-size=8192 ./packages/cli/bin/lodestar.js "$@"

Expected behavior

No illegal instruction.

Steps to reproduce

  ./lodestar beacon \
    --network "mainnet" \
    --suggestedFeeRecipient "0xmyAddress" \
    --execution.urls "http://192.168.1.201:8551" \
    --execution.urls "http://192.168.1.200:8551" \
    --jwt-secret "/path/to/jwtsecret.hex" \
    --rest.namespace "*" \
    --rest.address "192.168.1.211" \
    --rest.port "9596" \
    --discv5 \
    --port "9001" \
    --subscribeAllSubnets \
    --monitoring.endpoint 'https://beaconcha.in/api/v1/client/metrics?apikey=FooBarBaz'

Additional context

No response

Operating system

Linux

Lodestar version or commit hash

ae1f9d5

@q9f q9f added the meta-bug Issues that identify a bug and require a fix. label Sep 9, 2024
@q9f
Contributor Author

q9f commented Sep 9, 2024

The same happens on the validator.

Sep-09 12:06:35.103[]                 info: Lodestar network=mainnet, version=v1.21.0/ae1f9d5, commit=ae1f9d5d8896cf8e5e1dc42cfeea5fbb4b69e99a
Sep-09 12:06:35.106[]                 info: Connecting to LevelDB database path=/home/user/.local/share/lodestar/mainnet/validator-db
./lodestar: line 7:  1471 Illegal instruction     (core dumped) node --trace-deprecation --max-old-space-size=8192 ./packages/cli/bin/lodestar.js "$@"
~ user@draco
❯ node --version
v20.17.0

~ user@draco
❯ nvm list | tail -n1
lts/iron -> v20.17.0

@philknows
Member

Do you happen to have debug: logs with this too?

@wemeetagain
Member

@q9f what's the machine's cpu arch?

@matthewkeil
Member

Are you running in Docker or on bare metal? What version did you upgrade from? That would help narrow the search.

@q9f
Contributor Author

q9f commented Sep 10, 2024

> Do you happen to have debug: logs with this too?

No. I can turn them on, but it won't get any further than this.

> @q9f what's the machine's cpu arch?

x64 (Intel® Celeron® Processor N5105)

Linux franz 6.10.9-arch1-1 #1 SMP PREEMPT_DYNAMIC Mon, 09 Sep 2024 02:38:45 +0000 x86_64 GNU/Linux

> Are you running in Docker or on bare metal? What version did you upgrade from? That would help narrow the search.

Bare metal. I manage Node through nvm and build from source at the git tag. I upgraded from 1.20.1 to 1.21.0.
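
Since the CPU model came up, one way to see which instruction-set extensions a Linux machine advertises is to parse the `flags` line of `/proc/cpuinfo`. The snippet below is a minimal sketch of that check. The helper names (`parseCpuFlags`, `missingFlags`), the sample text, and the list of flags being checked are all illustrative assumptions; nothing in this thread confirms which extensions the prebuilt native modules actually require.

```javascript
// Extract the CPU feature flags from /proc/cpuinfo-style text (Linux, x86).
// Returns an empty Set when no "flags" line is present.
function parseCpuFlags(cpuinfoText) {
  const line = cpuinfoText.split("\n").find((l) => /^flags\s*:/.test(l));
  if (!line) return new Set();
  return new Set(line.slice(line.indexOf(":") + 1).trim().split(/\s+/));
}

// Return the entries of `wanted` that the CPU does not advertise.
function missingFlags(cpuinfoText, wanted) {
  const flags = parseCpuFlags(cpuinfoText);
  return wanted.filter((flag) => !flags.has(flag));
}

// Hypothetical sample input, not taken from the reporter's machine.
const sample =
  "processor\t: 0\nmodel name\t: Example CPU\nflags\t\t: fpu sse sse2 ssse3 sse4_1 sse4_2 aes\n";

// Assumed list of extensions to look for; adjust to whatever the binary needs.
console.log(missingFlags(sample, ["adx", "bmi2", "avx2", "sse4_2"]));
```

On a real machine the text would come from reading `/proc/cpuinfo` instead of the sample string. A non-empty result only shows that the CPU lacks those extensions, not that a given binary uses them.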

@q9f
Contributor Author

q9f commented Sep 10, 2024

> Do you happen to have debug: logs with this too?

Debug logs seem to be useless here.

~/lodestar tags/v1.21.0* user@franz
❯ ./lodestar validator --network mainnet --logLevel trace
Sep-10 13:20:41.392[]                 info: Lodestar network=mainnet, version=v1.21.0/ae1f9d5, commit=ae1f9d5d8896cf8e5e1dc42cfeea5fbb4b69e99a
Sep-10 13:20:41.395[]                 info: Connecting to LevelDB database path=/home/user/.local/share/lodestar/mainnet/validator-db
./lodestar: line 7:  3540 Illegal instruction     (core dumped) node --trace-deprecation --max-old-space-size=8192 ./packages/cli/bin/lodestar.js "$@"
~/lodestar tags/v1.21.0* user@merak
❯ ./lodestar beacon --network mainnet --logLevel trace
Sep-10 13:22:19.806[]                 info: Lodestar network=mainnet, version=v1.21.0/ae1f9d5, commit=ae1f9d5d8896cf8e5e1dc42cfeea5fbb4b69e99a
Sep-10 13:22:20.874[]                 info: Connected to LevelDB database path=/home/user/.local/share/lodestar/mainnet/chain-db
Sep-10 13:22:57.257[]                 info: Initializing beacon from a valid db state slot=9920768, epoch=310024, stateRoot=0xc479c2156d85d7c49b499a9e05bf06e3724af7f62817622066a29bf97c88ef4f, isWithinWeakSubjectivityPeriod=true
Sep-10 13:22:59.718[chain]            info: Historical state worker started
Sep-10 13:22:59.738[eth1]             info: Eth1 provider urls=http://localhost:8551
Sep-10 13:22:59.749[execution]        info: Execution client urls=http://localhost:8551
./lodestar: line 7:  3329 Illegal instruction     (core dumped) node --trace-deprecation --max-old-space-size=8192 ./packages/cli/bin/lodestar.js "$@"

Let me try to find the core dump.

@q9f
Contributor Author

q9f commented Sep 10, 2024

Running Lodestar through node directly, without the extra arguments, reports "illegal hardware instruction":

~/lodestar tags/v1.21.0* user@franz
❯ node ./packages/cli/bin/lodestar.js validator           
Sep-10 13:27:00.060[]                 info: Lodestar network=mainnet, version=v1.21.0/ae1f9d5, commit=ae1f9d5d8896cf8e5e1dc42cfeea5fbb4b69e99a
Sep-10 13:27:00.063[]                 info: Connecting to LevelDB database path=/home/user/.local/share/lodestar/mainnet/validator-db
zsh: illegal hardware instruction (core dumped)  node ./packages/cli/bin/lodestar.js validator

@q9f
Contributor Author

q9f commented Sep 10, 2024

Core dump:

           PID: 4566 (node)                                                                                                                                                                                                                                                                  
           UID: 1000 (user)                                                                                                                                                                                                                                                                  
           GID: 984 (users)                                                                                                                                                                                                                                                                  
        Signal: 4 (ILL)                                                                                                                                                                                                                                                                      
     Timestamp: Tue 2024-09-10 13:27:00 CEST (5min ago)                                                                                                                                                                                                                                      
  Command Line: node ./packages/cli/bin/lodestar.js validator                                                                                                                                                                                                                                
    Executable: /home/user/.nvm/versions/node/v20.17.0/bin/node                                                                                                                                                                                                                              
 Control Group: /user.slice/user-1000.slice/user@1000.service/tmux-spawn-21a9b3c8-0af0-4cf9-a7e9-7776c462912b.scope                                                                                                                                                                          
          Unit: user@1000.service                                                                                                                                                                                                                                                            
     User Unit: tmux-spawn-21a9b3c8-0af0-4cf9-a7e9-7776c462912b.scope                                                                                                                                                                                                                        
         Slice: user-1000.slice                                                                                                                                                                                                                                                              
     Owner UID: 1000 (user)                                                                                                                                                                                                                                                                  
       Boot ID: 9b894083e30f41ab9b50d433d4354f3f                                                                                                                                                                                                                                             
    Machine ID: 8358b82579f94b58aa3b6a47b5cd2806                                                                                                                                                                                                                                             
      Hostname: franz                                                                                                                                                                                                                                                                        
       Storage: /var/lib/systemd/coredump/core.node.1000.9b894083e30f41ab9b50d433d4354f3f.4566.1725967620000000.zst (present)                                                                                                                                                                
  Size on Disk: 22.3M                                                                                                                                                                                                                                                                        
       Message: Process 4566 (node) of user 1000 dumped core.                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                             
                Module /home/user/lodestar/node_modules/@node-rs/crc32-linux-x64-gnu/crc32.linux-x64-gnu.node without build-id.                                                                                                                                                              
                Module /home/user/lodestar/node_modules/@node-rs/crc32-linux-x64-gnu/crc32.linux-x64-gnu.node                                                                                                                                                                                
                Module /home/user/lodestar/node_modules/@napi-rs/snappy-linux-x64-gnu/snappy.linux-x64-gnu.node without build-id.                                                                                                                                                            
                Module /home/user/lodestar/node_modules/@napi-rs/snappy-linux-x64-gnu/snappy.linux-x64-gnu.node                                                                                                                                                                              
                Module /home/user/lodestar/node_modules/@chainsafe/blst-linux-x64-gnu/blst.linux-x64-gnu.node without build-id.                                                                                                                                                              
                Module /home/user/lodestar/node_modules/@chainsafe/blst-linux-x64-gnu/blst.linux-x64-gnu.node                                                                                                                                                                                
                Stack trace of thread 4566:                                                                                                                                                                                                                                                  
                #0  0x00007ae38cd569a1 n/a (/home/user/lodestar/node_modules/@chainsafe/blst-linux-x64-gnu/blst.linux-x64-gnu.node + 0xa29a1)
                ELF object binary architecture: AMD x86-64
                                                                                                                                                                                                                                                                                             
GNU gdb (GDB) 15.1                                                                                                                                                                                                                                                                           
Copyright (C) 2024 Free Software Foundation, Inc.                                                                                                                                                                                                                                            
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>                                                                                                                                                                                                                
This is free software: you are free to change and redistribute it.                                                                                                                                                                                                                           
There is NO WARRANTY, to the extent permitted by law.                                                                                                                                                                                                                                        
Type "show copying" and "show warranty" for details.                                                                                                                                                                                                                                         
This GDB was configured as "x86_64-pc-linux-gnu".   
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /home/user/.nvm/versions/node/v20.17.0/bin/node...
[New LWP 4566]
[New LWP 4570]
[New LWP 4567]
[New LWP 4575]
[New LWP 4576]
[New LWP 4568]
[New LWP 4569]
[New LWP 4573]
[New LWP 4571]
[New LWP 4574]
[New LWP 4572]

This GDB supports auto-downloading debuginfo from the following URLs:
  <https://debuginfod.archlinux.org>
Enable debuginfod for this session? (y or [n]) 
Debuginfod has been disabled.
To make this setting permanent, add 'set debuginfod enabled off' to .gdbinit.

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".
Core was generated by `node ./packages/cli/bin/lodestar.js validator'.
Program terminated with signal SIGILL, Illegal instruction.
#0  0x00007ae38cd569a1 in ?? () from /home/user/lodestar/node_modules/@chainsafe/blst-linux-x64-gnu/blst.linux-x64-gnu.node
[Current thread is 1 (Thread 0x7ae38dc88840 (LWP 4566))]

(gdb) bt
#0  0x00007ae38cd569a1 in ?? () from /home/user/lodestar/node_modules/@chainsafe/blst-linux-x64-gnu/blst.linux-x64-gnu.node
#1  0x00007ae38cd3d218 in ?? () from /home/user/lodestar/node_modules/@chainsafe/blst-linux-x64-gnu/blst.linux-x64-gnu.node
#2  0x00007ae38cd4b6e9 in ?? () from /home/user/lodestar/node_modules/@chainsafe/blst-linux-x64-gnu/blst.linux-x64-gnu.node
#3  0x00007ae38cd3de13 in ?? () from /home/user/lodestar/node_modules/@chainsafe/blst-linux-x64-gnu/blst.linux-x64-gnu.node
#4  0x00007ae38cd3d4f6 in ?? () from /home/user/lodestar/node_modules/@chainsafe/blst-linux-x64-gnu/blst.linux-x64-gnu.node
#5  0x00007ae38cd3d5cf in ?? () from /home/user/lodestar/node_modules/@chainsafe/blst-linux-x64-gnu/blst.linux-x64-gnu.node
#6  0x00007ae38cce5ace in ?? () from /home/user/lodestar/node_modules/@chainsafe/blst-linux-x64-gnu/blst.linux-x64-gnu.node
#7  0x0000000000c55a69 in v8impl::(anonymous namespace)::FunctionCallbackWrapper::Invoke(v8::FunctionCallbackInfo<v8::Value> const&) ()
#8  0x0000000000f5f8ff in v8::internal::FunctionCallbackArguments::Call(v8::internal::CallHandlerInfo) ()
#9  0x0000000000f6016d in v8::internal::MaybeHandle<v8::internal::Object> v8::internal::(anonymous namespace)::HandleApiCallHelper<false>(v8::internal::Isolate*, v8::internal::Handle<v8::internal::HeapObject>, v8::internal::Handle<v8::internal::FunctionTemplateInfo>, v8::internal::Handle<v8::internal::Object>, unsigned long*, int) ()
#10 0x0000000000f60635 in v8::internal::Builtin_HandleApiCall(int, unsigned long*, v8::internal::Isolate*) ()
#11 0x000000000196adf6 in Builtins_CEntry_Return1_ArgvOnStack_BuiltinExit ()
#12 0x00000000018dcd1c in Builtins_InterpreterEntryTrampoline ()
#13 0x00000b5c19ac04e9 in ?? ()
#14 0x000016129046c499 in ?? ()
#15 0x0000000500000000 in ?? ()
#16 0x00000b5c19ac05b9 in ?? ()
#17 0x00002a5f14d88879 in ?? ()
#18 0x00000b5c19ac04e9 in ?? ()
#19 0x00000b5c19ac04e9 in ?? ()
#20 0x000016721c6996d1 in ?? ()
#21 0x000019f99b197a31 in ?? ()
#22 0x00000b5c19ac05b9 in ?? ()
#23 0xffffffff00000000 in ?? ()
#24 0x0000002000000000 in ?? ()
#25 0x0000000000000000 in ?? ()
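
The frames inside `blst.linux-x64-gnu.node` have no symbols, but the coredump summary gives frame #0 as `+ 0xa29a1` within that module, so the other frames can be turned into module-relative offsets as well. The snippet below is a rough sketch of that arithmetic. It assumes the `+ 0x...` value is the distance from the module's load base, and the helper names (`loadBase`, `moduleOffset`) are made up for illustration.

```javascript
// Values copied from the dump above.
const faultAddress = 0x00007ae38cd569a1n; // frame #0 as reported by gdb
const faultOffset = 0xa29a1n; // "+ 0xa29a1" from the coredump summary

// Assumption: the "+ 0x..." value is measured from the module's load base,
// so subtracting it from the absolute address gives that base.
function loadBase(address, offset) {
  return address - offset;
}

// Convert an absolute frame address into a module-relative hex offset.
function moduleOffset(address, base) {
  return "0x" + (address - base).toString(16);
}

const base = loadBase(faultAddress, faultOffset);
console.log("load base:", "0x" + base.toString(16));
console.log("frame #1 offset:", moduleOffset(0x00007ae38cd3d218n, base));
console.log("frame #6 offset:", moduleOffset(0x00007ae38cce5acen, base));
```

If the module's first segment is mapped at virtual address 0, which is typical for shared objects but not verified here, these offsets could be handed to a disassembler such as `objdump -d --start-address=<offset>` run against the `.node` file to see which instruction faulted.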

(gdb) thread apply all backtrace full

Thread 11 (Thread 0x7ae38d6506c0 (LWP 4572)):
#0  0x00007ae38d782a19 in ?? () from /usr/lib/libc.so.6
No symbol table info available.
#1  0x00007ae38d78e858 in ?? () from /usr/lib/libc.so.6
No symbol table info available.
#2  0x00000000018c806a in uv__sem_wait (sem=<optimized out>) at ../deps/uv/src/unix/thread.c:639
        r = <optimized out>
#3  uv_sem_wait (sem=0x55f1400 <node::inspector::(anonymous namespace)::start_io_thread_semaphore>) at ../deps/uv/src/unix/thread.c:695
No locals.
#4  0x0000000000e05445 in node::inspector::(anonymous namespace)::StartIoThreadMain(void*) ()
No symbol table info available.
#5  0x00007ae38d78639d in ?? () from /usr/lib/libc.so.6
No symbol table info available.
#6  0x00007ae38d80b49c in ?? () from /usr/lib/libc.so.6
No symbol table info available.

Thread 10 (Thread 0x7ae3654006c0 (LWP 4574)):
#0  0x00007ae38d782a19 in ?? () from /usr/lib/libc.so.6
No symbol table info available.
#1  0x00007ae38d785479 in pthread_cond_wait () from /usr/lib/libc.so.6
No symbol table info available.
#2  0x00000000018c8239 in uv_cond_wait (cond=cond@entry=0x5601b00 <cond>, mutex=mutex@entry=0x5601ac0 <mutex>) at ../deps/uv/src/unix/thread.c:793
No locals.
#3  0x00000000018b496c in worker (arg=0x0) at ../deps/uv/src/threadpool.c:76
        w = <optimized out>
        q = <optimized out>
        is_slow_work = <optimized out>
#4  0x00007ae38d78639d in ?? () from /usr/lib/libc.so.6
No symbol table info available.
#5  0x00007ae38d80b49c in ?? () from /usr/lib/libc.so.6
No symbol table info available.

Thread 9 (Thread 0x7ae386a006c0 (LWP 4571)):
#0  0x00007ae38d782a19 in ?? () from /usr/lib/libc.so.6
No symbol table info available.
#1  0x00007ae38d785479 in pthread_cond_wait () from /usr/lib/libc.so.6
No symbol table info available.
#2  0x00000000018c8239 in uv_cond_wait (cond=<optimized out>, mutex=<optimized out>) at ../deps/uv/src/unix/thread.c:793
No locals.
#3  0x0000000000d3f72b in node::(anonymous namespace)::PlatformWorkerThread(void*) ()
No symbol table info available.
#4  0x00007ae38d78639d in ?? () from /usr/lib/libc.so.6
No symbol table info available.
#5  0x00007ae38d80b49c in ?? () from /usr/lib/libc.so.6
No symbol table info available.

Thread 8 (Thread 0x7ae365e006c0 (LWP 4573)):
#0  0x00007ae38d782a19 in ?? () from /usr/lib/libc.so.6
No symbol table info available.
#1  0x00007ae38d785479 in pthread_cond_wait () from /usr/lib/libc.so.6
No symbol table info available.
#2  0x00000000018c8239 in uv_cond_wait (cond=cond@entry=0x5601b00 <cond>, mutex=mutex@entry=0x5601ac0 <mutex>) at ../deps/uv/src/unix/thread.c:793
No locals.
#3  0x00000000018b496c in worker (arg=0x0) at ../deps/uv/src/threadpool.c:76
        w = <optimized out>
        q = <optimized out>
        is_slow_work = <optimized out>
#4  0x00007ae38d78639d in ?? () from /usr/lib/libc.so.6
No symbol table info available.
#5  0x00007ae38d80b49c in ?? () from /usr/lib/libc.so.6
No symbol table info available.

Thread 7 (Thread 0x7ae387e006c0 (LWP 4569)):
#0  0x00007ae38d782a19 in ?? () from /usr/lib/libc.so.6
No symbol table info available.
#1  0x00007ae38d785479 in pthread_cond_wait () from /usr/lib/libc.so.6
No symbol table info available.
#2  0x00000000018c8239 in uv_cond_wait (cond=<optimized out>, mutex=<optimized out>) at ../deps/uv/src/unix/thread.c:793
No locals.
#3  0x0000000000d3f72b in node::(anonymous namespace)::PlatformWorkerThread(void*) ()
No symbol table info available.
#4  0x00007ae38d78639d in ?? () from /usr/lib/libc.so.6
No symbol table info available.
#5  0x00007ae38d80b49c in ?? () from /usr/lib/libc.so.6
No symbol table info available.

Thread 6 (Thread 0x7ae38cc006c0 (LWP 4568)):
#0  0x00007ae38d782a19 in ?? () from /usr/lib/libc.so.6
No symbol table info available.
#1  0x00007ae38d785479 in pthread_cond_wait () from /usr/lib/libc.so.6
No symbol table info available.
#2  0x00000000018c8239 in uv_cond_wait (cond=<optimized out>, mutex=<optimized out>) at ../deps/uv/src/unix/thread.c:793
No locals.
#3  0x0000000000d3f72b in node::(anonymous namespace)::PlatformWorkerThread(void*) ()
No symbol table info available.
#4  0x00007ae38d78639d in ?? () from /usr/lib/libc.so.6
No symbol table info available.
#5  0x00007ae38d80b49c in ?? () from /usr/lib/libc.so.6
No symbol table info available.

Thread 5 (Thread 0x7ae3640006c0 (LWP 4576)):
#0  0x00007ae38d782a19 in ?? () from /usr/lib/libc.so.6
No symbol table info available.
#1  0x00007ae38d785479 in pthread_cond_wait () from /usr/lib/libc.so.6
No symbol table info available.
#2  0x00000000018c8239 in uv_cond_wait (cond=cond@entry=0x5601b00 <cond>, mutex=mutex@entry=0x5601ac0 <mutex>) at ../deps/uv/src/unix/thread.c:793
No locals.
#3  0x00000000018b496c in worker (arg=0x0) at ../deps/uv/src/threadpool.c:76
        w = <optimized out>
        q = <optimized out>
        is_slow_work = <optimized out>
#4  0x00007ae38d78639d in ?? () from /usr/lib/libc.so.6
No symbol table info available.
#5  0x00007ae38d80b49c in ?? () from /usr/lib/libc.so.6
No symbol table info available.

Thread 4 (Thread 0x7ae364a006c0 (LWP 4575)):
#0  0x00007ae38d782a19 in ?? () from /usr/lib/libc.so.6
No symbol table info available.
#1  0x00007ae38d785479 in pthread_cond_wait () from /usr/lib/libc.so.6
No symbol table info available.
#2  0x00000000018c8239 in uv_cond_wait (cond=cond@entry=0x5601b00 <cond>, mutex=mutex@entry=0x5601ac0 <mutex>) at ../deps/uv/src/unix/thread.c:793
No locals.
#3  0x00000000018b496c in worker (arg=0x0) at ../deps/uv/src/threadpool.c:76
        w = <optimized out>
        q = <optimized out>
        is_slow_work = <optimized out>
#4  0x00007ae38d78639d in ?? () from /usr/lib/libc.so.6
No symbol table info available.
#5  0x00007ae38d80b49c in ?? () from /usr/lib/libc.so.6
No symbol table info available.

Thread 3 (Thread 0x7ae38d6006c0 (LWP 4567)):
#0  0x00007ae38d80b750 in epoll_pwait () from /usr/lib/libc.so.6
No symbol table info available.
#1  0x00000000018cd0f6 in uv__io_poll (loop=loop@entry=0x71c87d8, timeout=2797) at ../deps/uv/src/unix/linux.c:1368
        lfields = 0x7ae388000b70
        events = {{events = 1, data = {ptr = 0xc, fd = 12, u32 = 12, u64 = 12}}, {events = 0, data = {ptr = 0x0, fd = 0, u32 = 0, u64 = 0}} <repeats 883 times>, {events = 0, data = {ptr = 0x8d74b5f900000000, fd = 0, u32 = 0, u64 = 10192971937697759232}}, {events = 31459, data = {ptr = 0x0, fd = 0, u32 = 0, u64 = 0}}, {events = 2374643129, data = {ptr = 0x8d5ffc9000007ae3, fd = 31459, u32 = 31459, u64 = 10187138577540872931}}, {events = 31459, data = {ptr = 0x7ae38d5ffbc0, fd = -1923089472, u32 = 2371877824, u64 = 135117748042688}}, {events = 2371877808, data = {ptr = 0x8d75454c00007ae3, fd = 31459, u32 = 31459, u64 = 10193129524342848227}}, {events = 31459, data = {ptr = 0xffffffffffffffff, fd = -1, u32 = 4294967295, u64 = 18446744073709551615}}, {events = 0, data = {ptr = 0x8d8a2db700000000, fd = 0, u32 = 0, u64 = 10199014570136174592}}, {events = 31459, data = {ptr = 0x7ae38d8a2da0, fd = -1920324192, u32 = 2374643104, u64 = 135117750807968}}, {events = 0, data = {ptr = 0x0, fd = 0, u32 = 0, u64 = 0}}, {events = 1, data = {ptr = 0x75, fd = 117, u32 = 117, u64 = 117}}, {events = 0, data = {ptr = 0x0, fd = 0, u32 = 0, u64 = 0}}, {events = 0, data = {ptr = 0x7ae38d5ffb74, fd = -1923089548, u32 = 2371877748, u64 = 135117748042612}}, {events = 2371877752, data = {ptr = 0x400007ae3, fd = 31459, u32 = 31459, u64 = 17179900643}}, {events = 0, data = {ptr = 0x4, fd = 4, u32 = 4, u64 = 4}}, {events = 10, data = {ptr = 0x100000000, fd = 0, u32 = 0, u64 = 4294967296}}, {events = 32, data = {ptr = 0x0, fd = 0, u32 = 0, u64 = 0}}, {events = 0, data = {ptr = 0x0, fd = 0, u32 = 0, u64 = 0}}, {events = 0, data = {ptr = 0x0, fd = 0, u32 = 0, u64 = 0}}, {events = 0, data = {ptr = 0x0, fd = 0, u32 = 0, u64 = 0}}, {events = 0, data = {ptr = 0x0, fd = 0, u32 = 0, u64 = 0}}, {events = 0, data = {ptr = 0x0, fd = 0, u32 = 0, u64 = 0}}, {events = 0, data = {ptr = 0x0, fd = 0, u32 = 0, u64 = 0}}, {events = 0, data = {ptr = 0x0, fd = 0, u32 = 0, u64 = 0}}, {events = 0, data = {ptr = 
0x3000000018, fd = 24, u32 = 24, u64 = 206158430232}}, {events = 2371878256, data = {ptr = 0x8d5ffcb000007ae3, fd = 31459, u32 = 31459, u64 = 10187138714979826403}}, {events = 31459, data = {ptr = 0x0, fd = 0, u32 = 0, u64 = 0}}, {events = 0, data = {ptr = 0x0, fd = 0, u32 = 0, u64 = 0}} <repeats 83 times>, {events = 0, data = {ptr = 0x3736353400000000, fd = 0, u32 = 0, u64 = 3978425818282983424}}, {events = 4202862592, data = {ptr = 0xc5b1693c, fd = -978228932, u32 = 3316738364, u64 = 3316738364}}, {events = 0, data = {ptr = 0x7ae38d5ffbc0, fd = -1923089472, u32 = 2371877824, u64 = 135117748042688}}, {events = 2363494400, data = {ptr = 0x8d5ffd7000007ae3, fd = 31459, u32 = 31459, u64 = 10187139539613547235}}, {events = 31459, data = {ptr = 0x9, fd = 9, u32 = 9, u64 = 9}}, {events = 2167767840, data = {ptr = 0x8d5ffc8000007ffd, fd = 32765, u32 = 32765, u64 = 10187138508821397501}}, {events = 31459, data = {ptr = 0x7ae38d796435, fd = -1921424331, u32 = 2373542965, u64 = 135117749707829}}, {events = 2371878256, data = {ptr = 0x700007ae3, fd = 31459, u32 = 31459, u64 = 30064802531}}, {events = 0, data = {ptr = 0x7ae38d5ffd93, fd = -1923089005, u32 = 2371878291, u64 = 135117748043155}}, {events = 2281704800, data = {ptr = 0x200007ae3, fd = 31459, u32 = 31459, u64 = 8589966051}}, {events = 0, data = {ptr = 0x80, fd = 128, u32 = 128, u64 = 128}}, {events = 2281701424, data = {ptr = 0xfffffe9800007ae3, fd = 31459, u32 = 31459, u64 = 18446742527521356515}}, {events = 4294967295, data = {ptr = 0x71c87d8, fd = 119310296, u32 = 119310296, u64 = 119310296}}, {events = 14, data = {ptr = 0x8d5ffc5000000000, fd = 0, u32 = 0, u64 = 10187138302662934528}}, {events = 31459, data = {ptr = 0x7ae38d796ef4 <malloc+164>, fd = -1921421580, u32 = 2373545716, u64 = 135117749710580}}, {events = 0, data = {ptr = 0xf00000000, fd = 0, u32 = 0, u64 = 64424509440}}, {events = 0, data = {ptr = 0x0, fd = 0, u32 = 0, u64 = 0}}, {events = 15, data = {ptr = 0x0, fd = 0, u32 = 0, u64 = 0}}, {events = 
0, data = {ptr = 0x71c8a08, fd = 119310856, u32 = 119310856, u64 = 119310856}}, {events = 2371878016, data = {ptr = 0x18b5e2900007ae3, fd = 31459, u32 = 31459, u64 = 111286145987410659}}, {events = 0, data = {ptr = 0xf, fd = 15, u32 = 15, u64 = 15}}, {events = 119310296, data = {ptr = 0x71c8a0800000000, fd = 0, u32 = 0, u64 = 512436224577765376}}, {events = 0, data = {ptr = 0x71c87d8, fd = 119310296, u32 = 119310296, u64 = 119310296}}, {events = 2371878096, data = {ptr = 0x18ba20900007ae3, fd = 31459, u32 = 31459, u64 = 111360775339145955}}, {events = 0, data = {ptr = 0x0, fd = 0, u32 = 0, u64 = 0}}, {events = 0, data = {ptr = 0x8d5ffcd000000000, fd = 0, u32 = 0, u64 = 10187138852418748416}}, {events = 31459, data = {ptr = 0x71c8a40, fd = 119310912, u32 = 119310912, u64 = 119310912}}, {events = 119310296, data = {ptr = 0x71c8a0800000000, fd = 0, u32 = 0, u64 = 512436224577765376}}, {events = 0, data = {ptr = 0x71c87d8, fd = 119310296, u32 = 119310296, u64 = 119310296}}, {events = 1, data = {ptr = 0x8d5ffce000000000, fd = 0, u32 = 0, u64 = 10187138921138225152}}, {events = 31459, data = {ptr = 0x7ae38d7d361d <clock_gettime+29>, fd = -1921173987, u32 = 2373793309, u64 = 135117749958173}}}
        prep = {{events = 0, data = {ptr = 0x0, fd = 0, u32 = 0, u64 = 0}} <repeats 256 times>}
        inv = {prep = 0x7ae38d5fc0e0, events = 0x7ae38d5fcce0, nfds = -1}
        pe = <optimized out>
        e = {events = 0, data = {ptr = 0x0, fd = 0, u32 = 0, u64 = 0}}
        ctl = 0x7ae388000c38
        iou = 0x7ae388000cc0
        real_timeout = 2797
        q = <optimized out>
        w = <optimized out>
        sigmask = <optimized out>
        sigset = {__val = {0 <repeats 16 times>}}
        base = <optimized out>
        have_iou_events = <optimized out>
        have_signals = <optimized out>
        nevents = <optimized out>
        epollfd = <optimized out>
        count = 48
        nfds = <optimized out>
        fd = <optimized out>
        op = <optimized out>
        i = <optimized out>
        user_timeout = <optimized out>
        reset_timeout = 0
        __PRETTY_FUNCTION__ = "uv__io_poll"
#2  0x00000000018b9637 in uv_run (loop=0x71c87d8, mode=UV_RUN_DEFAULT) at ../deps/uv/src/unix/core.c:447
        timeout = <optimized out>
        r = <optimized out>
        can_sleep = <optimized out>
#3  0x0000000000d446c0 in node::WorkerThreadsTaskRunner::DelayedTaskScheduler::Start()::{lambda(void*)#1}::_FUN(void*) ()
No symbol table info available.
#4  0x00007ae38d78639d in ?? () from /usr/lib/libc.so.6
No symbol table info available.
#5  0x00007ae38d80b49c in ?? () from /usr/lib/libc.so.6
No symbol table info available.

Thread 2 (Thread 0x7ae3874006c0 (LWP 4570)):
#0  0x00007ae38d782a19 in ?? () from /usr/lib/libc.so.6
No symbol table info available.
#1  0x00007ae38d785479 in pthread_cond_wait () from /usr/lib/libc.so.6
No symbol table info available.
#2  0x00000000018c8239 in uv_cond_wait (cond=<optimized out>, mutex=<optimized out>) at ../deps/uv/src/unix/thread.c:793
No locals.
#3  0x0000000000d3f72b in node::(anonymous namespace)::PlatformWorkerThread(void*) ()
No symbol table info available.
#4  0x00007ae38d78639d in ?? () from /usr/lib/libc.so.6
No symbol table info available.
#5  0x00007ae38d80b49c in ?? () from /usr/lib/libc.so.6
No symbol table info available.

Thread 1 (Thread 0x7ae38dc88840 (LWP 4566)):
#0  0x00007ae38cd569a1 in ?? () from /home/user/lodestar/node_modules/@chainsafe/blst-linux-x64-gnu/blst.linux-x64-gnu.node
No symbol table info available.
#1  0x00007ae38cd3d218 in ?? () from /home/user/lodestar/node_modules/@chainsafe/blst-linux-x64-gnu/blst.linux-x64-gnu.node
No symbol table info available.
#2  0x00007ae38cd4b6e9 in ?? () from /home/user/lodestar/node_modules/@chainsafe/blst-linux-x64-gnu/blst.linux-x64-gnu.node
No symbol table info available.
#3  0x00007ae38cd3de13 in ?? () from /home/user/lodestar/node_modules/@chainsafe/blst-linux-x64-gnu/blst.linux-x64-gnu.node
No symbol table info available.
#4  0x00007ae38cd3d4f6 in ?? () from /home/user/lodestar/node_modules/@chainsafe/blst-linux-x64-gnu/blst.linux-x64-gnu.node
No symbol table info available.
#5  0x00007ae38cd3d5cf in ?? () from /home/user/lodestar/node_modules/@chainsafe/blst-linux-x64-gnu/blst.linux-x64-gnu.node
No symbol table info available.
#6  0x00007ae38cce5ace in ?? () from /home/user/lodestar/node_modules/@chainsafe/blst-linux-x64-gnu/blst.linux-x64-gnu.node
No symbol table info available.
#7  0x0000000000c55a69 in v8impl::(anonymous namespace)::FunctionCallbackWrapper::Invoke(v8::FunctionCallbackInfo<v8::Value> const&) ()
No symbol table info available.
#8  0x0000000000f5f8ff in v8::internal::FunctionCallbackArguments::Call(v8::internal::CallHandlerInfo) ()
No symbol table info available.
#9  0x0000000000f6016d in v8::internal::MaybeHandle<v8::internal::Object> v8::internal::(anonymous namespace)::HandleApiCallHelper<false>(v8::internal::Isolate*, v8::internal::Handle<v8::internal::HeapObject>, v8::internal::Handle<v8::internal::FunctionTemplateInfo>, v8::internal::Handle<v8::internal::Object>, unsigned long*, int) ()
No symbol table info available.
#10 0x0000000000f60635 in v8::internal::Builtin_HandleApiCall(int, unsigned long*, v8::internal::Isolate*) ()
No symbol table info available.
#11 0x000000000196adf6 in Builtins_CEntry_Return1_ArgvOnStack_BuiltinExit ()
No symbol table info available.
#12 0x00000000018dcd1c in Builtins_InterpreterEntryTrampoline ()
No symbol table info available.
#13 0x00000b5c19ac04e9 in ?? ()
No symbol table info available.
#14 0x000016129046c499 in ?? ()
No symbol table info available.
#15 0x0000000500000000 in ?? ()
No symbol table info available.
#16 0x00000b5c19ac05b9 in ?? ()
No symbol table info available.
#17 0x00002a5f14d88879 in ?? ()
No symbol table info available.
#18 0x00000b5c19ac04e9 in ?? ()
No symbol table info available.
#19 0x00000b5c19ac04e9 in ?? ()
No symbol table info available.
#20 0x000016721c6996d1 in ?? ()
No symbol table info available.
#21 0x000019f99b197a31 in ?? ()
No symbol table info available.
#22 0x00000b5c19ac05b9 in ?? ()
No symbol table info available.
#23 0xffffffff00000000 in ?? ()
No symbol table info available.
#24 0x0000002000000000 in ?? ()
No symbol table info available.
#25 0x0000000000000000 in ?? ()
No symbol table info available.

@q9f
Contributor Author

q9f commented Sep 10, 2024

Looks like a blst wrapper issue (or blst itself?)

@wemeetagain
Member

Yeah, 1.21.0 introduced a new blst wrapper (a rewrite of the wrapper using napi-rs) and an updated version of blst (from a Sept 2022 version to the most recent release).

Will take some deeper investigation to see where the problem lies (whether in the newer version of blst, or in the wrapper).

@matthewkeil
Member

You rock for the stack trace!!! Amazing!!! I'm not sure which layer it's coming from based on that info, though. We recently rewrote the wrapper in Rust, which adds a bit more complexity to the answer. I am putting together a branch that will let you build a debug version of the bindings locally on your box. That should fill in the function names for the first 6 lines of the backtrace and narrow down where the illegal instruction is coming from.

@q9f
Contributor Author

q9f commented Sep 11, 2024

Just to be sure, I removed everything and built a fresh lodestar on both the affected machine and my workstation. I can confirm that this cannot be reproduced on a modern Intel Core i9, so I suspect something is off with the Celeron SoC.

I'm wondering if there is an easy way to debug the blst library directly, as I don't think this stems from lodestar or even the wrapper.

Meanwhile, I'm downgrading to 1.20.2 for the time being.

Staying tuned for instructions.

@matthewkeil
Member

matthewkeil commented Sep 11, 2024

@q9f I pushed a branch that runs debug builds of the C supranational/blst lib, the Rust bindings in supranational/blst, and our chainsafe/blst-ts bindings that wrap those. I added flags to optimize for GDB, so the stack traces should be fully expressive. If you get a minute, please pull the branch and install/build/start as normal (like you did when it threw the error). It should dump a much more thorough stack trace. Thanks!

branch name is mkeil/afri-debug

As a note, I think your hunch is correct and that it's in the assembly portion of blst. I do not think the assembly will show "proper stack traces", so we will only get some meta from that. Another likely candidate is the napi-rs bindings layer. We have found some rough edges in that lib because it's relatively new. It could be something as simple as a conditional build flag.

@q9f
Contributor Author

q9f commented Sep 11, 2024

@matthewkeil this build from branch mkeil/afri-debug does not crash at signal 4.

You fixed it 😛

@matthewkeil
Member

matthewkeil commented Sep 11, 2024

That is both wonderful and concerning because of the couple of modifications I made when testing the debug build. What OS are you running? Since you were able to build, can I assume you have the Rust toolchain on that machine?

@q9f
Contributor Author

q9f commented Sep 12, 2024

Yes, I always maintain a Rust toolchain. The OS is Arch Linux, by the way.

@matthewkeil
Member

matthewkeil commented Sep 16, 2024

I've narrowed down the issue, but I am not sure why your system is reporting that it should not use std, so I had to force it. @q9f, can you do me a favor and tell me what the value of the cargo env is on the Celeron machine? You can do so using the following command: rustc --print=cfg

I am trying to sort out the why so I can put up a PR in the supranational/blst repo, as that is where the problem originates. Here is an excerpt of the build.rs that decides whether to apply the feature="std" that I overrode. Some other segments below this snippet also rely on those env vars and apply other features. The issue could be hidden there too, but that seems less likely, as manually forcing the std feature doesn't change their behavior.

    // account for cross-compilation [by examining environment variables]
    let target_os = env::var("CARGO_CFG_TARGET_OS").unwrap();
    let target_env = env::var("CARGO_CFG_TARGET_ENV").unwrap();
    let target_arch = env::var("CARGO_CFG_TARGET_ARCH").unwrap();
    let target_family = env::var("CARGO_CFG_TARGET_FAMILY").unwrap_or_default();

    let target_no_std = target_os.eq("none")
        || (target_os.eq("unknown") && target_arch.eq("wasm32"))
        || target_os.eq("uefi")
        || env::var("BLST_TEST_NO_STD").is_ok();

    if !target_no_std {
        println!("cargo:rustc-cfg=feature=\"std\"");         // this is the line I manually overrode
        if target_arch.eq("wasm32") || target_os.eq("unknown") {
            println!("cargo:rustc-cfg=feature=\"no-threads\"");
        }
    }
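To make the selection logic easier to reason about, here is a minimal sketch that restates the excerpt as pure functions taking the target values as arguments instead of reading them from the environment. The function names `target_no_std` and `emitted_cfgs` are hypothetical and are not part of blst's build.rs; only the boolean conditions are taken from the excerpt.

```rust
// Hypothetical helper names; only the conditions mirror the build.rs excerpt.

/// Same boolean expression as `target_no_std` in the excerpt, with the
/// environment lookups replaced by plain arguments.
fn target_no_std(target_os: &str, target_arch: &str, blst_test_no_std: bool) -> bool {
    target_os == "none"
        || (target_os == "unknown" && target_arch == "wasm32")
        || target_os == "uefi"
        || blst_test_no_std
}

/// The cargo directives the excerpt would print for a given target.
fn emitted_cfgs(target_os: &str, target_arch: &str, blst_test_no_std: bool) -> Vec<&'static str> {
    let mut out = Vec::new();
    if !target_no_std(target_os, target_arch, blst_test_no_std) {
        out.push("cargo:rustc-cfg=feature=\"std\"");
        if target_arch == "wasm32" || target_os == "unknown" {
            out.push("cargo:rustc-cfg=feature=\"no-threads\"");
        }
    }
    out
}

fn main() {
    // A Linux x86_64 target with BLST_TEST_NO_STD unset.
    for line in emitted_cfgs("linux", "x86_64", false) {
        println!("{line}");
    }
}
```

Under this reading, a Linux x86_64 target only ends up without the std feature if BLST_TEST_NO_STD is set in the build environment, so that variable would be worth checking on the affected machine.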

@q9f
Contributor Author

q9f commented Sep 17, 2024

~/lodestar-debug unstable user@irena
❯ rustc --version
rustc 1.81.0 (eeb90cda1 2024-09-04)

~/lodestar-debug unstable user@irena
❯ rustc --print cfg
debug_assertions
panic="unwind"
target_abi=""
target_arch="x86_64"
target_endian="little"
target_env="gnu"
target_family="unix"
target_feature="fxsr"
target_feature="sse"
target_feature="sse2"
target_has_atomic="16"
target_has_atomic="32"
target_has_atomic="64"
target_has_atomic="8"
target_has_atomic="ptr"
target_os="linux"
target_pointer_width="64"
target_vendor="unknown"
unix

@jclapis

jclapis commented Sep 26, 2024

Hey folks, I had a user run into this on my side and can reproduce it easily too. Running v1.22.0 on a CPU without the following:

  • adx
  • avx
  • avx2
  • bmi1
  • bmi2

New versions of BLST are supposed to dynamically detect CPU features at runtime; older versions had them fixed during compilation which is why Prysm and Lighthouse used to have "modern" vs. "portable" builds (and more importantly, Docker containers). Either option would probably resolve this issue.
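For readers unfamiliar with runtime detection, the sketch below shows the general idea in Rust using the standard library's `is_x86_feature_detected!` macro to probe the five features listed above. It illustrates the technique only; it is not how blst itself performs detection, and the function name `missing_features` is hypothetical.

```rust
/// Names of the listed features that the running CPU does not report.
/// `missing_features` is a hypothetical helper, not a blst API.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn missing_features() -> Vec<&'static str> {
    // The macro requires string literals, so each feature is probed explicitly.
    let checks = [
        ("adx", std::is_x86_feature_detected!("adx")),
        ("avx", std::is_x86_feature_detected!("avx")),
        ("avx2", std::is_x86_feature_detected!("avx2")),
        ("bmi1", std::is_x86_feature_detected!("bmi1")),
        ("bmi2", std::is_x86_feature_detected!("bmi2")),
    ];
    checks
        .iter()
        .filter(|(_, present)| !*present)
        .map(|(name, _)| *name)
        .collect()
}

/// On non-x86 targets these features do not apply, so nothing is reported.
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
fn missing_features() -> Vec<&'static str> {
    Vec::new()
}

fn main() {
    let missing = missing_features();
    if missing.is_empty() {
        println!("all checked features are available");
    } else {
        println!("missing: {}", missing.join(", "));
    }
}
```

A library that dispatches at runtime would pick a fallback code path when a probe like this reports a missing feature, rather than executing an instruction the CPU cannot decode.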

@wemeetagain
Member

It's strange that this is happening since we're using a fairly recent version of blst (post 0.3.12, only 3 months old) which seems to have the runtime detection.

@matthewkeil
Member

@jclapis I posted the container info to Discord. Please feel free to drop results here when you get a chance to run them on the machine that was having issues. Thanks!

@matthewkeil
Member

Forwarded from @jclapis on Discord. Using the same fix from @q9f solved his issue as well. Both are running Celeron processors so there is a common thread. Also got the info below as reference.

$ cat /proc/cpuinfo
processor: 0
vendor_id: GenuineIntel
cpu family: 6
model: 156
model name: Intel(R) Celeron(R) N5105 @ 2.00GHz
$
$
$ hyperdrive s ccf

Your CPU is missing support for the following features:
  - adx
  - avx
  - avx2
  - bmi1
  - bmi2

You must use the 'portable' image.

jcrtp-celeron
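A check like the one in the output above can be approximated by scanning the `flags` line of /proc/cpuinfo on Linux. The sketch below is an assumption about how such a check could work, not the actual hyperdrive implementation; the helper names and the sample flag strings are made up for illustration and are not the N5105's real flag list.

```rust
use std::collections::HashSet;

// The five features named in the report above.
const REQUIRED: [&str; 5] = ["adx", "avx", "avx2", "bmi1", "bmi2"];

/// Required features absent from the first `flags` line of a
/// /proc/cpuinfo-style text. Hypothetical helper for illustration.
fn missing_from_cpuinfo(cpuinfo: &str) -> Vec<&'static str> {
    let flags: HashSet<&str> = cpuinfo
        .lines()
        .find(|line| line.starts_with("flags"))
        .and_then(|line| line.split_once(':'))
        .map(|(_, value)| value.split_whitespace().collect())
        .unwrap_or_default();
    REQUIRED
        .iter()
        .copied()
        .filter(|feature| !flags.contains(feature))
        .collect()
}

/// Chooses between the "modern" and "portable" variants mentioned in the thread.
/// A missing `flags` line is treated conservatively as "portable".
fn image_variant(cpuinfo: &str) -> &'static str {
    if missing_from_cpuinfo(cpuinfo).is_empty() {
        "modern"
    } else {
        "portable"
    }
}

fn main() {
    // Made-up sample input, not taken from a real machine.
    let sample = "processor: 0\nflags: fpu sse sse2 ssse3 sse4_1 sse4_2\n";
    println!("missing: {}", missing_from_cpuinfo(sample).join(", "));
    println!("variant: {}", image_variant(sample));
}
```

Matching whole whitespace-separated tokens keeps `avx` from being satisfied by `avx2`, which a plain substring search would get wrong.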

@matthewkeil
Member

matthewkeil commented Oct 15, 2024

Added a "portable" feature to the bindings and perf-tested the results. Hopefully we get the best of both worlds, as it's faster and should be compatible. Building a test lodestar branch now with the change.

ChainSafe/blst-ts#154

@matthewkeil
Member

@q9f I updated your branch mkeil/afri-debug with the bindings built with the portable feature. When you get a second please pull the changes and see if it resolved the issue.

cd /to/root/of/lodestar
rm -rf temp-deps/blst-ts
yarn clean && yarn clean:nm && yarn && yarn build

# start as you normally would

@q9f
Contributor Author

q9f commented Oct 15, 2024

That works and does not fail with illegal instruction.

@matthewkeil
Member

That works and does not fail with illegal instruction.

Awesome news @q9f! Thanks for checking that! Will PR the change in today and try to release with 1.23

@matthewkeil
Member

PR in blst-ts
ChainSafe/blst-ts#154

PR in lodestar
#7164
