Implement fast tier-1 JIT compiler #283
Adopting an interpreter-first approach, where the system starts with an interpreter and switches to JIT compilation upon identifying performance bottlenecks, can influence the security landscape. Here are some points on how this approach may affect security:
Balancing performance, functionality, and security remains a key consideration in this approach. Here are some academic papers that emphasize interpreter-first approaches or related concepts in JIT compilation:
After the code check-in for the tier-1 JIT compiler based on the x86-64 architecture (#289), the following action items have been identified to complete our goals:
For example, these are the implementations of the 'addi' instruction for x64 and ARM64. As these examples demonstrate, after harmonizing the API, the only difference lies in the use of the x64 register RAX versus the ARM64 register R6.

x64:
emit_load(state, S32, platform_parameter_registers[0], RAX, offsetof(struct riscv_internal, X) + 4 * ir->rs1);
emit_alu32_imm32(state, 0x81, 0, RAX, ir->imm);
emit_store(state, S32, RAX, platform_parameter_registers[0], offsetof(struct riscv_internal, X) + 4 * ir->rd);

ARM64:
emit_load(state, S32, platform_parameter_registers[0], R6, offsetof(struct riscv_internal, X) + 4 * ir->rs1);
emit_alu32_imm32(state, 0x81, 0, R6, ir->imm);
emit_store(state, S32, R6, platform_parameter_registers[0], offsetof(struct riscv_internal, X) + 4 * ir->rd);

We can simply rewrite them to reuse a JIT template:

#if defined(__x86_64__)
#define TEMP_REG RAX
#else /* ARM64 */
#define TEMP_REG R6
#endif

emit_load(state, S32, platform_parameter_registers[0], TEMP_REG, offsetof(struct riscv_internal, X) + 4 * ir->rs1);
emit_alu32_imm32(state, 0x81, 0, TEMP_REG, ir->imm);
emit_store(state, S32, TEMP_REG, platform_parameter_registers[0], offsetof(struct riscv_internal, X) + 4 * ir->rd);
We are essentially developing a basic tool akin to LLVM TableGen, which lets developers define the fundamental code-generation concepts in whatever form they need, through custom backends tailored for JIT code generation.
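As a rough illustration of this template idea (reusing the TEMP_REG definition above), a single description could be stamped out into a per-host emitter with a C macro. This is only a sketch: the GEN_ALU_IMM macro and the struct jit_state / rv_insn_t type names are assumptions made for this example, not the project's actual DSL.

/* Hypothetical TableGen-like template: the instruction is described once, and
 * the scratch-register choice is folded into TEMP_REG, so the same description
 * emits machine code for either host architecture.
 */
#include <stddef.h> /* offsetof */

#define GEN_ALU_IMM(name, opcode)                                            \
    static void gen_##name(struct jit_state *state, const rv_insn_t *ir)     \
    {                                                                         \
        /* load rs1 from the vCPU register file into the scratch register */ \
        emit_load(state, S32, platform_parameter_registers[0], TEMP_REG,     \
                  offsetof(struct riscv_internal, X) + 4 * ir->rs1);         \
        /* apply the immediate operand */                                    \
        emit_alu32_imm32(state, opcode, 0, TEMP_REG, ir->imm);               \
        /* write the result back into rd */                                  \
        emit_store(state, S32, TEMP_REG, platform_parameter_registers[0],    \
                   offsetof(struct riscv_internal, X) + 4 * ir->rd);         \
    }

GEN_ALU_IMM(addi, 0x81) /* expands into gen_addi() for the current host */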
The JIT backend for Arm64 has now been implemented (#304), and the built-in domain-specific language (DSL) has undergone refinements. These enhancements enable straightforward descriptions for code generation, facilitating the on-the-fly creation of machine code for both x86-64 and Arm64 architectures.
The tier-1 JIT compiler is essentially a single-pass compiler, focused on speed and simplicity of compilation rather than complex analysis. In a single pass, it does not construct a graph-based intermediate representation of the code. Instead, it generates code for a limited number of instructions at a time, utilizing minimal context from preceding instructions. Despite its apparent simplicity, this method of compilation presents its own set of challenges in terms of accurate and efficient implementation. Nevertheless, we are exploring basic optimizations, such as:
This approach to JIT compilation seeks a balance between the need for rapid code generation and the opportunity to apply foundational optimizations. Meanwhile, I plan to discuss and confirm the feasibility of the aforementioned optimizations with @vacantron and @qwe661234 later. Reference: Whose baseline compiler is it anyway?
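To make the single-pass nature concrete, the translation loop can be imagined roughly as below: walk the decoded instructions of one block in order and invoke a per-opcode emitter for each, with no IR graph and no second pass. block_t, dispatch_table, N_RV_OPCODES, emit_prologue, and emit_epilogue are illustrative names assumed for this sketch, not the actual implementation.

/* Rough sketch of single-pass translation over one decoded block: each
 * instruction is handed to its opcode-specific emitter (e.g. gen_addi above)
 * exactly once, in program order.
 */
typedef void (*codegen_fn_t)(struct jit_state *, const rv_insn_t *);
extern const codegen_fn_t dispatch_table[N_RV_OPCODES]; /* one emitter per opcode */

static void jit_translate(struct jit_state *state, const block_t *block)
{
    emit_prologue(state); /* host-ABI entry sequence */
    for (const rv_insn_t *ir = block->ir_head; ir; ir = ir->next)
        dispatch_table[ir->opcode](state, ir); /* no IR graph, no extra pass */
    emit_epilogue(state); /* return to the dispatcher/interpreter */
}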
A hybrid JIT combines aspects of both a compiler and an interpreter. In such a system, a single method can be executed partly through interpretation and partly through compilation. This approach leverages the fact that most programs spend the majority of their time executing a small fraction of their code. Therefore, fully compiling every method isn't necessary for enhanced execution speed; focusing on the most frequently executed code segments is often enough.

Take, for example, a method where the bulk of execution time is spent in a short loop that runs numerous times. A hybrid JIT would compile just this loop, while interpreting the rest of the method. This strategy is particularly beneficial in embedded systems for two reasons. First, it reduces the memory needed to store the native code since the entire method isn't compiled. Second, it shortens the compilation time, aligning closer to soft-real-time performance.

For instance, consider a method that initializes and then calculates the dot product of two vectors. In such a method, the two for-loops might be the only parts compiled into native code. If the complete method contains a large number of bytecode instructions, but each loop only has a few, this hybrid approach can significantly reduce the compilation workload and the size of the resulting native code. Crucially, it does this while still optimizing the most critical parts of the method.

This method of JIT compilation thus offers a balance between performance efficiency and resource utilization, making it especially suitable for systems where memory and processing power are at a premium. Reference: A Small Hybrid JIT for Embedded Systems
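The dispatch decision such a hybrid tier implies could look like the following C sketch: per-block execution counters decide which blocks cross a hotness threshold and get compiled, while cold blocks keep running in the interpreter. HOT_THRESHOLD, run_block, interpret_block, jit_translate_and_install, and the riscv_t / block_t types are placeholder names for this illustration only.

/* Hybrid execution sketch: only hot blocks (e.g. the inner loops of a method)
 * are compiled; the remainder of the method stays interpreted.
 */
#define HOT_THRESHOLD 4096 /* tier-up point; a tunable assumption */

static void run_block(riscv_t *rv, block_t *block)
{
    if (!block->compiled && ++block->exec_count >= HOT_THRESHOLD) {
        /* Compile just this block; nothing else in the method is translated. */
        block->native_code = jit_translate_and_install(rv, block);
        block->compiled = true;
    }

    if (block->compiled)
        ((void (*)(riscv_t *)) block->native_code)(rv); /* run native code */
    else
        interpret_block(rv, block); /* cold path: keep interpreting */
}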
There are some topics that we can delve into further:
After investigating the difference in dynamic instruction count between
T1C is ready.
To achieve the goal of integrating a portable JIT compiler (#81), we plan to implement a multi-tier compilation framework. We will start with the Fast JIT tier and then transition to LLVM JIT (ORC lazy JIT) to achieve both quick startup times and improved performance.
Following a similar approach to V8's JIT tiering, we will employ backend threads (#239) with eager compilation. This means that once the profiling data indicates a shift from interpreter to JIT (#189), all RISC-V instructions will be compiled with the Fast JIT. We will wait for that compilation to complete, then instantiate the runtime module and execute the RISC-V instructions. Concurrently, we will start the LLVM JIT compiling RISC-V instructions and progressively transition to LLVM JIT functions during execution.
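A sketch of what this tier-up path could look like, assuming a per-block dispatch pointer and a backend work queue: the block is compiled eagerly with the tier-1 JIT on the hot path, then queued for LLVM ORC compilation on a backend thread, which later swaps in the higher-tier code. t1c_compile, backend_queue_push, TIER_T1C, and the llvm_queue field are invented names for this outline, not existing code.

/* Planned two-tier flow (illustrative only): tier-1 code is produced eagerly
 * and executed right away; the same block is queued so a backend thread can
 * produce the LLVM (ORC lazy JIT) version and swap it in once ready.
 */
static void tier_up(riscv_t *rv, block_t *block)
{
    /* Tier 1: fast single-pass compilation, done synchronously. */
    block->native_code = t1c_compile(rv, block);
    block->tier = TIER_T1C;

    /* Tier 2: hand the block to the backend thread pool; when LLVM finishes,
     * the worker atomically replaces native_code so subsequent dispatches use
     * the optimized version.
     */
    backend_queue_push(&rv->llvm_queue, block);
}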
Design goals:
Reference: