-
Notifications
You must be signed in to change notification settings - Fork 9
Home
In this repository, we specify a formal semantics of Viper, a high-level language for smart contracts, on the top of KEVM. This is a stepping stone to the goal of "developing a formally verified compiler for a tiny language [such as Viper]" as mentioned in https://blog.ethereum.org/2016/09/01/formal-methods-roadmap/
Viper is a new experimental language for smart contracts running on EVM. It is designed with the goal of being a simpler and more secure alternative to Solidity. The design goal emphasizes security over efficiency, constant-bounded time function execution, and simplicity.
Viper is a "semi object-oriented" language: a class represents a contract's blueprint, and the class's fields represent the contract's data and functionality. An object represents a unique contract class instance on the blockchain. However, it does not support explicit inheritance.
Viper claims to be strongly and statically typed, with support for type inference.
Types consist of primitive types (num
of 128-bit signed integers, num256
of 256-bit unsigned integers, decimal
with 128-bit integer and 10-bit fractional numbers, bool
, address
with a size of 20 bytes, and bytes
of byte-arrays), and compound types (fixed-size arrays, structs, and maps with primitive typed keys).
Unlike EVM, an arithmetic overflow throws an exception.
A local variable has the function scope (i.e., no variable hiding in local blocks) but no declaration is needed, a decision that complicates code readability. It does not allow recursive functions nor unbounded-loops, but only bounded-loops. The language syntax is influenced by that of Python.
Specifying Viper is challenging due to its experimental status. Viper does not have an official specification document. It derives its specification mainly from its compiler to EVM, which is usually in flux. Therefore, we took a snapshot (a886850c1d) of the latest version of the compiler at the time of writing, and inferred the semantics via reverse-engineering the implementation and communicating with the language designers.
Using KEVM, we were able to capture the Viper language semantics by formalizing the compiler translation from Viper to EVM. This "semantics by translation" approach helps to emphasize the essence of Viper semantics by factoring it out from EVM semantics.
Our formalization of the translation from Viper to EVM mirrors the two-stage approach of the Viper's own compiler, with the first stage translating Viper to LLL and the second translating LLL to EVM.
LLL is an intermediate Lisp-like language originally designed for the Solidity compiler, but is much simpler and lower-level than Lisp and closer to EVM. An LLL expression (like an S-expression) is a list of an operator symbol followed by zero or more arguments. Some operators represent EVM opcodes with the same name. Only a fragment of LLL is used for the intermediate representation of Viper.
The Viper to LLL translation mainly consists of: lowering function calls using the ABI encoding; encoding logical data structures such as structs into byte-array representation (with 32-byte alignment); allocating memory and storage for local and state variables, resp.; inlining builtin-functions; type checking; and inserting runtime checks for arithmetic overflow. The control-flow statements are simply translated to that of LLL.
The LLL to EVM translation is rather straightforward.
An LLL expression with an EVM opcode is translated to a sequence of EVM opcodes that amounts to evaluating the arguments and pushing the results into the stack in the reverse order followed by the EVM opcode.
A control-flow operator such as if
and repeat
is translated in the usual way using the jump opcodes.
We found that the Viper language is closely bound to EVM, and inevitably inherits its limitations. We believe that a high-level language such as Viper should have a standalone, abstract semantics that is completely separate from the semantics of the underlying compilation target (EVM in this instance). This helps to better understand and reason about the language semantics, making it easier to apply the language to different targets. Also, this separation of concerns assists formal reasoning. Ideally, one should verify the high-level logic of a Viper program at a high level of abstraction, and then prove that the original Viper program simulates the translated EVM code, i.e., the EVM code does not admit exceptional behaviors (including EVM-specific low-level exceptions) until they are defined in the Viper program.
For example, an internal call (i.e., an intra-contract call) propagates the occurrence of exception all the way up to the root caller. This design seems to be due to quick prototyping for the compiler to EVM. However, we think that the language should provide a way of properly catching exceptions, and leave the logic of exception handling to the programmer.
Another example is the address space of maps. A map of Viper is a total function whose domain is the entire address space of the local memory, which may lead to conflict with other maps. Currently, Viper simply relies on a hashing scheme to avoid the conflict, hoping that the hash collision will not happen, but in theory, the collision may happen and the conflict can occur. Despite the fact that the probability of the hash collision is very low and the blockchain will not work without the underlying probabilistic guarantee, we do believe that a high-level language semantics should not depend on that, but rather admit a purely logical map semantics and let its compiler properly implement it, e.g., by having an extra protection layer for detecting and resolving the hash collision.
While formalizing the language semantics, we found several bugs in the production Viper compiler, including a critical security vulnerability for smart contracts. Those bugs (listed below) were reported, confirmed, and fixed.
We omit details of the security critical bug mentioned above; at the time of writing, the Viper team is still working on resolving the issue, which has the potential to cause major loss of deployed funds on a wide variety of contracts. We plan to release information on the bug found after a suitable patch has been applied to the Viper compiler.
One example bug was found in typecasting, where the compiler failed to introduce a runtime check.
For example, as_num256
is a typecast to num256
, a 256-bits unsigned integer, and is not supposed to allow negative input since it will be coerced to a quite different positive number.
The compiler statically rejects invalid casts when they are detected at compile time, but failed to add the run-time check when they are not.
For example, as_num256(-1)
is statically rejected, but as_num256(x - y)
was compiled to x - y
even if x < y
.
Another example bug was found in compiling the hash expression, sha3(e)
.
For example, sha3(0)
is supposed to be translated into EVM code, PUSH1, 0, PUSH1, 192, MSTORE, PUSH1, 32, PUSH1, 192, SHA3
, which means that the code first stores the value to be hashed, '0', in the local memory at the pre-designated location (here 192), and then computes SHA3 hash of the memory range (with starting location 192 and width 32, i.e., MEM[192..223]
).
But the actually generated code was: PUSH1, 0, PUSH1, 192, MSTORE, PUSH1, 192, PUSH1, 32, SHA3
(i.e., PUSH1 192
and PUSH1 32
were flipped), which means that it computes SHA3 hash of the wrong memory range (with starting location 32 and width 192, i.e., MEM[32..223]
).
This seemingly obvious bug had not been found as the following reason. Fortunately (or unfortunately) the unintended memory range MEM[32...191]
that is prepended to the value to be hashed (i.e., MEM[192...223]
) is fixed over all Viper programs.
Indeed, they are pre-occupied to store five values of size limits used to clamp values: ADDRSIZE
, MAXNUM
, MINNUM
, MAXDECIMAL
, MINDECIMAL
.
- https://github.com/ethereum/viper/issues/410
- https://github.com/ethereum/viper/issues/411
- https://github.com/ethereum/viper/issues/448
- https://github.com/ethereum/viper/issues/449
- https://github.com/ethereum/viper/issues/450
- https://github.com/ethereum/viper/issues/451
- https://github.com/ethereum/viper/issues/453
- https://github.com/ethereum/viper/issues/469
- https://github.com/ethereum/viper/issues/493
- https://github.com/ethereum/viper/issues/504
- https://github.com/ethereum/viper/issues/508
- https://github.com/ethereum/viper/issues/509
- https://github.com/ethereum/viper/issues/515
- https://github.com/ethereum/viper/issues/516
- https://github.com/ethereum/viper/issues/519