Commit

Merge pull request #468 from ScorexFoundation/v2.1
Branch for v2.1
aslesarenko authored Jun 5, 2019
2 parents 1b7b5a6 + 76fc50d commit a6e7e7b
Showing 265 changed files with 19,984 additions and 4,351 deletions.
1 change: 1 addition & 0 deletions .gitignore
Original file line number Diff line number Diff line change
@@ -9,6 +9,7 @@
*.fdb_latexmk

*.log
docs/spec/out/
test-out/
flamegraphs/
# sbt specific
2 changes: 1 addition & 1 deletion .travis.yml
@@ -14,7 +14,7 @@ cache:
language: scala

jdk:
- oraclejdk8
- oraclejdk9

script:
- sbt -jvm-opts .travis.jvmopts test
9 changes: 5 additions & 4 deletions build.sbt
@@ -73,11 +73,11 @@ git.gitUncommittedChanges in ThisBuild := true

val bouncycastleBcprov = "org.bouncycastle" % "bcprov-jdk15on" % "1.60"
val scrypto = "org.scorexfoundation" %% "scrypto" % "2.1.6"
val scorexUtil = "org.scorexfoundation" %% "scorex-util" % "0.1.3"
val scorexUtil = "org.scorexfoundation" %% "scorex-util" % "0.1.4"
val macroCompat = "org.typelevel" %% "macro-compat" % "1.1.1"
val paradise = "org.scalamacros" %% "paradise" % "2.1.0" cross CrossVersion.full

val specialVersion = "master-5ffd1bf8-SNAPSHOT"
val specialVersion = "master-534cb6f5-SNAPSHOT"
val specialCommon = "io.github.scalan" %% "common" % specialVersion
val specialCore = "io.github.scalan" %% "core" % specialVersion
val specialLibrary = "io.github.scalan" %% "library" % specialVersion
@@ -91,7 +91,7 @@ val libraryconf = "io.github.scalan" %% "library-conf" % specialVersion
val testingDependencies = Seq(
"org.scalatest" %% "scalatest" % "3.0.5" % "test",
"org.scalactic" %% "scalactic" % "3.0.+" % "test",
"org.scalacheck" %% "scalacheck" % "1.13.+" % "test",
"org.scalacheck" %% "scalacheck" % "1.14.+" % "test",
"junit" % "junit" % "4.12" % "test",
"com.novocode" % "junit-interface" % "0.11" % "test",
specialCommon, (specialCommon % Test).classifier("tests"),
@@ -115,6 +115,7 @@ libraryDependencies ++= Seq(
"com.typesafe.akka" %% "akka-actor" % "2.4.+",
"org.bitbucket.inkytonik.kiama" %% "kiama" % "2.1.0",
"com.lihaoyi" %% "fastparse" % "1.0.0",
"org.spire-math" %% "debox" % "0.8.0"
) ++ testingDependencies


@@ -192,7 +193,7 @@ lazy val sigma = (project in file("."))
.settings(commonSettings: _*)

def runErgoTask(task: String, sigmastateVersion: String, log: Logger): Unit = {
val ergoBranch = "v2.0"
val ergoBranch = "sigma-validation-settings"
log.info(s"Testing current build in Ergo (branch $ergoBranch):")
val cwd = new File("").absolutePath
val ergoPath = new File(cwd + "/ergo-tests/")
10 changes: 0 additions & 10 deletions docs/LangSpec.md
@@ -521,16 +521,6 @@ class Coll[A] {
*/
def flatMap[B](f: A => Coll[B]): Coll[B]

/** Computes length of longest segment whose elements all satisfy some predicate.
*
* @param p the predicate used to test elements.
* @param from the index where the search starts.
* @return the length of the longest segment of this collection starting from index `from`
* such that every element of the segment satisfies the predicate `p`.
* @since 2.0
*/
def segmentLength(p: A => Boolean, from: Int): Int

/** Finds the first element of the $coll satisfying a predicate, if any.
*
* @param p the predicate used to test elements.
13 changes: 13 additions & 0 deletions docs/PR-review-checklist.md
@@ -0,0 +1,13 @@
## What should be checked during PR review

### For each $TypeName.$methodName there should be

1. test case in SigmaDslTests (checks SigmaDsl <-> ErgoScript equality)
2. test case in CostingSpecification
3. costing rule method in ${TypeName}Coster
4. for each SMethod registration
- .withInfo($description, $argsInfo)
- .withIRInfo($irBuilder, $opDescriptor)

### For each PredefinedFunc registration there should be
- PredefFuncInfo($irBuilder, $opDescriptor)
2 changes: 1 addition & 1 deletion docs/conversions.dot
@@ -64,7 +64,7 @@ digraph conversions {
GroupElement -> Boolean [label=".isIdentity"]
GroupElement -> Bytes [label=".nonce"]
//todo remove compressed flag, use GroupElementSerializer
GroupElement -> Bytes [label=".getEncoded(compressed)" color=red]
GroupElement -> Bytes [label=".getEncoded" color=red]

String -> Bytes [label="fromBase58(...)"]
String -> Bytes [label="fromBase64(...)"]
8 changes: 5 additions & 3 deletions docs/sigma-dsl.md
@@ -1,13 +1,15 @@
# Sigma: Scala DSL for smart contracts with zero knowledge proof of knowledge
# SigmaDsl: Scala DSL for smart contracts with zero knowledge proof of knowledge

## Intro
SigmaDsl is a domain-specific language embedded into Scala and designed to be
source code compatible with SigmaScript. This means you can write SigmaDsl
code directly in Scala IDE (e.g. IntelliJ IDEA) and copy-paste code snippets
between SigmaDsl and SigmaScript.
Special Scala macros can also be used to automatically translate SigmaDsl to
Sigma byte code.

SigmaDsl is implemented as a library in the framework of
[Special](https://github.com/scalan/special)
SigmaDsl is implemented as a Scala library using the [Special](https://github.com/scalan/special)
framework.

## See also
[Special](https://github.com/scalan/special)
12 changes: 12 additions & 0 deletions docs/soft-fork-log.md
@@ -0,0 +1,12 @@

## A log of changes leading to soft-fork

This list should be updated every time something soft-forkable is added.

### Changes since 2.0

- new type (SGlobal.typeCode = 106)
- new method (SGlobal.groupGenerator.methodId = 1)
- new method (SAvlTree.updateDigest.methodId = 15)
- removed GroupElement.nonce (changed codes of getEncoded, exp, multiply, negate)
- change in Coll.filter serialization format (removed tagged variable id, changed condition type)
6 changes: 6 additions & 0 deletions docs/spec/appendix_ergotree_serialization.tex
@@ -0,0 +1,6 @@
\section{Serialization format of ErgoTree nodes}
\label{sec:appendix:ergotree_serialization}

\mnote{These subsections are autogenerated from instrumented ValueSerializers}

\input{generated/ergotree_serialization1.tex}
36 changes: 36 additions & 0 deletions docs/spec/appendix_integer_encoding.tex
@@ -0,0 +1,36 @@
\section{Compressed encoding of integer values}

\subsection{VLQ encoding}
\label{sec:vlq-encoding}

\begin{verbatim}
public final void putULong(long value) {
    while (true) {
        if ((value & ~0x7FL) == 0) {
            buffer[position++] = (byte) value;
            return;
        } else {
            buffer[position++] = (byte) (((int) value & 0x7F) | 0x80);
            value >>>= 7;
        }
    }
}
\end{verbatim}
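
The method above writes into a surrounding writer's \lst{buffer} and \lst{position} fields, which are not shown. As a minimal self-contained sketch of the same scheme (the class and method names here are hypothetical, not part of the codebase), the following encodes a value into a fresh byte array and decodes it back, reading 7 payload bits per byte until a byte with a clear high bit is found:

```java
import java.io.ByteArrayOutputStream;

// Hypothetical helper for illustration; not the writer class used by the project.
final class VlqSketch {
    // Encode an unsigned 64-bit value (carried in a Java long) as VLQ bytes,
    // 7 payload bits per byte, least-significant group first.
    static byte[] encodeULong(long value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80)); // continuation bit set
            value >>>= 7;
        }
        out.write((int) value); // last group, continuation bit clear
        return out.toByteArray();
    }

    // Decode VLQ bytes produced by encodeULong back into the original value.
    static long decodeULong(byte[] bytes) {
        long result = 0L;
        int shift = 0;
        for (byte b : bytes) {
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result; // high bit clear: this was the last group
            }
            shift += 7;
            if (shift >= 70) {
                break; // more than 10 bytes cannot encode a 64-bit value
            }
        }
        throw new IllegalArgumentException("malformed or truncated VLQ input");
    }
}
```

Under this sketch the value 300 encodes to the two bytes 0xAC 0x02, and a value with the top bit set (such as -1 viewed as unsigned) takes the full 10 bytes.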

\subsection{ZigZag encoding}
\label{sec:zigzag-encoding}

ZigZag encoding maps signed 64-bit integers to unsigned values that can be
efficiently encoded with varint (VLQ): values of small magnitude, positive or
negative, map to small unsigned values. (Otherwise,
negative values must be sign-extended to 64 bits to be varint encoded,
thus always taking 10 bytes in the buffer.)

Parameter \lst{n} is a signed 64-bit integer.
This Java method returns an unsigned 64-bit integer, stored in a signed \lst{long} because Java has no explicit unsigned support.

\begin{verbatim}
public static long encodeZigZag64(final long n) {
    // Note: the right-shift must be arithmetic
    return (n << 1) ^ (n >> 63);
}
\end{verbatim}
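
The inverse mapping is not shown above. A minimal sketch of both directions follows; the decoding method mirrors the usual ZigZag inverse, and the class name is hypothetical. It illustrates that 0, -1, 1, -2 map to 0, 1, 2, 3, so small negative numbers stay small after encoding:

```java
// Hypothetical helper for illustration; only encodeZigZag64 appears in the text above.
final class ZigZagSketch {
    // Map a signed 64-bit value to an unsigned one (carried in a long).
    static long encodeZigZag64(long n) {
        // The right shift must be arithmetic so that it yields all ones for negatives.
        return (n << 1) ^ (n >> 63);
    }

    // Inverse of encodeZigZag64.
    static long decodeZigZag64(long n) {
        // The logical shift drops the sign marker bit; -(n & 1) is 0 or all ones.
        return (n >>> 1) ^ -(n & 1);
    }
}
```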
127 changes: 127 additions & 0 deletions docs/spec/appendix_motivation.tex
@@ -0,0 +1,127 @@
\section{Motivations}
\label{sec:appendix:motivation}

\subsection{Type Serialization format rationale}
\label{sec:appendix:motivation:type}

Some operations of \ASDag have type parameters, for which concrete types
should be specified (since \ASDag is a monomorphic IR). When such an operation
(for example \hyperref[sec:serialization:operation:ExtractRegisterAs]{\lst{ExtractRegisterAs}}) is serialized, those types should also be
serialized as part of the operation. The following encoding is designed to
minimize the number of bytes required to represent a type in the serialization
format of \ASDag.

In most cases a type term serializes into a single byte. In the intermediate
representation of ErgoTree each type is represented by a tree of nodes where
leaves are primitive types and other nodes are type constructors.
A simple (but sub-optimal) way to serialize a type would be to give each
primitive type and each type constructor a unique type code. Then, to
serialize a node, we would emit its code and then perform recursive descent
to serialize all children.
However, to save storage space, we use a special encoding scheme that saves bytes
for the types that are used more often.

We assume the most frequently used types are:
\begin{itemize}
\item primitive types (\lst{Int, Byte, Boolean, BigInt, GroupElement,
Box, AvlTree})
\item Collections of primitive types (\lst{Coll[Byte]} etc.)
\item Options of primitive types (\lst{Option[Int]} etc.)
\item Nested arrays of primitive types (\lst{Coll[Coll[Int]]} etc.)
\item Functions of primitive types (\lst{Box => Boolean} etc.)
\item First biased pair of types (\lst{(_, Int)} when we know the first
component is a primitive type).
\item Second biased pair of types (\lst{(Int, _)} when we know the second
component is a primitive type)
\item Symmetric pair of types (\lst{(Int, Int)} when we know both types are
the same)
\end{itemize}

All the types above should be represented in an optimized way (preferably by a single byte).
For other types, we do recursive descent down the type tree as defined in section~\ref{sec:ser:type}.

\subsection{Constant Segregation rationale}

\subsubsection{Massive script validation}

Consider a transaction \lst{tx} which has an \lst{INPUTS} collection of boxes to
spend. Every input box can have a script protecting it (the \lst{propositionBytes}
property). This script should be executed in the context of the current
transaction. The simplest transaction has 1 input box. Thus if we want to
sustain block validation at a rate of 1000 transactions per second, we need to
be able to validate at least 1000 scripts per second.

For every script (of input \lst{box}) the following is done in order to
validate it:
\begin{enumerate}
\item Context is created with \lst{SELF} = box
\item The script is deserialized into ErgoTree
\item ErgoTree is traversed to build costGraph and calcGraph, two graphs for
cost estimation function and script calculation function.
\item Cost estimation is computed by evaluating costGraph with current context data
\item If cost and data size limits are not exceeded, calcGraph is
evaluated using context data to obtain sigma proposition (see
\hyperref[sec:type:SigmaProp]{\lst{SigmaProp}})
\item Verification procedure is executed
\end{enumerate}

\subsubsection{Potential for Script processing optimization}

Before an \langname contract can be stored in a blockchain it should first be
compiled from its source text into ErgoTree and then serialized into a byte
array.

Because the language is purely functional and the IR is graph-based, the
compilation process has the effect of normalization/unification. This means
that different original scripts may have identical ErgoTrees and, as a
result, identical serialized bytes.

Because of normalization, and also because of script reusability, the number
of conceptually (or logically) different scripts is much smaller than the number
of individual scripts in a blockchain. For example, we may have thousands of
different scripts in a blockchain with millions of boxes.

The average reusability ratio is about 1000 in this case. And even those different
scripts may have different usage frequencies. With a high reusability ratio we
can optimize script evaluation by performing steps 1 - 4 only once per unique
script.

The compiled calcGraph can be cached in a \lst{Map[Array[Byte], Context =>
SigmaBoolean]}. Every script extracted from an input box can be used as a key
in this map to obtain a ready-to-execute graph.
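
One practical detail of such a cache is that raw byte arrays compare by identity on the JVM, so they need a content-based wrapper to work as map keys. The following is a minimal sketch of a cache keyed by script bytes, under stated assumptions: the class name, the generic \lst{C} (context) and \lst{R} (result) parameters, and the \lst{compiler} function are all hypothetical stand-ins, not the actual interpreter API.

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: C stands in for Context, R for SigmaBoolean.
final class ScriptCacheSketch<C, R> {
    // ByteBuffer has content-based equals/hashCode, unlike byte[].
    private final Map<ByteBuffer, Function<C, R>> cache = new HashMap<>();
    // Assumed expensive step: deserialize the bytes and build the graph.
    private final Function<byte[], Function<C, R>> compiler;
    private int compilations = 0;

    ScriptCacheSketch(Function<byte[], Function<C, R>> compiler) {
        this.compiler = compiler;
    }

    // Return the compiled function for these script bytes, compiling at most once.
    Function<C, R> lookup(byte[] scriptBytes) {
        // Wrap a copy so later mutation of the caller's array cannot corrupt the key.
        ByteBuffer key = ByteBuffer.wrap(scriptBytes.clone());
        Function<C, R> compiled = cache.get(key);
        if (compiled == null) {
            compilations++;
            compiled = compiler.apply(scriptBytes);
            cache.put(key, compiled);
        }
        return compiled;
    }

    // Number of times the compiler was actually invoked.
    int compilations() {
        return compilations;
    }
}
```

With this shape, two boxes carrying byte-for-byte identical scripts trigger a single compilation, and the second lookup returns the already compiled function.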

However, constants embedded in contracts are an obstacle to this optimization
by caching. In many cases it is very natural to
embed constants in the script body; the most notable scenario is when public keys
are embedded. As a result, two functionally identical scripts may serialize to
different byte arrays because they have different embedded constants.

\subsubsection{Constant-less ErgoTree}

The solution to the problem with embedded constants is simple: we don't need
to embed constants. Each constant in the body of \ASDag can be replaced
with an indexed placeholder (see \hyperref[sec:appendix:primops:ConstantPlaceholder]{\lst{ConstantPlaceholder}}).
Each placeholder has an index field. The index of the placeholder is
assigned by breadth-first topological order of the graph traversal.

The transformation is part of compilation and is performed ahead of time.
Each \ASDag has an array of all the constants extracted from its body. Each
placeholder refers to the constant by the constant's index in the array.

Thus the format of serialized script is shown in Figure~\ref{fig:ser:ergotree} which contains:
\begin{enumerate}
\item number of constants
\item constants collection
\item script expression with placeholders
\end{enumerate}

The constants collection contains serialized constant data (using
ConstantSerializer) one after another.
The script expression is a serialized ErgoTree with placeholders.

Using this new script format we can use the script expression part as a key in
the cache. An observation is that after the constants are extracted, what
remains is a template. Thus instead of applying steps 1-4 to
\emph{constant-full} scripts we can apply them to \emph{constant-less}
templates. Before applying steps 4 and 5 we need to bind placeholders with
actual values taken from the constants collection.
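
The extraction and binding steps can be sketched on a toy expression tree. Everything in the block below is hypothetical and for illustration only: the node classes are not the real ErgoTree classes, a rendered string stands in for the serialized template bytes, and placeholder indices are assigned in a simple left-to-right traversal rather than the ordering described above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical miniature IR for illustration only; not the real ErgoTree classes.
final class ConstantSegregationSketch {
    interface Node {}

    static final class Const implements Node {
        final long value;
        Const(long value) { this.value = value; }
    }

    static final class Placeholder implements Node {
        final int index;
        Placeholder(int index) { this.index = index; }
    }

    static final class Var implements Node {
        final String name;
        Var(String name) { this.name = name; }
    }

    static final class Gt implements Node {
        final Node left;
        final Node right;
        Gt(Node left, Node right) { this.left = left; this.right = right; }
    }

    // Replace every Const with a Placeholder and append its value to `constants`.
    // Assumption: indices follow a simple left-to-right traversal in this sketch.
    static Node extract(Node n, List<Long> constants) {
        if (n instanceof Const) {
            constants.add(((Const) n).value);
            return new Placeholder(constants.size() - 1);
        }
        if (n instanceof Gt) {
            Gt g = (Gt) n;
            Node left = extract(g.left, constants);
            Node right = extract(g.right, constants);
            return new Gt(left, right);
        }
        return n; // Var and Placeholder nodes are left unchanged
    }

    // Substitute each placeholder with the value at its index in `constants`.
    static Node bind(Node n, List<Long> constants) {
        if (n instanceof Placeholder) {
            return new Const(constants.get(((Placeholder) n).index));
        }
        if (n instanceof Gt) {
            Gt g = (Gt) n;
            return new Gt(bind(g.left, constants), bind(g.right, constants));
        }
        return n;
    }

    // Textual form standing in for the serialized bytes of an expression.
    static String render(Node n) {
        if (n instanceof Const) return Long.toString(((Const) n).value);
        if (n instanceof Placeholder) return "$" + ((Placeholder) n).index;
        if (n instanceof Var) return ((Var) n).name;
        Gt g = (Gt) n;
        return "(" + render(g.left) + " > " + render(g.right) + ")";
    }
}
```

In this sketch two scripts that differ only in an embedded constant, such as a height threshold of 100 versus 200, yield the same rendered template and differ only in their constants lists, which is what makes the template usable as a cache key.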