Merge pull request #468 from ScorexFoundation/v2.1

Branch for v2.1
ergoplatform · Jun 5, 2019 · a6e7e7b
2 parents 1b7b5a6 + 76fc50d
commit a6e7e7b

Showing 265 changed files with 19,984 additions and 4,351 deletions.
diff --git a/.gitignore b/.gitignore
@@ -9,6 +9,7 @@
 *.fdb_latexmk
 
 *.log
+docs/spec/out/
 test-out/
 flamegraphs/
 # sbt specific

diff --git a/.travis.yml b/.travis.yml
@@ -14,7 +14,7 @@ cache:
 language: scala
 
 jdk:
-  - oraclejdk8
+  - oraclejdk9
 
 script:
   - sbt -jvm-opts .travis.jvmopts test

diff --git a/build.sbt b/build.sbt
@@ -73,11 +73,11 @@ git.gitUncommittedChanges in ThisBuild := true
 
 val bouncycastleBcprov = "org.bouncycastle" % "bcprov-jdk15on" % "1.60"
 val scrypto            = "org.scorexfoundation" %% "scrypto" % "2.1.6"
-val scorexUtil         = "org.scorexfoundation" %% "scorex-util" % "0.1.3"
+val scorexUtil         = "org.scorexfoundation" %% "scorex-util" % "0.1.4"
 val macroCompat        = "org.typelevel" %% "macro-compat" % "1.1.1"
 val paradise           = "org.scalamacros" %% "paradise" % "2.1.0" cross CrossVersion.full
 
-val specialVersion = "master-5ffd1bf8-SNAPSHOT"
+val specialVersion = "master-534cb6f5-SNAPSHOT"
 val specialCommon  = "io.github.scalan" %% "common" % specialVersion
 val specialCore    = "io.github.scalan" %% "core" % specialVersion
 val specialLibrary = "io.github.scalan" %% "library" % specialVersion
@@ -91,7 +91,7 @@ val libraryconf = "io.github.scalan" %% "library-conf" % specialVersion
 val testingDependencies = Seq(
   "org.scalatest" %% "scalatest" % "3.0.5" % "test",
   "org.scalactic" %% "scalactic" % "3.0.+" % "test",
-  "org.scalacheck" %% "scalacheck" % "1.13.+" % "test",
+  "org.scalacheck" %% "scalacheck" % "1.14.+" % "test",
   "junit" % "junit" % "4.12" % "test",
   "com.novocode" % "junit-interface" % "0.11" % "test",
   specialCommon, (specialCommon % Test).classifier("tests"),
@@ -115,6 +115,7 @@ libraryDependencies ++= Seq(
   "com.typesafe.akka" %% "akka-actor" % "2.4.+",
   "org.bitbucket.inkytonik.kiama" %% "kiama" % "2.1.0",
   "com.lihaoyi" %% "fastparse" % "1.0.0",
+  "org.spire-math" %% "debox" % "0.8.0"
 ) ++ testingDependencies
 
 
@@ -192,7 +193,7 @@ lazy val sigma = (project in file("."))
     .settings(commonSettings: _*)
 
 def runErgoTask(task: String, sigmastateVersion: String, log: Logger): Unit = {
-  val ergoBranch = "v2.0"
+  val ergoBranch = "sigma-validation-settings"
   log.info(s"Testing current build in Ergo (branch $ergoBranch):")
   val cwd = new File("").absolutePath
   val ergoPath = new File(cwd + "/ergo-tests/")

diff --git a/docs/LangSpec.md b/docs/LangSpec.md
@@ -521,16 +521,6 @@ class Coll[A] {
     */
   def flatMap[B](f: A => Coll[B]): Coll[B]
 
-  /** Computes length of longest segment whose elements all satisfy some predicate.
-    *
-    *  @param   p     the predicate used to test elements.
-    *  @param   from  the index where the search starts.
-    *  @return  the length of the longest segment of this collection starting from index `from`
-    *           such that every element of the segment satisfies the predicate `p`.
-    *  @since 2.0
-    */
-  def segmentLength(p: A => Boolean, from: Int): Int
-
   /** Finds the first element of the $coll satisfying a predicate, if any.
     *
     *  @param p       the predicate used to test elements.

diff --git a/docs/PR-review-checklist.md b/docs/PR-review-checklist.md
@@ -0,0 +1,13 @@
+## What should be checked during PR review
+
+### For each $TypeName.$methodName there should be
+
+1. test case in SigmaDslTests (checks SigmaDsl <-> ErgoScript equality)
+2. test case in CostingSpecification
+3. costing rule method in ${TypeName}Coster
+4. for each SMethod registration
+     - .withInfo($description, $argsInfo)
+     - .withIRInfo($irBuilder, $opDescriptor)
+
+### For each PredefinedFunc registration there should be
+     - PredefFuncInfo($irBuilder, $opDescriptor)
diff --git a/docs/conversions.dot b/docs/conversions.dot
@@ -64,7 +64,7 @@ digraph conversions {
     GroupElement -> Boolean [label=".isIdentity"]
     GroupElement -> Bytes [label=".nonce"]
       //todo remove compressed flag, use GroupElementSerializer
-    GroupElement -> Bytes [label=".getEncoded(compressed)" color=red]
+    GroupElement -> Bytes [label=".getEncoded" color=red]
 
     String -> Bytes [label="fromBase58(...)"]
     String -> Bytes [label="fromBase64(...)"]

diff --git a/docs/sigma-dsl.md b/docs/sigma-dsl.md
@@ -1,13 +1,15 @@
-# Sigma: Scala DSL for smart contracts with zero knowledge proof of knowledge  
+# SigmaDsl: Scala DSL for smart contracts with zero knowledge proof of knowledge  
 
 ## Intro
  SigmaDsl is a domain-specific language embedded into Scala and designed to be
  source code compatible with SigmaScript. This means you can write SigmaDsl
  code directly in Scala IDE (e.g. IntelliJ IDEA) and copy-paste code snippets
  between SigmaDsl and SigmaScript.
+ Special Scala macros can also be used to automatically translate SigmaDsl to 
+ Sigma byte code.
 
-SigmaDsl is implemented as a library in the framework of
-[Special](https://github.com/scalan/special)
+SigmaDsl is implemented as a Scala library using the [Special](https://github.com/scalan/special)
+framework.
 
 ## See also
 [Special](https://github.com/scalan/special)
diff --git a/docs/soft-fork-log.md b/docs/soft-fork-log.md
@@ -0,0 +1,12 @@
+
+## A log of changes leading to soft-fork
+
+This list should be updated every time something soft-forkable is added.
+
+### Changes since 2.0
+
+ - new type (SGlobal.typeCode = 106)
+ - new method (SGlobal.groupGenerator.methodId = 1)
+ - new method (SAvlTree.updateDigest.methodId = 15)
+ - removed GroupElement.nonce (changed codes of getEncoded, exp, multiply, negate) 
+ - change in Coll.filter serialization format (removed tagged variable id, changed condition type)   
diff --git a/docs/spec/appendix_ergotree_serialization.tex b/docs/spec/appendix_ergotree_serialization.tex
@@ -0,0 +1,6 @@
+\section{Serialization format of ErgoTree nodes}
+\label{sec:appendix:ergotree_serialization}
+
+\mnote{These subsections are autogenerated from instrumented ValueSerializers}
+
+\input{generated/ergotree_serialization1.tex}
diff --git a/docs/spec/appendix_integer_encoding.tex b/docs/spec/appendix_integer_encoding.tex
@@ -0,0 +1,36 @@
+\section{Compressed encoding of integer values}
+
+\subsection{VLQ encoding}
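+\label{sec:vlq-encoding}
+
+Unsigned values are written in groups of 7 bits, least significant group
+first; the high bit of each byte is set when more bytes follow.
+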
+\begin{verbatim}
+public final void putULong(long value) {
+    while (true) {
+        if ((value & ~0x7FL) == 0) {
+            buffer[position++] = (byte) value;
+            return;
+        } else {
+            buffer[position++] = (byte) (((int) value & 0x7F) | 0x80);
+            value >>>= 7;
+        }
+    }
+}
+\end{verbatim}
+
+\subsection{ZigZag encoding}
+\label{sec:zigzag-encoding}
+
+Encode a signed 64-bit value using ZigZag encoding. ZigZag maps signed integers
+to unsigned ones so that values of small magnitude, positive or negative, can be
+efficiently encoded with varint. (Otherwise, negative values must be
+sign-extended to 64 bits to be varint encoded, thus always taking 10 bytes in
+the buffer.)
+
+Parameter \lst{n} is a signed 64-bit integer.
+This Java method returns an unsigned 64-bit integer, stored in a signed \lst{long} because Java has no unsigned primitive types.
+
+\begin{verbatim}
+ public static long encodeZigZag64(final long n) {
+   // Note:  the right-shift must be arithmetic
+   return (n << 1) ^ (n >> 63);
+ }    
+\end{verbatim}
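The two listings above work together: a signed value is first ZigZag-mapped and then written with the VLQ routine. The following Java sketch illustrates this under stated assumptions: a `ByteArrayOutputStream` stands in for the writer's `buffer`/`position` fields, and the class `VlqZigZagDemo` and the decoding helpers `getULong`/`decodeZigZag64` are hypothetical, not part of the serializer's API.

```java
import java.io.ByteArrayOutputStream;

public class VlqZigZagDemo {
    // ZigZag: 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ... (small magnitudes give small codes).
    static long encodeZigZag64(long n) {
        return (n << 1) ^ (n >> 63); // arithmetic shift copies the sign bit into all 64 bits
    }

    // Inverse mapping (hypothetical helper, shown for the round trip).
    static long decodeZigZag64(long z) {
        return (z >>> 1) ^ -(z & 1);
    }

    // VLQ: 7 bits per byte, least significant group first; high bit set = more bytes follow.
    static byte[] putULong(long value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7FL) != 0) {
            out.write(((int) value & 0x7F) | 0x80);
            value >>>= 7; // logical shift: the value is treated as unsigned
        }
        out.write((int) value);
        return out.toByteArray();
    }

    // Reads one VLQ-encoded unsigned 64-bit value (hypothetical helper).
    static long getULong(byte[] bytes) {
        long result = 0;
        int shift = 0;
        for (byte b : bytes) {
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) break;
            shift += 7;
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println("-1 without ZigZag: " + putULong(-1L).length + " bytes");
        for (long v : new long[] {0L, -1L, 1L, 300L, -300L, Long.MIN_VALUE, Long.MAX_VALUE}) {
            byte[] enc = putULong(encodeZigZag64(v));
            System.out.println(v + " -> " + enc.length + " byte(s), round trip ok: "
                + (decodeZigZag64(getULong(enc)) == v));
        }
    }
}
```

Running `main` shows that -1 takes 10 bytes when VLQ-encoded directly but only 1 byte after the ZigZag mapping, which is the point made in the ZigZag paragraph.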
diff --git a/docs/spec/appendix_motivation.tex b/docs/spec/appendix_motivation.tex
@@ -0,0 +1,127 @@
+\section{Motivations}
+\label{sec:appendix:motivation}
+
+\subsection{Type Serialization format rationale}
+\label{sec:appendix:motivation:type}
+
+Some operations of \ASDag have type parameters, for which concrete types
+should be specified (since \ASDag is a monomorphic IR). When an operation
+(such as \hyperref[sec:serialization:operation:ExtractRegisterAs]{\lst{ExtractRegisterAs}}) is serialized, those types should also be
+serialized as part of the operation. The following encoding is designed to
+minimize the number of bytes required to represent a type in the serialization
+format of \ASDag.
+
+In most cases a type term serializes into a single byte. In the intermediate
+representation of ErgoTree each type is represented by a tree of nodes where
+leaves are primitive types and other nodes are type constructors.
+A simple (but sub-optimal) way to serialize a type would be to give each
+primitive type and each type constructor a unique type code. Then, to
+serialize a node, we need to emit its code and then perform recursive descent
+to serialize all children.
+However, to save storage space, we use a special encoding scheme that saves bytes
+for the most frequently used types.
+
+We assume the most frequently used types are:
+\begin{itemize}
+    \item primitive types (\lst{Int, Byte, Boolean, BigInt, GroupElement,
+    Box, AvlTree})
+    \item  Collections of primitive types (\lst{Coll[Byte]} etc.)
+    \item  Options of primitive types (\lst{Option[Int]} etc.)
+    \item Nested collections of primitive types (\lst{Coll[Coll[Int]]} etc.)
+    \item Functions of primitive types (\lst{Box => Boolean} etc.)
+    \item First biased pair of types (\lst{(_, Int)} when we know the first
+    component is a primitive type).
+    \item Second biased pair of types (\lst{(Int, _)} when we know the second
+    component is a primitive type)
+    \item Symmetric pair of types (\lst{(Int, Int)} when we know both types are
+    the same)
+\end{itemize}
+
+All the types above should be represented in an optimized way (preferably by a single byte).
+For other types, we do recursive descent down the type tree as it is defined in section~\ref{sec:ser:type}.
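To make the byte savings concrete, the following Java sketch embeds a primitive type code into a constructor code so that `Coll[Byte]`, `Coll[Coll[Int]]` and `(Int, Int)` each fit in one byte, with recursive descent as the fallback. It covers only collections and pairs. All numeric codes (`BYTE = 2`, `INT = 4`, `LONG = 5`, constructor blocks spaced by 12, the generic pair code 96) and the class `TypeCodeDemo` are assumptions for illustration; the normative codes are the ones defined in the type serialization section.

```java
import java.io.ByteArrayOutputStream;

public class TypeCodeDemo {
    // Hypothetical type model: primitive leaves and two type constructors.
    interface Type {}
    static final class Prim implements Type { final int code; Prim(int code) { this.code = code; } }
    static final class Coll implements Type { final Type elem; Coll(Type elem) { this.elem = elem; } }
    static final class Pair implements Type {
        final Type left, right;
        Pair(Type left, Type right) { this.left = left; this.right = right; }
    }

    // Illustrative codes (assumptions): primitive codes stay below 12, and each constructor
    // owns a block of 12 codes so that "constructor + primitive" fits in a single byte.
    static final Prim BYTE = new Prim(2), INT = new Prim(4), LONG = new Prim(5);
    static final int COLL = 12, NESTED_COLL = 24, PAIR1 = 60, PAIR2 = 72, PAIR_SYMMETRIC = 84;
    static final int GENERIC_PAIR = 96; // no primitive to embed: both components follow

    static void serialize(Type t, ByteArrayOutputStream w) {
        if (t instanceof Prim) {
            w.write(((Prim) t).code);
        } else if (t instanceof Coll) {
            Type e = ((Coll) t).elem;
            if (e instanceof Prim) {
                w.write(COLL + ((Prim) e).code);                       // Coll[Byte] -> 1 byte
            } else if (e instanceof Coll && ((Coll) e).elem instanceof Prim) {
                w.write(NESTED_COLL + ((Prim) ((Coll) e).elem).code);  // Coll[Coll[Int]] -> 1 byte
            } else {
                w.write(COLL);                                         // bare constructor code,
                serialize(e, w);                                       // then the element type
            }
        } else {
            Pair p = (Pair) t;
            boolean lp = p.left instanceof Prim, rp = p.right instanceof Prim;
            if (lp && rp && ((Prim) p.left).code == ((Prim) p.right).code) {
                w.write(PAIR_SYMMETRIC + ((Prim) p.left).code);        // (Int, Int) -> 1 byte
            } else if (lp) {
                w.write(PAIR1 + ((Prim) p.left).code);                 // first component embedded
                serialize(p.right, w);
            } else if (rp) {
                w.write(PAIR2 + ((Prim) p.right).code);                // second component embedded
                serialize(p.left, w);
            } else {
                w.write(GENERIC_PAIR);                                 // full recursive descent
                serialize(p.left, w);
                serialize(p.right, w);
            }
        }
    }

    static byte[] toBytes(Type t) {
        ByteArrayOutputStream w = new ByteArrayOutputStream();
        serialize(t, w);
        return w.toByteArray();
    }

    public static void main(String[] args) {
        Type[] samples = {
            INT, new Coll(BYTE), new Coll(new Coll(INT)), new Pair(INT, INT),
            new Pair(INT, LONG), new Pair(new Coll(BYTE), new Coll(BYTE))
        };
        for (Type t : samples) System.out.println(java.util.Arrays.toString(toBytes(t)));
    }
}
```

Because the code blocks do not overlap (1..11, 13..23, 25..35, 61..71, 73..83, 85..95, plus the bare codes 12 and 96), a reader can always tell from the first byte which case it is decoding.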
+
+\subsection{Constant Segregation rationale}
+
+\subsubsection{Massive script validation}
+
+Consider a transaction \lst{tx} which has an \lst{INPUTS} collection of boxes to
+spend. Every input box can have a script protecting it (\lst{propositionBytes}
+property). This script should be executed in the context of the current
+transaction. The simplest transaction has 1 input box. Thus if we want to
+sustain block validation at 1000 transactions per second, we need to
+be able to validate at least 1000 scripts per second.
+
+For every script (of input \lst{box}) the following is done in order to
+validate it:
+\begin{enumerate}
+    \item A context is created with \lst{SELF} = box
+    \item The script is deserialized into ErgoTree
+    \item ErgoTree is traversed to build costGraph and calcGraph, two graphs for
+    the cost estimation function and the script calculation function.
+    \item Cost estimation is computed by evaluating costGraph with the current context data
+    \item If cost and data size limits are not exceeded, calcGraph is
+    evaluated using context data to obtain a sigma proposition (see
+    \hyperref[sec:type:SigmaProp]{\lst{SigmaProp}})
+    \item The verification procedure is executed
+\end{enumerate}
+
+\subsubsection{Potential for Script processing optimization}
+
+Before an \langname contract can be stored in a blockchain it must first be
+compiled from its source text into ErgoTree and then serialized into a byte
+array.
+
+Because the language is purely functional and the IR is graph-based, the
+compilation process has an effect of normalization/unification. This means
+that different original scripts may have identical ErgoTrees and, as a
+result, identical serialized bytes.
+
+Because of normalization, and also because of script reusability, the number
+of conceptually (or logically) different scripts is much smaller than the number
+of individual scripts in a blockchain. For example, a blockchain with millions of
+boxes may contain only thousands of different scripts.
+
+In this case the average reusability ratio is 1000, and even the distinct
+scripts may differ in usage frequency. With such a high reusability ratio we
+can optimize script evaluation by performing steps 1--4 only once per unique
+script.
+
+The compiled calcGraph can be cached in \lst{Map[Array[Byte], Context =>
+SigmaBoolean]}. Every script extracted from an input box can be used as a key
+in this map to obtain a ready-to-execute graph.
+
+However, constants embedded in contracts are an obstacle to optimization by
+caching. In many cases it is very natural to embed constants in the script body,
+the most notable scenario being embedded public keys. As a result, two
+functionally identical scripts may serialize to different byte arrays because
+they have different embedded constants.
+
+\subsubsection{Constant-less ErgoTree}
+
+The solution to the problem with embedded constants is simple: we don't need
+to embed constants. Each constant in the body of \ASDag can be replaced
+with an indexed placeholder (see \hyperref[sec:appendix:primops:ConstantPlaceholder]{\lst{ConstantPlaceholder}}).
+Each placeholder has an index field. The index of the placeholder is
+assigned in breadth-first topological order of the graph traversal.
+
+The transformation is part of compilation and is performed ahead of time.
+Each \ASDag has an array of all the constants extracted from its body. Each
+placeholder refers to the constant by the constant's index in the array.
+
+Thus the format of a serialized script, shown in Figure~\ref{fig:ser:ergotree}, contains:
+\begin{enumerate}
+    \item number of constants
+    \item constants collection
+    \item script expression with placeholders
+\end{enumerate}
+
+The constants collection contains serialized constant data (using
+ConstantSerializer) one after another.
+The script expression is a serialized ErgoTree with placeholders.
+
+With this script format we can use the script expression part as a key in
+the cache. An observation is that after the constants are extracted, what
+remains is a template. Thus instead of applying steps 1--4 to
+\emph{constant-full} scripts we can apply them to \emph{constant-less}
+templates. Before applying steps 4 and 5 we need to bind placeholders to
+actual values taken from the constants collection.
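A minimal Java sketch of the constant segregation and template cache described above, under heavy simplification: the classes `Expr`, `Const`, `Var`, `Placeholder`, `Op`, the operations `add`/`gt`, and the string rendering used as a cache key are all hypothetical stand-ins for ErgoTree, `ConstantPlaceholder` and the serialized template bytes. Constants are numbered in breadth-first order as the text describes, and the cached evaluator binds placeholders to each script's own constants.

```java
import java.util.*;
import java.util.function.BiFunction;

public class ConstantSegregationDemo {
    // Hypothetical mini-IR standing in for ErgoTree: constants, context variables, binary operations.
    interface Expr {}
    static final class Const implements Expr { final long value; Const(long v) { value = v; } }
    static final class Var implements Expr { final String name; Var(String n) { name = n; } }
    static final class Placeholder implements Expr { final int index; Placeholder(int i) { index = i; } }
    static final class Op implements Expr {
        final String name; final List<Expr> args;
        Op(String name, Expr... args) { this.name = name; this.args = Arrays.asList(args); }
    }

    // Result of segregation: the extracted constants and the constant-less template.
    static final class Segregated {
        final long[] constants; final Expr template;
        Segregated(long[] constants, Expr template) { this.constants = constants; this.template = template; }
    }

    // Numbers constants in breadth-first order, then rebuilds the tree with placeholders.
    static Segregated segregate(Expr root) {
        Map<Expr, Integer> index = new IdentityHashMap<>();
        List<Long> values = new ArrayList<>();
        Deque<Expr> queue = new ArrayDeque<>(Collections.singletonList(root));
        while (!queue.isEmpty()) {
            Expr e = queue.poll();
            if (e instanceof Const) { index.put(e, values.size()); values.add(((Const) e).value); }
            else if (e instanceof Op) queue.addAll(((Op) e).args);
        }
        long[] constants = new long[values.size()];
        for (int i = 0; i < constants.length; i++) constants[i] = values.get(i);
        return new Segregated(constants, withPlaceholders(root, index));
    }

    static Expr withPlaceholders(Expr e, Map<Expr, Integer> index) {
        if (e instanceof Const) return new Placeholder(index.get(e));
        if (!(e instanceof Op)) return e;
        Op op = (Op) e;
        Expr[] args = new Expr[op.args.size()];
        for (int i = 0; i < args.length; i++) args[i] = withPlaceholders(op.args.get(i), index);
        return new Op(op.name, args);
    }

    // Stand-in for the serialized template bytes: a canonical string used as the cache key.
    static String render(Expr e) {
        if (e instanceof Placeholder) return "#" + ((Placeholder) e).index;
        if (e instanceof Var) return ((Var) e).name;
        if (e instanceof Const) return Long.toString(((Const) e).value);
        Op op = (Op) e;
        StringJoiner sj = new StringJoiner(",", op.name + "(", ")");
        for (Expr a : op.args) sj.add(render(a));
        return sj.toString();
    }

    static long eval(Expr e, long[] constants, Map<String, Long> ctx) {
        if (e instanceof Placeholder) return constants[((Placeholder) e).index]; // bind placeholder
        if (e instanceof Var) return ctx.get(((Var) e).name);
        if (e instanceof Const) return ((Const) e).value;
        Op op = (Op) e;
        long a = eval(op.args.get(0), constants, ctx), b = eval(op.args.get(1), constants, ctx);
        switch (op.name) {
            case "add": return a + b;
            case "gt":  return a > b ? 1 : 0;
            default: throw new IllegalArgumentException("unknown op " + op.name);
        }
    }

    // Cache from template key to a "compiled" evaluator taking (constants, context).
    static final Map<String, BiFunction<long[], Map<String, Long>, Long>> cache = new HashMap<>();
    static int compilations = 0;

    static long run(Expr script, Map<String, Long> ctx) {
        Segregated s = segregate(script); // done ahead of time in the real format; inline here for brevity
        BiFunction<long[], Map<String, Long>, Long> compiled = cache.computeIfAbsent(render(s.template), key -> {
            compilations++; // the expensive step runs once per distinct template
            return (consts, c) -> eval(s.template, consts, c);
        });
        return compiled.apply(s.constants, ctx);
    }

    public static void main(String[] args) {
        Map<String, Long> ctx = Collections.singletonMap("HEIGHT", 150L);
        long r1 = run(new Op("gt", new Var("HEIGHT"), new Const(100)), ctx);
        long r2 = run(new Op("gt", new Var("HEIGHT"), new Const(200)), ctx);
        System.out.println(r1 + " " + r2 + " compilations=" + compilations); // prints "1 0 compilations=1"
    }
}
```

The two scripts `gt(HEIGHT, 100)` and `gt(HEIGHT, 200)` share the template `gt(HEIGHT,#0)`, so the second run is a cache hit. A cache keyed by real serialized bytes would also need a content-based key: a Java `byte[]` uses identity `equals`/`hashCode`, so it would have to be wrapped, for example with `ByteBuffer.wrap`.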