Fix #599: [Pdr bot, sim] Implement two-sided prediction (PR #670)

Fix #599:
- Make Aimodel do classification (not regression), where the classifier outputs probabilities of up vs down
- In sim_engine predictoor: model two-sided prediction
- In sim_engine trader: use the up/down confidence level to decide the amount of $ to put in
- In sim_engine trader: sell when predicting down, then buy back
- In sim_engine: add contour plots to visualize models when there are 2 input vars (and hide the plot otherwise)
- In predictoor_agent approach 1: stake up/down as 50/50
- In predictoor_agent approach 3: stake up = prob_up * stake_amt, down = (1 - prob_up) * stake_amt
- Rename predictoor approach 3 --> approach 2
- Update predictoor.md README
oceanprotocol · Feb 26, 2024 · e4d4ca2
1 parent 9c73f6a

Showing 43 changed files with 1,412 additions and 1,087 deletions.
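The commit message's two staking rules can be sketched as a single function. Approach 1 splits the stake 50/50. Approach 2 (formerly approach 3) weights each side by the model's probability of "up". This is a minimal illustration of those stated rules, not the `predictoor_agent` code, and `split_stake` and its signature are hypothetical.

```python
from typing import Tuple


def split_stake(stake_amt: float, prob_up: float, approach: int) -> Tuple[float, float]:
    """Return (stake_up, stake_down) for one two-sided prediction.

    Hypothetical helper illustrating the staking rules in the commit message.
    """
    if not 0.0 <= prob_up <= 1.0:
        raise ValueError(f"prob_up must be in [0, 1], got {prob_up}")

    if approach == 1:
        # Approach 1: ignore model confidence and split the stake evenly
        return stake_amt / 2, stake_amt / 2
    if approach == 2:
        # Approach 2 (formerly 3): weight each side by the model's confidence
        return prob_up * stake_amt, (1.0 - prob_up) * stake_amt
    raise ValueError(f"unknown approach: {approach}")


# Example: 10 OCEAN total, and the model says there is a 75% chance of "up"
print(split_stake(10.0, 0.75, 2))  # (7.5, 2.5)
print(split_stake(10.0, 0.75, 1))  # (5.0, 5.0)
```

In both approaches the two txs together stake the full `stake_amt`. They differ only in how it is divided, and with `prob_up = 0.5` approach 2 behaves exactly like approach 1.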
diff --git a/README.md b/README.md
@@ -81,6 +81,7 @@ OPF-run bots & higher-level tools:
 - `publisher` - publish pdr data feeds
 - `analytics` - analytics tools
 - `payout` - OCEAN & ROSE payout
+- `deployer` - deployer tool
 - `accuracy` - calculates % correct, for display in predictoor.ai webapp
 
 Mid-level building blocks:

diff --git a/READMEs/predictoor.md b/READMEs/predictoor.md
@@ -65,12 +65,14 @@ What it does:
 1. Grab historical price data from exchanges and stores in `parquet_data/` dir. It re-uses any previously saved data.
 1. Run through many 5min epochs. At each epoch:
    - Build a model
-   - Predict up/down
+   - Predict
    - Trade
-   - Plot total profit versus time, and more
-   - Log to console, and to `logs/out_<time>.txt`
+   - Plot profit versus time, and more
+   - Log to console and `logs/out_<time>.txt`
+
+"Predict" actions are _two-sided_: each prediction consists of one "up" tx and one "down" tx, with more stake on the higher-confidence direction. Two-sided is more profitable than one-sided prediction.
 
-The baseline settings use a linear model inputting prices of the previous 10 epochs as inputs (autoregressive_n = 10), just BTC close price as input, a simulated 0% trading fee, and a trading strategy of "buy if predict up; sell 5min later". You can play with different values in `my_ppss.yaml`.
+By default, simulation uses a linear model whose inputs are the BTC close prices of the previous 2-10 epochs (autoregressive_n), a simulated 0% trading fee, and a trading strategy of "buy if predict up; sell 5min later". You can play with different values in `my_ppss.yaml`.
 
 Profit isn't guaranteed: fees, slippage and more eats into them. Model accuracy makes a big difference too.
 
@@ -80,27 +82,30 @@ Simulation uses Python [logging](https://docs.python.org/3/howto/logging.html) f
 
 ## 3. Run Predictoor Bot on Sapphire Testnet
 
-Predictoor contracts run on [Oasis Sapphire](https://docs.oasis.io/dapp/sapphire/) testnet and mainnet. Sapphire is a privacy-preserving EVM-compatible L1 chain.
+Predictoor contracts run on [Oasis Sapphire](https://docs.oasis.io/dapp/sapphire/) testnet and mainnet. Sapphire is a privacy-preserving EVM-compatible L1 chain. 
 
-Let's get our bot running on testnet first.
+Let's get our predictoor bot running on testnet first.
 
-First, tokens! You need (fake) ROSE to pay for gas, and (fake) OCEAN to stake and earn. [Get them here](testnet-faucet.md).
+The bot does two-sided predictions, like in simulation. This also means it needs two Ethereum accounts, with keys PRIVATE_KEY and PRIVATE_KEY2.
 
-Then, copy & paste your private key as an envvar. In console:
+First, tokens! You need (fake) ROSE to pay for gas, and (fake) OCEAN to stake and earn, for both accounts. [Get them here](testnet-faucet.md).
+
+Then, copy & paste your private keys as envvars. In console:
 ```console
-export PRIVATE_KEY=<YOUR_PRIVATE_KEY>
+export PRIVATE_KEY=<YOUR_PRIVATE_KEY 1>
+export PRIVATE_KEY2=<YOUR_PRIVATE_KEY 2>
 ```
 
 Update `my_ppss.yaml` as desired.
 
-Then, run a bot with modeling-on-the fly (approach 3). In console:
+Then, run a bot with modeling-on-the-fly (approach 2). In console:
 ```console
-pdr predictoor 3 my_ppss.yaml sapphire-testnet
+pdr predictoor 2 my_ppss.yaml sapphire-testnet
 ```
 
 Your bot is running, congrats! Sit back and watch it in action. It will loop continuously.
 
-At every 5m/1h epoch, it builds & submits >1 times, to maximize accuracy without missing submission deadlines. Specifically: 60 s before predictions are due, it builds a model then submits a prediction. It repeats this until the deadline.
+At every 5m/1h epoch, it builds & submits more than once, to maximize accuracy without missing submission deadlines. Specifically: 60 s before predictions are due, it builds a model, then submits two prediction txs, one for up and one for down, each with its own stake. It repeats this until the deadline.
 
 It logs to console, and to `logs/out_<time>.txt`. Like simulation, it uses Python logging framework, configurable in `logging.yaml`.
 
@@ -111,25 +116,26 @@ The CLI has support tools too. Learn about each via:
 - `pdr get_predictions_info -h`
 - and more yet; type `pdr -h` to see
 
-You can track behavior at finer resolution by writing more logs to the [code](../pdr_backend/predictoor/approach3/predictoor_agent3.py), or [querying Predictoor subgraph](subgraph.md).
+You can track behavior at finer resolution by writing more logs to the [code](../pdr_backend/predictoor/predictoor_agent.py), or [querying Predictoor subgraph](subgraph.md).
 
 
 ## 4. Run Predictoor Bot on Sapphire Mainnet
 
 Time to make it real: let's get our bot running on Sapphire _mainnet_.
 
-First, real tokens! Get [ROSE via this guide](get-rose-on-sapphire.md) and [OCEAN via this guide](get-ocean-on-sapphire.md).
+First, real tokens! Get [ROSE via this guide](get-rose-on-sapphire.md) and [OCEAN via this guide](get-ocean-on-sapphire.md), for each of your two accounts.
 
-Then, copy & paste your private key as an envvar. (You can skip this if it's same as testnet.) In console:
+Then, copy & paste your private keys as envvars. (You can skip this if the keys are the same as on testnet.) In console:
 ```console
-export PRIVATE_KEY=<YOUR_PRIVATE_KEY>
+export PRIVATE_KEY=<YOUR_PRIVATE_KEY 1>
+export PRIVATE_KEY2=<YOUR_PRIVATE_KEY 2>
 ```
 
 Update `my_ppss.yaml` as desired.
 
 Then, run the bot. In console:
 ```console
-pdr predictoor 3 my_ppss.yaml sapphire-mainnet
+pdr predictoor 2 my_ppss.yaml sapphire-mainnet
 ```
 
 This is where there's real $ at stake. Good luck!
@@ -142,7 +148,6 @@ When running predictoors on mainnet, you have the potential to earn $.
 
 **[Here](payout.md)** are instructions to claim your earnings.
 
-
 Congrats! You've gone through all the essential steps to earn $ by running a predictoor bot on mainnet.
 
 The next sections describe how to go beyond, by optimizing the model and more.
@@ -151,7 +156,7 @@ The next sections describe how to go beyond, by optimizing the model and more.
 
 The idea: make your own model, tuned for accuracy, which in turn will optimize it for $. Here's how:
 1. Fork `pdr-backend` repo.
-1. Change predictoor approach3 modeling code as you wish, while iterating with simulation.
+1. Change predictoor approach 2 modeling code as you wish, while iterating with simulation.
 1. Bring your model as a Predictoor bot to testnet then mainnet.
 
 # Right-size staking
@@ -186,7 +191,7 @@ deployment_configs:
     memory: '512Mi'
     source: "binance"
     type: "predictoor"
-    approach: 3
+    approach: 2
     network: "sapphire-testnet"
     s_until_epoch_end: 20
     pdr_backend_image_source: "oceanprotocol/pdr-backend:latest"
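The README's submission timing (starting 60 s before predictions are due, build a model and submit up/down txs, then repeat until the deadline) can be sketched with a simulated clock. This is a hypothetical illustration: `simulate_epoch_submissions`, `window_s` and `step_s` are invented names, and the real bot's scheduling, including how `s_until_epoch_end` is used, is not shown in this diff.

```python
from typing import Callable, List


def simulate_epoch_submissions(
    now_s: int,
    deadline_s: int,
    build_and_submit: Callable[[int], None],
    window_s: int = 60,
    step_s: int = 10,
) -> List[int]:
    """Step a fake clock from now_s toward deadline_s. Once within window_s
    seconds of the deadline, call build_and_submit at every step.
    Returns the times at which submissions happened."""
    submitted_at: List[int] = []
    t = now_s
    while t < deadline_s:
        if deadline_s - t <= window_s:
            # Rebuild the model and resubmit; each rebuild can use more recent data
            build_and_submit(t)
            submitted_at.append(t)
        t += step_s
    return submitted_at


log: List[int] = []
times = simulate_epoch_submissions(200, 300, log.append, window_s=60, step_s=20)
print(times)  # [240, 260, 280]
```

Nothing is submitted while more than `window_s` seconds remain, and the loop stops as soon as the deadline is reached, so no submission lands late.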

diff --git a/READMEs/trader.md b/READMEs/trader.md
@@ -58,12 +58,14 @@ What it does:
 1. Grab historical price data from exchanges and stores in `parquet_data/` dir. It re-uses any previously saved data.
 1. Run through many 5min epochs. At each epoch:
    - Build a model
-   - Predict up/down
+   - Predict
    - Trade
-   - Plot total profit versus time, and more
-   - Log to console, and to `logs/out_<time>.txt`
+   - Plot profit versus time, and more
+   - Log to console and `logs/out_<time>.txt`
+
+"Predict" actions are _two-sided_: each prediction consists of one "up" tx and one "down" tx, with more stake on the higher-confidence direction. Two-sided is more profitable than one-sided prediction.
 
-The baseline settings use a linear model inputting prices of the previous 10 epochs as inputs, a simulated 0% trading fee, and a trading strategy of "buy if predict up; sell 5min later". You can play with different values in `my_ppss.yaml`.
+By default, simulation uses a linear model whose inputs are the BTC close prices of the previous 2-10 epochs (autoregressive_n), a simulated 0% trading fee, and a trading strategy of "buy if predict up; sell 5min later". You can play with different values in `my_ppss.yaml`.
 
 Profit isn't guaranteed: fees, slippage and more eats into them. Model accuracy makes a big difference too.
 

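Both READMEs describe the model's inputs as the prices of the previous `autoregressive_n` epochs. Below is a minimal pure-Python sketch of how such lagged inputs and next-epoch targets can be built. It is not `AimodelDataFactory.create_xy()`, which operates on polars DataFrames and can fill NaNs; the function name is hypothetical and the oldest-first column order is an assumption.

```python
from typing import List, Sequence, Tuple


def make_autoregressive_xy(
    prices: Sequence[float], autoregressive_n: int
) -> Tuple[List[List[float]], List[float]]:
    """Build (X, y): each row of X holds the previous `autoregressive_n`
    prices (oldest first), and y holds the price that follows them."""
    if autoregressive_n < 1:
        raise ValueError("autoregressive_n must be >= 1")
    X: List[List[float]] = []
    y: List[float] = []
    for i in range(autoregressive_n, len(prices)):
        X.append(list(prices[i - autoregressive_n : i]))  # lagged inputs
        y.append(prices[i])  # next-epoch target
    return X, y


X, y = make_autoregressive_xy([1.0, 2.0, 3.0, 4.0, 5.0], autoregressive_n=2)
print(X)  # [[1.0, 2.0], [2.0, 3.0], [3.0, 4.0]]
print(y)  # [3.0, 4.0, 5.0]
```

A price series of length L yields L - autoregressive_n samples, so a larger `autoregressive_n` gives each sample more history but leaves fewer samples to train on.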
diff --git a/pdr_backend/aimodel/aimodel.py b/pdr_backend/aimodel/aimodel.py
@@ -0,0 +1,49 @@
+from enforce_typing import enforce_types
+import numpy as np
+
+
+@enforce_types
+class Aimodel:
+
+    def __init__(self, skm, scaler):
+        self._skm = skm  # sklearn model
+        self._scaler = scaler
+
+    def predict_true(self, X: np.ndarray) -> np.ndarray:
+        """
+        @description
+          Classify each input sample, with lower fidelity: just True vs False
+
+        @arguments
+          X -- 2d array of [sample_i, var_i]:cont_value -- model inputs
+
+        @return
+          ytrue -- 1d array of [sample_i]:bool_value -- classifier model outputs
+        """
+        # We explicitly don't call skm.predict() here, because it's
+        #   inconsistent with predict_proba() for svc and maybe others.
+        # Rather, draw on the probability output to guarantee consistency.
+        yptrue = self.predict_ptrue(X)
+        ytrue = yptrue > 0.5
+        return ytrue
+
+    def predict_ptrue(self, X: np.ndarray) -> np.ndarray:
+        """
+        @description
+          Classify each input sample, with higher fidelity: prob of being True
+
+        @arguments
+          X -- 2d array of [sample_i, var_i]:cont_value -- model inputs
+
+        @return
+          yptrue -- 1d array of [sample_i]:prob_of_being_true -- model outputs
+        """
+        X = self._scaler.transform(X)
+        T = self._skm.predict_proba(X)  # [sample_i][class_i]
+        class_i = 1  # column for "True": sklearn sorts classes_ as [False, True]
+        yptrue = T[:, class_i]
+        return yptrue
diff --git a/pdr_backend/aimodel/aimodel_data_factory.py b/pdr_backend/aimodel/aimodel_data_factory.py
@@ -45,22 +45,42 @@ class AimodelDataFactory:
     def __init__(self, ss: PredictoorSS):
         self.ss = ss
 
+    @staticmethod
+    def ycont_to_ytrue(ycont: np.ndarray, y_thr: float) -> np.ndarray:
+        """
+        @description
+          Convert regression y (ycont) to classifier y (ybool).
+
+        @arguments
+          ycont -- 1d array of [sample_i]:cont_value -- regression model outputs
+          y_thr -- classify to True if ycont >= this threshold
+
+        @return
+          ybool -- 1d array of [sample_i]:bool_value -- classifier model outputs
+        """
+        ybool = ycont >= y_thr  # elementwise comparison -> 1d bool array
+        return ybool
+
     def create_xy(
         self,
         mergedohlcv_df: pl.DataFrame,
         testshift: int,
         do_fill_nans: bool = True,
     ) -> Tuple[np.ndarray, np.ndarray, pd.DataFrame, np.ndarray]:
         """
+        @description
+          Create X, y data for a regression setting
+          For y in a classification setting, call ycont_to_ytrue() after.
+
         @arguments
           mergedohlcv_df -- *polars* DataFrame. See class docstring
           testshift -- to simulate across historical test data
           do_fill_nans -- if any values are nan, fill them? (Via interpolation)
             If you turn this off and mergedohlcv_df has nans, then X/y/etc gets nans
 
         @return
-          X -- 2d array of [sample_i, var_i]:value -- inputs for model
-          y -- 1d array of [sample_i]:value -- target outputs for model
+          X -- 2d array of [sample_i, var_i]:cont_value -- model inputs
+          ycont -- 1d array of [sample_i]:cont_value -- regression model outputs
           x_df -- *pandas* DataFrame. See class docstring.
           xrecent -- [var_i]:value -- most recent X value. Bots use to predict
         """
@@ -132,7 +152,8 @@ def create_xy(
         assert "datetime" not in x_df.columns
 
         # return
-        return X, y, x_df, xrecent
+        ycont = y
+        return X, ycont, x_df, xrecent
 
 
 @enforce_types

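`create_xy()` now returns continuous targets (`ycont`), and callers convert them to class labels with `ycont_to_ytrue()`. Below is a pure-Python sketch of that thresholding, plus a guard worth having before training: scikit-learn classifiers such as `LogisticRegression` raise an error when the targets contain only one class. The helper names are hypothetical, and the choice of `y_thr` (for example, the latest close price, so True means "up") is an assumption not shown in this diff.

```python
from typing import List, Sequence


def ycont_to_ybool(ycont: Sequence[float], y_thr: float) -> List[bool]:
    """True where the continuous target reaches the threshold (ties count as True)."""
    return [y >= y_thr for y in ycont]


def has_both_classes(ybool: Sequence[bool]) -> bool:
    """A binary classifier needs at least one True and one False to train on."""
    return any(ybool) and not all(ybool)


ybool = ycont_to_ybool([99.5, 100.0, 100.7, 98.9], y_thr=100.0)
print(ybool)                    # [False, True, True, False]
print(has_both_classes(ybool))  # True
```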
diff --git a/pdr_backend/aimodel/aimodel_factory.py b/pdr_backend/aimodel/aimodel_factory.py
@@ -1,8 +1,10 @@
 from enforce_typing import enforce_types
-from sklearn import linear_model, svm
-from sklearn.gaussian_process import GaussianProcessRegressor
-from sklearn.gaussian_process.kernels import RBF
+import numpy as np
+from sklearn.linear_model import LogisticRegression
+from sklearn.preprocessing import StandardScaler
+from sklearn.svm import SVC
 
+from pdr_backend.aimodel.aimodel import Aimodel
 from pdr_backend.ppss.aimodel_ss import AimodelSS
 
 
@@ -11,23 +13,31 @@ class AimodelFactory:
     def __init__(self, aimodel_ss: AimodelSS):
         self.aimodel_ss = aimodel_ss
 
-    def build(self, X_train, y_train):
-        model = self._model()
-        model.fit(X_train, y_train)
-        return model
+    def build(self, X: np.ndarray, ybool: np.ndarray) -> Aimodel:
+        """
+        @description
+          Train the model
+
+        @arguments
+          X -- 2d array of [sample_i, var_i]:cont_value -- model inputs
+          ybool -- 1d array of [sample_i]:bool_value -- classifier training targets
 
-    def _model(self):
+        @return
+          model -- Aimodel
+        """
         a = self.aimodel_ss.approach
-        if a == "LIN":
-            return linear_model.LinearRegression()
-        if a == "GPR":
-            kernel = 1.0 * RBF(length_scale=1e1, length_scale_bounds=(1e-2, 1e3))
-            return GaussianProcessRegressor(kernel=kernel, alpha=0.0)
-        if a == "SVR":
-            return svm.SVR()
-        if a == "NuSVR":
-            return svm.NuSVR()
-        if a == "LinearSVR":
-            return svm.LinearSVR()
-
-        raise ValueError(a)
+        if a == "LinearLogistic":
+            skm = LogisticRegression()
+        elif a == "LinearSVC":
+            skm = SVC(kernel="linear", probability=True, C=0.025)
+        else:
+            raise ValueError(a)
+
+        scaler = StandardScaler()
+        scaler.fit(X)
+
+        X = scaler.transform(X)
+        skm.fit(X, ybool)
+
+        model = Aimodel(skm, scaler)
+        return model
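`build()` fits a `StandardScaler` on the training inputs, trains on the scaled inputs, and stores the fitted scaler in the returned `Aimodel`, so `predict_ptrue()` applies the same transform at prediction time. What "fit" computes can be sketched in pure Python: per-column mean and population standard deviation, with zero-variance columns given a scale of 1, as scikit-learn documents for `StandardScaler`. `fit_scaler` and `apply_scaler` are hypothetical names, not the library's API.

```python
import math
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def fit_scaler(X: Matrix) -> Tuple[List[float], List[float]]:
    """Return per-column (means, scales) computed from training data."""
    n = len(X)
    if n == 0:
        raise ValueError("need at least one sample")
    cols = list(zip(*X))
    means = [sum(col) / n for col in cols]
    scales: List[float] = []
    for col, mean in zip(cols, means):
        std = math.sqrt(sum((x - mean) ** 2 for x in col) / n)  # population std (ddof=0)
        scales.append(std if std > 0.0 else 1.0)  # zero variance: leave column unscaled
    return means, scales


def apply_scaler(X: Matrix, means: Sequence[float], scales: Sequence[float]) -> List[List[float]]:
    """Standardize each value as (x - mean) / scale, using statistics from training."""
    return [[(x - m) / s for x, m, s in zip(row, means, scales)] for row in X]


X_train = [[1.0, 5.0], [3.0, 5.0]]
means, scales = fit_scaler(X_train)
print(means, scales)                              # [2.0, 5.0] [1.0, 1.0]
print(apply_scaler([[4.0, 6.0]], means, scales))  # [[2.0, 1.0]]
```

Keeping the training statistics and reusing them at prediction time is the point: re-fitting on the prediction inputs would put them on a different scale from the data the classifier was trained on.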