Fix #599: [Pdr bot, sim] Implement two-sided prediction (PR #670)
Fix #599:
- Make Aimodel do classification (not regression), where classifier outputs probabilities of up vs down
- In sim_engine predictoor: model two-sided prediction
- In sim_engine trader: use up/down confidence level to decide amt of $ to put in
- In sim_engine trader: sell when predict down, then buy back
- In sim_engine: add contour plots to visualize models, when 2 input vars (and hide plot when not 2 vars)
- In predictoor_agent approach 1: stake up/down as 50/50
- In predictoor_agent approach 3: stake up=prob_up*stake_amt, down=(1-prob_up)*stake_amt
- rename predictoor approach 3 --> approach 2
- predictoor.md README updated
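
The approach-3 staking rule in the list above (stake up = prob_up * stake_amt, down = (1 - prob_up) * stake_amt) can be sketched as follows. This is only an illustration of the arithmetic: the function name `split_stake` and its signature are hypothetical and are not taken from the commit.

```python
from typing import Tuple


def split_stake(prob_up: float, stake_amt: float) -> Tuple[float, float]:
    """Split a total stake across the "up" and "down" predictions.

    Hypothetical helper, not part of pdr-backend: it only mirrors the
    formula stated in the commit message.
    """
    if not 0.0 <= prob_up <= 1.0:
        raise ValueError(f"prob_up must be in [0, 1], got {prob_up}")
    stake_up = prob_up * stake_amt
    stake_down = (1.0 - prob_up) * stake_amt
    return stake_up, stake_down


# A classifier that is 70% confident in "up" puts most of the stake on "up"
stake_up, stake_down = split_stake(prob_up=0.7, stake_amt=100.0)
```

With `prob_up=0.5` the split degenerates to the 50/50 staking described for approach 1.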
trentmc authored Feb 26, 2024
1 parent 9c73f6a commit e4d4ca2
Showing 43 changed files with 1,412 additions and 1,087 deletions.
1 change: 1 addition & 0 deletions README.md
@@ -81,6 +81,7 @@ OPF-run bots & higher-level tools:
- `publisher` - publish pdr data feeds
- `analytics` - analytics tools
- `payout` - OCEAN & ROSE payout
- `deployer` - deployer tool
- `accuracy` - calculates % correct, for display in predictoor.ai webapp

Mid-level building blocks:
45 changes: 25 additions & 20 deletions READMEs/predictoor.md
@@ -65,12 +65,14 @@ What it does:
1. Grab historical price data from exchanges and stores in `parquet_data/` dir. It re-uses any previously saved data.
1. Run through many 5min epochs. At each epoch:
- Build a model
- Predict up/down
- Predict
- Trade
- Plot total profit versus time, and more
- Log to console, and to `logs/out_<time>.txt`
- Plot profit versus time, more
- Log to console and `logs/out_<time>.txt`

"Predict" actions are _two-sided_: it does one "up" prediction tx, and one "down" tx, with more stake to the higher-confidence direction. Two-sided is more profitable than one-sided prediction.

The baseline settings use a linear model inputting prices of the previous 10 epochs as inputs (autoregressive_n = 10), just BTC close price as input, a simulated 0% trading fee, and a trading strategy of "buy if predict up; sell 5min later". You can play with different values in `my_ppss.yaml`.
By default, simulation uses a linear model inputting prices of the previous 2-10 epochs as inputs (autoregressive_n), just BTC close price as input, a simulated 0% trading fee, and a trading strategy of "buy if predict up; sell 5min later". You can play with different values in `my_ppss.yaml`.

Profit isn't guaranteed: fees, slippage and more eats into them. Model accuracy makes a big difference too.

@@ -80,27 +82,30 @@ Simulation uses Python [logging](https://docs.python.org/3/howto/logging.html) f

## 3. Run Predictoor Bot on Sapphire Testnet

Predictoor contracts run on [Oasis Sapphire](https://docs.oasis.io/dapp/sapphire/) testnet and mainnet. Sapphire is a privacy-preserving EVM-compatible L1 chain.
Predictoor contracts run on [Oasis Sapphire](https://docs.oasis.io/dapp/sapphire/) testnet and mainnet. Sapphire is a privacy-preserving EVM-compatible L1 chain.

Let's get our bot running on testnet first.
Let's get our predictoor bot running on testnet first.

First, tokens! You need (fake) ROSE to pay for gas, and (fake) OCEAN to stake and earn. [Get them here](testnet-faucet.md).
The bot does two-sided predictions, like in simulation. This also means it needs two Ethereum accounts, with keys PRIVATE_KEY and PRIVATE_KEY2.

Then, copy & paste your private key as an envvar. In console:
First, tokens! You need (fake) ROSE to pay for gas, and (fake) OCEAN to stake and earn, for both accounts. [Get them here](testnet-faucet.md).

Then, copy & paste your private keys as envvars. In console:
```console
export PRIVATE_KEY=<YOUR_PRIVATE_KEY>
export PRIVATE_KEY=<YOUR_PRIVATE_KEY 1>
export PRIVATE_KEY2=<YOUR_PRIVATE_KEY 2>
```

Update `my_ppss.yaml` as desired.

Then, run a bot with modeling-on-the fly (approach 3). In console:
Then, run a bot with modeling-on-the fly (approach 2). In console:
```console
pdr predictoor 3 my_ppss.yaml sapphire-testnet
pdr predictoor 2 my_ppss.yaml sapphire-testnet
```

Your bot is running, congrats! Sit back and watch it in action. It will loop continuously.

At every 5m/1h epoch, it builds & submits >1 times, to maximize accuracy without missing submission deadlines. Specifically: 60 s before predictions are due, it builds a model then submits a prediction. It repeats this until the deadline.
At every 5m/1h epoch, it builds & submits >1 times, to maximize accuracy without missing submission deadlines. Specifically: 60 s before predictions are due, it builds a model then prediction txs for up and for down (with stake for each). It repeats this until the deadline.

It logs to console, and to `logs/out_<time>.txt`. Like simulation, it uses Python logging framework, configurable in `logging.yaml`.

@@ -111,25 +116,26 @@ The CLI has support tools too. Learn about each via:
- `pdr get_predictions_info -h`
- and more yet; type `pdr -h` to see

You can track behavior at finer resolution by writing more logs to the [code](../pdr_backend/predictoor/approach3/predictoor_agent3.py), or [querying Predictoor subgraph](subgraph.md).
You can track behavior at finer resolution by writing more logs to the [code](../pdr_backend/predictoor/predictoor_agent.py), or [querying Predictoor subgraph](subgraph.md).


## 4. Run Predictoor Bot on Sapphire Mainnet

Time to make it real: let's get our bot running on Sapphire _mainnet_.

First, real tokens! Get [ROSE via this guide](get-rose-on-sapphire.md) and [OCEAN via this guide](get-ocean-on-sapphire.md).
First, real tokens! Get [ROSE via this guide](get-rose-on-sapphire.md) and [OCEAN via this guide](get-ocean-on-sapphire.md), for each of your two accounts.

Then, copy & paste your private key as an envvar. (You can skip this if it's same as testnet.) In console:
Then, copy & paste your private keys as envvars. (You can skip this if keys are same as testnet.) In console:
```console
export PRIVATE_KEY=<YOUR_PRIVATE_KEY>
export PRIVATE_KEY=<YOUR_PRIVATE_KEY 1>
export PRIVATE_KEY2=<YOUR_PRIVATE_KEY 2>
```

Update `my_ppss.yaml` as desired.

Then, run the bot. In console:
```console
pdr predictoor 3 my_ppss.yaml sapphire-mainnet
pdr predictoor 2 my_ppss.yaml sapphire-mainnet
```

This is where there's real $ at stake. Good luck!
@@ -142,7 +148,6 @@ When running predictoors on mainnet, you have the potential to earn $.

**[Here](payout.md)** are instructions to claim your earnings.


Congrats! You've gone through all the essential steps to earn $ by running a predictoor bot on mainnet.

The next sections describe how to go beyond, by optimizing the model and more.
@@ -151,7 +156,7 @@ The next sections describe how to go beyond, by optimizing the model and more.

The idea: make your own model, tuned for accuracy, which in turn will optimize it for $. Here's how:
1. Fork `pdr-backend` repo.
1. Change predictoor approach3 modeling code as you wish, while iterating with simulation.
1. Change predictoor approach 2 modeling code as you wish, while iterating with simulation.
1. Bring your model as a Predictoor bot to testnet then mainnet.

# Right-size staking
@@ -186,7 +191,7 @@ deployment_configs:
memory: '512Mi'
source: "binance"
type: "predictoor"
approach: 3
approach: 2
network: "sapphire-testnet"
s_until_epoch_end: 20
pdr_backend_image_source: "oceanprotocol/pdr-backend:latest"
10 changes: 6 additions & 4 deletions READMEs/trader.md
@@ -58,12 +58,14 @@ What it does:
1. Grab historical price data from exchanges and stores in `parquet_data/` dir. It re-uses any previously saved data.
1. Run through many 5min epochs. At each epoch:
- Build a model
- Predict up/down
- Predict
- Trade
- Plot total profit versus time, and more
- Log to console, and to `logs/out_<time>.txt`
- Plot profit versus time, more
- Log to console and `logs/out_<time>.txt`

"Predict" actions are _two-sided_: it does one "up" prediction tx, and one "down" tx, with more stake to the higher-confidence direction. Two-sided is more profitable than one-sided prediction.

The baseline settings use a linear model inputting prices of the previous 10 epochs as inputs, a simulated 0% trading fee, and a trading strategy of "buy if predict up; sell 5min later". You can play with different values in `my_ppss.yaml`.
By default, simulation uses a linear model inputting prices of the previous 2-10 epochs as inputs (autoregressive_n), just BTC close price as input, a simulated 0% trading fee, and a trading strategy of "buy if predict up; sell 5min later". You can play with different values in `my_ppss.yaml`.

Profit isn't guaranteed: fees, slippage and more eats into them. Model accuracy makes a big difference too.

49 changes: 49 additions & 0 deletions pdr_backend/aimodel/aimodel.py
@@ -0,0 +1,49 @@
from typing import Tuple

from enforce_typing import enforce_types
import numpy as np


@enforce_types
class Aimodel:

    def __init__(self, skm, scaler):
        self._skm = skm  # sklearn model
        self._scaler = scaler

    def predict_true(self, X):
        """
        @description
          Classify each input sample, with lower fidelity: just True vs False
        @arguments
          X -- 2d array of [sample_i, var_i]:cont_value -- model inputs
        @return
          ytrue -- 1d array of [sample_i]:bool_value -- classifier model outputs
        """
        # We explicitly don't call skm.predict() here, because it's
        # inconsistent with predict_proba() for svc and maybe others.
        # Rather, draw on the probability output to guarantee consistency.
        yptrue = self.predict_ptrue(X)
        print(f"in predict_true(); yptrue[:10] = {yptrue[:10]}")
        ytrue = yptrue > 0.5
        return ytrue

    def predict_ptrue(self, X: np.ndarray) -> np.ndarray:
        """
        @description
          Classify each input sample, with higher fidelity: prob of being True
        @arguments
          X -- 2d array of [sample_i, var_i]:cont_value -- model inputs
        @return
          yptrue - 1d array of [sample_i]: prob_of_being_true -- model outputs
        """
        X = self._scaler.transform(X)
        T = self._skm.predict_proba(X)  # [sample_i][class_i]
        N = T.shape[0]
        class_i = 1  # this is the class for "True"
        yptrue = np.array([T[i, class_i] for i in range(N)])
        return yptrue
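
The probability-then-threshold pattern used by `predict_ptrue()` and `predict_true()` above can be tried in isolation with plain scikit-learn. The sketch below is not code from the commit: the synthetic data, the seed, and the variable `true_col` are assumptions made for illustration. It looks up the "True" column from `classes_` rather than hard-coding index 1, and uses a column slice in place of the per-row loop.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic, linearly separable data (illustrative only)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
ybool = X[:, 0] + 0.5 * X[:, 1] > 0.0

# Fit scaler and classifier, as Aimodel expects them to be pre-fitted
scaler = StandardScaler().fit(X)
skm = LogisticRegression().fit(scaler.transform(X), ybool)

# predict_proba returns one column per class, ordered as in skm.classes_
T = skm.predict_proba(scaler.transform(X))  # shape (n_samples, 2)
true_col = list(skm.classes_).index(True)   # 1 when classes_ is [False, True]

yptrue = T[:, true_col]  # probability of "True" per sample
ytrue = yptrue > 0.5     # boolean labels derived from the probabilities
```

Deriving `ytrue` from `yptrue` means the two outputs cannot disagree, which is the consistency the in-code comment is concerned with.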
27 changes: 24 additions & 3 deletions pdr_backend/aimodel/aimodel_data_factory.py
@@ -45,22 +45,42 @@ class AimodelDataFactory:
    def __init__(self, ss: PredictoorSS):
        self.ss = ss

    @staticmethod
    def ycont_to_ytrue(ycont: np.ndarray, y_thr: float) -> np.ndarray:
        """
        @description
          Convert regression y (ycont) to classifier y (ybool).
        @arguments
          ycont -- 1d array of [sample_i]:cont_value -- regression model outputs
          y_thr -- classify to True if ycont >= this threshold
        @return
          ybool -- 1d array of [sample_i]:bool_value -- classifier model outputs
        """
        ybool = np.array([ycont_val >= y_thr for ycont_val in ycont])
        return ybool

    def create_xy(
        self,
        mergedohlcv_df: pl.DataFrame,
        testshift: int,
        do_fill_nans: bool = True,
    ) -> Tuple[np.ndarray, np.ndarray, pd.DataFrame, np.ndarray]:
        """
        @description
          Create X, y data for a regression setting
          For y in a classification setting, call ycont_to_ytrue() after.
        @arguments
          mergedohlcv_df -- *polars* DataFrame. See class docstring
          testshift -- to simulate across historical test data
          do_fill_nans -- if any values are nan, fill them? (Via interpolation)
            If you turn this off and mergedohlcv_df has nans, then X/y/etc gets nans
        @return
          X -- 2d array of [sample_i, var_i]:value -- inputs for model
          y -- 1d array of [sample_i]:value -- target outputs for model
          X -- 2d array of [sample_i, var_i]:cont_value -- model inputs
          ycont -- 1d array of [sample_i]:cont_value -- regression model outputs
          x_df -- *pandas* DataFrame. See class docstring.
          xrecent -- [var_i]:value -- most recent X value. Bots use to predict
        """
Expand Down Expand Up @@ -132,7 +152,8 @@ def create_xy(
assert "datetime" not in x_df.columns

# return
return X, y, x_df, xrecent
ycont = y
return X, ycont, x_df, xrecent


@enforce_types
52 changes: 31 additions & 21 deletions pdr_backend/aimodel/aimodel_factory.py
@@ -1,8 +1,10 @@
from enforce_typing import enforce_types
from sklearn import linear_model, svm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF
from sklearn.linear_model import LogisticRegression
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

from pdr_backend.aimodel.aimodel import Aimodel
from pdr_backend.ppss.aimodel_ss import AimodelSS


@@ -11,23 +13,31 @@ class AimodelFactory:
    def __init__(self, aimodel_ss: AimodelSS):
        self.aimodel_ss = aimodel_ss

    def build(self, X_train, y_train):
        model = self._model()
        model.fit(X_train, y_train)
        return model
    def build(self, X: np.ndarray, ybool: np.ndarray) -> Aimodel:
        """
        @description
          Train the model
        @arguments
          X -- 2d array of [sample_i, var_i]:cont_value -- model inputs
          ybool -- 1d array of [sample_i]:bool_value -- classifier model outputs
    def _model(self):
        @return
          model -- Aimodel
        """
        a = self.aimodel_ss.approach
        if a == "LIN":
            return linear_model.LinearRegression()
        if a == "GPR":
            kernel = 1.0 * RBF(length_scale=1e1, length_scale_bounds=(1e-2, 1e3))
            return GaussianProcessRegressor(kernel=kernel, alpha=0.0)
        if a == "SVR":
            return svm.SVR()
        if a == "NuSVR":
            return svm.NuSVR()
        if a == "LinearSVR":
            return svm.LinearSVR()

        raise ValueError(a)
        if a == "LinearLogistic":
            skm = LogisticRegression()
        elif a == "LinearSVC":
            skm = SVC(kernel="linear", probability=True, C=0.025)
        else:
            raise ValueError(a)

        scaler = StandardScaler()
        scaler.fit(X)

        X = scaler.transform(X)
        skm.fit(X, ybool)

        model = Aimodel(skm, scaler)
        return model
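
Taken together, the new `ycont_to_ytrue()` and `build()` code suggests a flow of: threshold continuous targets into booleans, fit a scaler, fit a probabilistic classifier, then read off the probability of "up" for the most recent input. A minimal standalone sketch of that flow follows, using only scikit-learn and NumPy rather than the pdr-backend classes. The synthetic prices, `y_thr` value, and `xrecent` values are made up for illustration, and the SVC settings are copied from the "LinearSVC" branch above.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic inputs: two lagged "prices" per sample (illustrative only)
rng = np.random.default_rng(42)
X = rng.normal(loc=100.0, scale=5.0, size=(120, 2))
ycont = X.mean(axis=1) + rng.normal(scale=0.5, size=120)  # made-up next price

# Same rule as ycont_to_ytrue(): True where ycont >= threshold
y_thr = 100.0
ybool = ycont >= y_thr

# Scale, then fit a linear SVC with probability outputs enabled
scaler = StandardScaler()
scaler.fit(X)
skm = SVC(kernel="linear", probability=True, C=0.025)
skm.fit(scaler.transform(X), ybool)

# Probability that the next value is at or above the threshold ("up")
xrecent = np.array([[103.0, 104.0]])
prob_up = float(skm.predict_proba(scaler.transform(xrecent))[0, 1])
```

Note that the same fitted scaler is applied at prediction time, which is why `Aimodel` stores the scaler alongside the sklearn model.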