
run_train infinite loop? #2

Open
flamby opened this issue Apr 4, 2018 · 6 comments

Comments


flamby commented Apr 4, 2018

Hi,

First of all, congrats on this project; it looks very promising.

I ran run_train like this: ./run_train.py --target=low BTC_ETH --period=day, and 2 days later it is still running, with around 77 _zoo/BTC_ETH subfolders, all containing only LinearModel.

Could that be why the training is still ongoing, i.e. it keeps looking for other models with good results, without success?
I could not find where to configure a limit.

Thanks, and keep up the good work!


flamby commented Apr 5, 2018

Silly me, I did not see the while True loop.
But still, no model other than LinearModel gets stored...


maxim5 commented Apr 5, 2018

Hi @flamby

First of all, you can examine the logs and check the best scores that the RNN or DNN models are showing (by the way, there is also verbose logging, but as far as I remember I didn't add a command-line option for it). They are probably much worse, which means there is too much noise in the chart and very little signal. In that case, the simplest possible model will often win. There is a small chance that tweaking the hyper-parameter ranges will help, but unfortunately there is not much else you can do, other than choosing a different period/target.

I have seen pairs like that, and in my opinion the best thing to do is to select another pair, one that shows a more vivid pattern in the data and is thus easier to train. Luckily there is a big choice.


flamby commented Apr 5, 2018

Hi @maxim5

Thanks for the clarification.

I like the idea of having an ensemble of different ML algorithms, instead of an ensemble of 5 LinearModels.

So I came up with this run_predict change. Instead of one infinite while loop over all models, I now have 4 consecutive runs of models (xgboost does not produce any result; I need to dig into this), each with a capped iteration count. Then I do manual selection based on sign accuracy. I guess it could be automated as well.

# Imports this snippet needs (util, JobInfo, JobRunner and the iterate_*
# helpers come from the project's existing modules):
import os
import shutil

import numpy as np

def main():
  tickers, periods, targets = util.parse_command_line(default_periods=['day'],
                                                      default_targets=['high'])
  # Change me: each entry is one model family with a capped iteration count.
  _models = [
             {'func': iterate_linear, 'max_iteration': 2},
             {'func': iterate_neural, 'max_iteration': 2},
             {'func': iterate_rnn,    'max_iteration': 1},
             {'func': iterate_cnn,    'max_iteration': 1}
             # {'func': iterate_xgb,    'max_iteration': 1}
            ]

  for _model in _models:
    i = 0
    while i < _model['max_iteration']:
      for ticker in tickers:
        for period in periods:
          BASE_DIR = "_zoo/%s_%s/" % (ticker, period)
          for target in targets:
            job_info = JobInfo('_data', '_zoo', name='%s_%s' % (ticker, period), target=target)
            job_runner = JobRunner(job_info, limit=np.median)
            _model['func'](job_info, job_runner)
            job_runner.print_result()
            # Note: the counter advances once per (ticker, period, target)
            # combination, so the cap is shared across all of them.
            i += 1
          # Move the freshly stored models into a per-model-family folder.
          TEMP_DIR = os.path.join(BASE_DIR, _model['func'].__name__)
          os.makedirs(TEMP_DIR, exist_ok=True)
          to_move = [os.path.join(BASE_DIR, d) for d in os.listdir(BASE_DIR)
                     if os.path.isdir(os.path.join(BASE_DIR, d))
                     and (d.startswith("low_") or d.startswith("high_"))]
          print("** Moving %s models to %s directory for manual selection **"
                % (_model['func'].__name__, TEMP_DIR))
          for d in to_move:
            shutil.move(d, TEMP_DIR)

It moves all stored models into their respective directories, named after the function used (iterate_linear, iterate_neural, etc.).

The rationale came to me while reading "Deep Learning with Python" by François Chollet (the Keras author): you should ensemble models that are as good as possible while being as different as possible.

Regarding pairs with less noise, I found that ETC_ETH has good patterns for ML.

I'll try to plug a WaveNet model into your code, as it seems to be very good at detecting patterns. I guess I'll need at least 2 other exchange connectors for that.
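For reference, the core WaveNet building block is a dilated causal convolution: each output step sees only the current and past inputs, spaced dilation steps apart, so no future information leaks into the prediction. A pure-NumPy sketch (the function name and weight ordering are my own, not the project's or the paper's code):

```python
import numpy as np

def causal_dilated_conv(x, w, dilation=1):
    # 1-D causal convolution with dilation: output t combines inputs at
    # t, t - dilation, t - 2*dilation, ... (missing past values are 0).
    k = len(w)
    pad = (k - 1) * dilation
    xp = np.concatenate([np.zeros(pad), x])  # left-pad keeps it causal
    return np.array([
        sum(w[j] * xp[t + pad - j * dilation] for j in range(k))
        for t in range(len(x))
    ])
```

Stacking such layers with dilations 1, 2, 4, 8, ... is what gives WaveNet its large receptive field over a time series.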

New activation functions like Swish could also help.
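Swish is simple enough to sketch: it multiplies the input by its own sigmoid. A minimal NumPy version (the beta parameter is the trainable variant from the Swish paper; the default here is just illustrative):

```python
import numpy as np

def swish(x, beta=1.0):
    # Swish activation: x * sigmoid(beta * x). Approaches the identity
    # for large positive x and 0 for large negative x.
    return x / (1.0 + np.exp(-beta * x))
```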


maxim5 commented Apr 12, 2018

Hi @flamby

I see what you mean. What I usually do is let all models learn and then drop the similar ones from the ensemble. Note that combining models with different hyperparameters (e.g. k, the window size, a very important parameter) is actually fine and does not contradict François's idea. But I like your approach as well.
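The kind of ensembling discussed here can be sketched as simple prediction averaging; any objects with a predict method work, whether they differ by algorithm or only by hyper-parameters (hypothetical interface, not the project's API):

```python
import numpy as np

def ensemble_predict(models, x):
    # Average the predictions of several trained models. Diversity can
    # come from different algorithms or different hyper-parameters
    # (e.g. the window size k); both reduce correlated errors.
    predictions = np.array([m.predict(x) for m in models])
    return predictions.mean(axis=0)
```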

My concern was specifically about the situation where the linear model performs much better than any more complex model: this usually indicates that only simple inference is possible, due to the nature of the data. It would be interesting to know whether your approach leads to a significant improvement over a linear model alone.

Can you share how you'd like to plug WaveNet in?


bautroibaola commented Sep 7, 2018

Dear maxim,

I have the same concern as flamby:

Since run_train.py has a while True loop, how can I know when I should stop the training and move on to run_predict.py? And do I need to run training every time before running prediction?

Thanks for your hard work!


maxim5 commented Sep 8, 2018

@bautroibaola This is a common problem in ML: there is no way to tell when a model is ready. What people usually do is train for as long as they have time and simply take the best models. That's why there's an endless loop. However, feel free to replace it with some limit.
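A minimal sketch of such a limit, bounding the loop by both an iteration cap and a wall-clock budget (train_one_round is a stand-in for one pass of the project's training loop, not its actual API):

```python
import time

def run_train_bounded(train_one_round, max_iterations=100, max_seconds=3600):
    # Replaces `while True` with two stopping conditions: an iteration
    # cap and a time budget. Returns how many rounds completed.
    completed = 0
    start = time.time()
    while completed < max_iterations and time.time() - start < max_seconds:
        train_one_round()
        completed += 1
    return completed
```

The best models stored so far are kept either way, so stopping early only limits how much of the hyper-parameter space gets explored.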
