Official code repository for the paper "Pushing the Limits of Pre-training for Time Series Forecasting in the CloudOps Domain". Check out our paper for more details. The accompanying datasets are available on the Hugging Face Hub as `Salesforce/cloudops_tsf`.
Install the required packages.

Torch experiments:
```bash
pip install -r requirements/requirements-pytorch.txt
```

statsforecast experiments:
```bash
pip install -r requirements/requirements-stats.txt
```
Easily load and access the dataset from the Hugging Face Hub:
```python
from datasets import load_dataset

ds = load_dataset(
    "Salesforce/cloudops_tsf",
    "azure_vm_traces_2017",  # "borg_cluster_data_2011", "alibaba_cluster_trace_2018"
    split=None,  # "train_test", "pretrain"
)
```
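As a quick sanity check, you can load a single split and inspect one record. This is a minimal sketch: the split names come from the snippet above, but the records' field names are dataset-specific, so it only prints whatever keys are present.

```python
from datasets import load_dataset

# Load just the pre-training split of the Azure trace.
pretrain = load_dataset(
    "Salesforce/cloudops_tsf",
    "azure_vm_traces_2017",
    split="pretrain",
)

print(pretrain.num_rows)  # number of time series in the split
example = pretrain[0]     # records behave like plain dicts
print(example.keys())     # field names are dataset-specific
```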
We use Hydra for config management.

Run the hyperparameter tuning script:
```bash
python -m benchmark.benchmark_exp model_name=MODEL_NAME dataset_name=DATASET
```
- where `MODEL_NAME` is one of: `TemporalFusionTransformer`, `Autoformer`, `FEDformer`, `NSTransformer`, `PatchTST`, `LinearFamily`, `DeepTime`, `TimeGrad`, or `DeepVAR`, and `DATASET` is one of `azure_vm_traces_2017`, `borg_cluster_data_2011`, or `alibaba_cluster_trace_2018`.
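For example, to tune `PatchTST` on the Azure trace:

```bash
python -m benchmark.benchmark_exp model_name=PatchTST dataset_name=azure_vm_traces_2017
```

Since configs are managed by Hydra, any other config value can be overridden from the command line with the same `key=value` syntax.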
After hyperparameter tuning, run the test script:
```bash
python -m benchmark.benchmark_exp model_name=MODEL_NAME dataset_name=DATASET test=true
```
- where `MODEL_NAME` and `DATASET` take the same values as above.
- training logs and checkpoints will be saved in `outputs/benchmark_exp`.
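For example, to evaluate the tuned `PatchTST` model on the Azure trace:

```bash
python -m benchmark.benchmark_exp model_name=PatchTST dataset_name=azure_vm_traces_2017 test=true
```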
Run the statsforecast script:
```bash
python -m benchmark.stats_exp DATASET --models MODEL_1 MODEL_2
```
- where `DATASET` is one of `azure_vm_traces_2017`, `borg_cluster_data_2011`, or `alibaba_cluster_trace_2018`, and `MODEL_1 MODEL_2 ...` is the list of models you want to run, from `naive`, `auto_arima`, `auto_ets`, `auto_theta`, `multivariate_naive`, or `var`.
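For example, to run the naive and ARIMA baselines on the Borg trace:

```bash
python -m benchmark.stats_exp borg_cluster_data_2011 --models naive auto_arima
```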
Run the pre-training script:
```bash
python -m pretraining.pretrain_exp backbone=BACKBONE size=SIZE ++data.dataset_name=DATASET
```
- where the options for `BACKBONE` and `SIZE` can be found in `conf/backbone` and `conf/size` respectively, and `DATASET` is one of `azure_vm_traces_2017`, `borg_cluster_data_2011`, or `alibaba_cluster_trace_2018`.
- see `conf/pretrain.yaml` for more details on the options.
- training logs and checkpoints will be saved in `outputs/pretrain_exp`.
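A hypothetical invocation, for illustration only; `masked_encoder` and `base` are placeholder names, so substitute the config file names that actually exist under `conf/backbone` and `conf/size`:

```bash
# placeholder backbone/size names -- check conf/backbone and conf/size for real options
python -m pretraining.pretrain_exp backbone=masked_encoder size=base ++data.dataset_name=azure_vm_traces_2017
```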
Run the forecast script:
```bash
python -m pretraining.forecast_exp backbone=BACKBONE forecast=FORECAST size=SIZE ++data.dataset_name=DATASET
```
- where the options for `BACKBONE`, `FORECAST`, and `SIZE` can be found in `conf/backbone`, `conf/forecast`, and `conf/size` respectively, and `DATASET` is one of `azure_vm_traces_2017`, `borg_cluster_data_2011`, or `alibaba_cluster_trace_2018`.
- see `conf/forecast.yaml` for more details on the options.
- training logs and checkpoints will be saved in `outputs/forecast_exp`.
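Again for illustration only, with placeholder backbone/size names; `FORECAST` is left as a literal placeholder since its valid values are the file names under `conf/forecast`:

```bash
# placeholder names -- check conf/backbone, conf/forecast, and conf/size for real options
python -m pretraining.forecast_exp backbone=masked_encoder forecast=FORECAST size=base ++data.dataset_name=azure_vm_traces_2017
```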
If you find the paper or the source code useful for your projects, please cite the following bibtex:

```bibtex
@article{woo2023pushing,
  title={Pushing the Limits of Pre-training for Time Series Forecasting in the CloudOps Domain},
  author={Woo, Gerald and Liu, Chenghao and Kumar, Akshat and Sahoo, Doyen},
  journal={arXiv preprint arXiv:2310.05063},
  year={2023}
}
```