This is a guide for adding a new feature to the search space.
- Step 1: Add a new variation
- Step 2: Adjust model.py
- Step 3: Add a config within model.py
- Step 4: Add an argparse argument to train.py
- Step 5: Create a configuration json in the explorations folder
- Other Parameter Groups
- Ideas
## Step 1: Add a new variation

If the variation falls into one of the following categories, add it to the appropriate file (or create a new file) in the variations folder:
```
variations/
├── activation_variations.py
├── normalization_variations.py
├── position_encoding_variations.py
└── softmax_variations.py
```
Some variations, such as changes to the ordering of the network, may need to be made directly in model.py.
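For instance, a new softmax alternative can be a small nn.Module that takes the config object in its constructor. The class below is a hypothetical sketch (a learnable-temperature softmax), not an existing variation in this repo:

```python
# variations/softmax_variations.py (sketch)
import torch
import torch.nn as nn

class YourSoftmaxVariation(nn.Module):
    """Hypothetical example: softmax with a learnable temperature."""

    def __init__(self, config):
        super().__init__()
        # config gives access to any fields you add in Step 3
        self.temperature = nn.Parameter(torch.ones(1))

    def forward(self, x):
        # Softmax over the last dimension, scaled by the learned temperature
        return torch.softmax(x / self.temperature.clamp(min=1e-3), dim=-1)
```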
## Step 2: Adjust model.py

Import the new variation:

```python
from variations.softmax_variations import YourSoftmaxVariation
```
Then add it to model.py in the appropriate section:

```python
if self.softmax_variant_attn == "yournewvariation":
    self.softmax_layer = YourSoftmaxVariation(config)
```
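For context, this dispatch typically sits where the attention module selects its softmax implementation; the surrounding lines below are an illustrative sketch rather than verbatim model.py code:

```python
# model.py (sketch): inside the attention module's __init__
if self.softmax_variant_attn == "yournewvariation":
    self.softmax_layer = YourSoftmaxVariation(config)
else:
    self.softmax_layer = nn.Softmax(dim=-1)  # default path (illustrative)

# Later, in the forward pass, the chosen layer replaces the F.softmax call:
# att = self.softmax_layer(att)
```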
## Step 3: Add a config within model.py

Open model.py and add your new configuration within the GPTConfig dataclass:
```python
from dataclasses import dataclass

@dataclass
class GPTConfig:
    block_size: int = 1024
    vocab_size: int = 50304  # GPT-2 vocab_size of 50257, padded up to nearest multiple of 64 for efficiency
    n_layer: int = 12
    n_head: int = 12
    n_embd: int = 768
    dropout: float = 0.0
    # Your New Setting
    new_variation_setting: bool = True
```
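Modules that receive the config can then read the new field directly. A minimal sketch using the new_variation_setting flag from above (the class name here is hypothetical):

```python
# model.py (sketch): consuming the new config field
import torch.nn as nn

class SomeBlock(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.use_new_variation = config.new_variation_setting

    def forward(self, x):
        if self.use_new_variation:
            pass  # gated new behavior goes here
        return x
```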
## Step 4: Add an argparse argument to train.py

Open train.py and add your new feature to the model group inside the parse_args function, using the pattern that matches its type.
For boolean values:

```python
model_group.add_argument('--use_faster_inference', default=True, action=argparse.BooleanOptionalAction)
```

For string values (e.g. to select between several variants of a module):

```python
model_group.add_argument("--softmax_variant", type=str, default="softermax", choices=["constantmax", "polymax", "strongermax", "softermax", "sigsoftmax", "sigsoftmax_base2"])
```

For numeric values:

```python
model_group.add_argument("--block_size", type=int, default=256)
```
## Step 5: Create a configuration json in the explorations folder

cd into the explorations folder and copy one of the templates to start a new exploration sweep.
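The template defines the exact schema, so defer to it; purely as an illustration, a sweep config typically maps each parameter to a list of values to try (the field names below are hypothetical):

```json
{
  "max_iters": [3000],
  "softmax_variant_attn": ["softermax", "polymax", "strongermax"],
  "new_variation_setting": [true, false]
}
```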
Run the sweep with run_experiments.py from the repo root, specifying your config file:

```bash
python3 run_experiments.py --config explorations/config.json --output_dir out_test
```
This will automatically timestamp and label your tensorboard logs, create csv logs, and save output checkpoints into the specified output folder.
## Other Parameter Groups

train.py is parameterized with argparse into three groups:

- model_group - these arguments are automatically added to a config and sent to model.py
- training_group - only used by train.py
- logging_group - also only used by train.py (specifically for logging)
Anything added to the model group is forwarded to model.py automatically, making a new feature really just a two-step process to add.
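A condensed sketch of that three-group layout inside parse_args (the flag names are abbreviated and illustrative; see train.py for the full set):

```python
import argparse

def parse_args():
    parser = argparse.ArgumentParser()

    model_group = parser.add_argument_group("model")  # forwarded to model.py
    model_group.add_argument("--n_layer", type=int, default=12)

    training_group = parser.add_argument_group("training")  # used by train.py only
    training_group.add_argument("--max_iters", type=int, default=5000)

    logging_group = parser.add_argument_group("logging")  # train.py logging options
    logging_group.add_argument("--tensorboard_log", default=True,
                               action=argparse.BooleanOptionalAction)

    return parser.parse_args()
```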
## Ideas

In addition to scanning perplexity results for different settings, some directions to explore:
- Reinforcement loops - adding Gymnasium to optimize parameters
- Training loop - generating output with sample.py, augmenting it, then feeding it back in as training data
- Monitoring of hyperparameters - e.g. gamma and beta values for constantmax