
Weak-to-Strong Generalization with Preferences

Vision

Imagenette was chosen due to logistical and compute restrictions (bigger experiments coming soon!).

Inspired by OpenAI's weak-to-strong research: https://github.com/openai/weak-to-strong/tree/main

Original Approach: Generate weak labels using an AlexNet model pretrained on Imagenette, and train linear probes on top of DINO models as the strong student.
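
As a rough sketch of that pipeline (the actual implementation lives in run_weak_strong.py; the loader, device, and hyperparameters below are placeholders, not this repo's API):

```python
# Sketch of the original weak-to-strong setup: a frozen AlexNet teacher
# produces weak labels, and a linear probe on frozen DINO features is
# trained against those labels with plain cross-entropy.
import torch
import torch.nn as nn
import torch.nn.functional as F

@torch.no_grad()
def get_weak_labels(weak_model, loader, device="cuda"):
    """Argmax predictions of the weak AlexNet over the training set."""
    weak_model.eval().to(device)
    return torch.cat([weak_model(x.to(device)).argmax(-1).cpu() for x, _ in loader])

@torch.no_grad()
def get_features(strong_backbone, loader, device="cuda"):
    """Frozen DINO features used by the strong student's linear probe."""
    strong_backbone.eval().to(device)
    return torch.cat([strong_backbone(x.to(device)).cpu() for x, _ in loader])

def train_probe(feats, weak_labels, n_classes=10, n_epochs=10, lr=1e-3, device="cuda"):
    """Fit a linear probe on the frozen features using the weak labels as targets."""
    probe = nn.Linear(feats.shape[1], n_classes).to(device)
    opt = torch.optim.Adam(probe.parameters(), lr=lr)
    feats, weak_labels = feats.to(device), weak_labels.to(device)
    for _ in range(n_epochs):
        loss = F.cross_entropy(probe(feats), weak_labels)
        opt.zero_grad(); loss.backward(); opt.step()
    return probe
```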

Modified Approach: Use the weak labels to generate pairwise preferences for training the linear probe, eliciting stronger weak-to-strong supervision.
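
One way to realize this (a minimal sketch of the idea, not the exact code in the prefs script) is to turn each weak label into a pairwise preference, where the weak-predicted class is preferred over some other class, and train the probe with a Bradley-Terry-style logistic loss on the logit margin:

```python
# Sketch of the preference-based variant: instead of cross-entropy on weak
# labels, each example contributes a pairwise preference "weak-predicted
# class beats a sampled rejected class", trained with a logistic
# (Bradley-Terry) loss on the logit margin. The exact pair construction in
# this repo may differ.
import torch
import torch.nn.functional as F

def preference_loss(logits, weak_labels, n_classes=10):
    """logits: (N, C) probe outputs on strong features; weak_labels: (N,)."""
    # Sample a rejected class per example, guaranteed to differ from the weak label.
    rejected = torch.randint(0, n_classes, weak_labels.shape, device=weak_labels.device)
    rejected = torch.where(rejected == weak_labels, (rejected + 1) % n_classes, rejected)
    chosen_logit = logits.gather(1, weak_labels.unsqueeze(1)).squeeze(1)
    rejected_logit = logits.gather(1, rejected.unsqueeze(1)).squeeze(1)
    # The preferred (weak-labeled) class should out-score the rejected one.
    return -F.logsigmoid(chosen_logit - rejected_logit).mean()
```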

Set download=True in data.py for the first run

python3 run_weak_strong.py \
    data_path <DATA_PATH> \
    weak_model_name <WEAK_MODEL> \
    strong_model_name <STRONG_MODEL> \
    batch_size <BATCH_SIZE> \
    seed <SEED> \
    n_epochs <N_EPOCHS> \
    lr <LR> \
    n_train <N_TRAIN>

Run the prefs version using the same command. Refer to the OpenAI GitHub for detailed parameter explanations.

With the commands above we get the following results (note that the results may not reproduce exactly due to randomness):

AlexNet (weak label) Accuracy: 0.089

DINO ResNet50 (strong on gt) Accuracy: 0.91

| Model | PGR |
| --- | --- |
| AlexNet → DINO ResNet50 | 0.096 |
| AlexNet → DINO ResNet50 (Prefs) | 0.132 |

PGR scores are low overall, but the preference-based variant recovers more of the gap.
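
For context, PGR is the performance gap recovered metric from Burns et al. (2023): the fraction of the weak-to-ceiling accuracy gap closed by the weak-to-strong model. The back-calculated weak-to-strong accuracies below are approximate and only for illustration:

```python
# PGR = (w2s_acc - weak_acc) / (strong_ceiling_acc - weak_acc)
weak_acc, strong_acc = 0.089, 0.91

def pgr(w2s_acc):
    return (w2s_acc - weak_acc) / (strong_acc - weak_acc)

# Inverting the formula: PGR = 0.096 implies a weak-to-strong accuracy of
# roughly 0.089 + 0.096 * (0.91 - 0.089) ≈ 0.168, and PGR = 0.132 implies ≈ 0.197.
```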

You can add new custom models to models.py and new datasets to data.py.
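
The registration mechanism in models.py is not shown here; as a purely hypothetical example, a new strong backbone could be exposed as a name-to-constructor entry (dino_vits16 is a real torch.hub checkpoint, but the CUSTOM_MODELS dict is an assumption, not this repo's API):

```python
# Hypothetical pattern for registering a new strong backbone; the actual
# structure of models.py may differ.
import torch

def build_dino_vits16():
    # Any nn.Module mapping images to feature vectors can serve as a backbone.
    return torch.hub.load("facebookresearch/dino:main", "dino_vits16")

CUSTOM_MODELS = {"dino_vits16": build_dino_vits16}
```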

Text

These text experiments were submitted as a classroom assignment for CS 690S; here are some results:

(Result figures: w2s1, w2s2, w2s3, w2s4.)

References:

Collin Burns, Pavel Izmailov, Jan Hendrik Kirchner, Bowen Baker, Leo Gao, Leopold Aschenbrenner, Yining Chen, Adrien Ecoffet, Manas Joglekar, Jan Leike, Ilya Sutskever, and Jeff Wu. Weak-to-strong generalization: Eliciting strong capabilities with weak supervision, 2023.

Rafael Rafailov, Archit Sharma, Eric Mitchell, Stefano Ermon, Christopher D. Manning, and Chelsea Finn. Direct preference optimization: Your language model is secretly a reward model, 2023.

John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. Proximal policy optimization algorithms, 2017.

Karl Moritz Hermann, Tomas Kocisky, Edward Grefenstette, Lasse Espeholt, Will Kay, Mustafa Suleyman, and Phil Blunsom. Teaching machines to read and comprehend. Advances in neural information processing systems, 28, 2015.

Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, Ilya Sutskever, et al. Language models are unsupervised multitask learners. OpenAI blog, 1(8):9, 2019.
