
Weak-to-Strong Generalization with Preferences

Vision

Imagenette was chosen due to logistical and compute restrictions (bigger experiments coming soon!).

Inspired by OpenAI's weak-to-strong research: https://github.com/openai/weak-to-strong/tree/main

Original Approach: Generate weak labels using an AlexNet model pretrained on Imagenette, and train linear probes on top of DINO models as the strong student.
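
As a rough sketch of that pipeline (the actual implementation lives in run_weak_strong.py; the loader, device, and hyperparameters below are placeholders, not this repo's API):

```python
# Sketch of the original weak-to-strong setup: a frozen AlexNet teacher
# produces weak labels, and a linear probe on frozen DINO features is
# trained against those labels with plain cross-entropy.
import torch
import torch.nn as nn
import torch.nn.functional as F

@torch.no_grad()
def get_weak_labels(weak_model, loader, device="cuda"):
    """Argmax predictions of the weak AlexNet over the training set."""
    weak_model.eval().to(device)
    return torch.cat([weak_model(x.to(device)).argmax(-1).cpu() for x, _ in loader])

@torch.no_grad()
def get_features(strong_backbone, loader, device="cuda"):
    """Frozen DINO features used by the strong student's linear probe."""
    strong_backbone.eval().to(device)
    return torch.cat([strong_backbone(x.to(device)).cpu() for x, _ in loader])

def train_probe(feats, weak_labels, n_classes=10, n_epochs=10, lr=1e-3, device="cuda"):
    """Fit a linear probe on the frozen features using the weak labels as targets."""
    probe = nn.Linear(feats.shape[1], n_classes).to(device)
    opt = torch.optim.Adam(probe.parameters(), lr=lr)
    feats, weak_labels = feats.to(device), weak_labels.to(device)
    for _ in range(n_epochs):
        loss = F.cross_entropy(probe(feats), weak_labels)
        opt.zero_grad(); loss.backward(); opt.step()
    return probe
```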

Modified Approach: Use the weak labels to generate pairwise preferences for training the linear probe, eliciting stronger weak-to-strong supervision.
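
One way to realize this (a minimal sketch of the idea, not the exact code in the prefs script) is to turn each weak label into a pairwise preference, where the weak-predicted class is preferred over some other class, and train the probe with a Bradley-Terry-style logistic loss on the logit margin:

```python
# Sketch of the preference-based variant: instead of cross-entropy on weak
# labels, each example contributes a pairwise preference "weak-predicted
# class beats a sampled rejected class", trained with a logistic
# (Bradley-Terry) loss on the logit margin. The exact pair construction in
# this repo may differ.
import torch
import torch.nn.functional as F

def preference_loss(logits, weak_labels, n_classes=10):
    """logits: (N, C) probe outputs on strong features; weak_labels: (N,)."""
    # Sample a rejected class per example, guaranteed to differ from the weak label.
    rejected = torch.randint(0, n_classes, weak_labels.shape, device=weak_labels.device)
    rejected = torch.where(rejected == weak_labels, (rejected + 1) % n_classes, rejected)
    chosen_logit = logits.gather(1, weak_labels.unsqueeze(1)).squeeze(1)
    rejected_logit = logits.gather(1, rejected.unsqueeze(1)).squeeze(1)
    # The preferred (weak-labeled) class should out-score the rejected one.
    return -F.logsigmoid(chosen_logit - rejected_logit).mean()
```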

Set download=True in data.py for the first run

python3 run_weak_strong.py \
    data_path <DATA_PATH> \
    weak_model_name <WEAK_MODEL> \
    strong_model_name <STRONG_MODEL> \
    batch_size <BATCH_SIZE> \
    seed <SEED> \
    n_epochs <N_EPOCHS> \
    lr <LR> \
    n_train <N_TRAIN>

Run the prefs version using the same command. Refer to the OpenAI GitHub for detailed parameter explanations.

With the commands above we get the following results (note that the results may not reproduce exactly due to randomness):

AlexNet (weak label) Accuracy: 0.089

DINO ResNet50 (strong on gt) Accuracy: 0.91

| Model | PGR |
| --- | --- |
| AlexNet → DINO ResNet50 | 0.096 |
| AlexNet → DINO ResNet50 (Prefs) | 0.132 |

PGR scores are low overall, but the preference-based variant recovers more of the gap.
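
For context, PGR is the performance gap recovered metric from Burns et al. (2023): the fraction of the weak-to-ceiling accuracy gap closed by the weak-to-strong model. The back-calculated weak-to-strong accuracies below are approximate and only for illustration:

```python
# PGR = (w2s_acc - weak_acc) / (strong_ceiling_acc - weak_acc)
weak_acc, strong_acc = 0.089, 0.91

def pgr(w2s_acc):
    return (w2s_acc - weak_acc) / (strong_acc - weak_acc)

# Inverting the formula: PGR = 0.096 implies a weak-to-strong accuracy of
# roughly 0.089 + 0.096 * (0.91 - 0.089) ≈ 0.168, and PGR = 0.132 implies ≈ 0.197.
```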

You can add new custom models to models.py and new datasets to data.py.
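
The registration mechanism in models.py is not shown here; as a purely hypothetical example, a new strong backbone could be exposed as a name-to-constructor entry (dino_vits16 is a real torch.hub checkpoint, but the CUSTOM_MODELS dict is an assumption, not this repo's API):

```python
# Hypothetical pattern for registering a new strong backbone; the actual
# structure of models.py may differ.
import torch

def build_dino_vits16():
    # Any nn.Module mapping images to feature vectors can serve as a backbone.
    return torch.hub.load("facebookresearch/dino:main", "dino_vits16")

CUSTOM_MODELS = {"dino_vits16": build_dino_vits16}
```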

Text

These text experiments were submitted as a classroom assignment for CS 690S; here are some results:

(Result figures: w2s1, w2s2, w2s3, w2s4.)

References:

Collin Burns, Pavel Izmailov, Jan Hendrik Kirchner, Bowen Baker, Leo Gao, Leopold Aschenbrenner, Yining Chen, Adrien Ecoffet, Manas Joglekar, Jan Leike, Ilya Sutskever, and Jeff Wu. Weak-to-strong generalization: Eliciting strong capabilities with weak supervision, 2023.

Rafael Rafailov, Archit Sharma, Eric Mitchell, Stefano Ermon, Christopher D. Manning, and Chelsea Finn. Direct preference optimization: Your language model is secretly a reward model, 2023.

John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. Proximal policy optimization algorithms, 2017.

Karl Moritz Hermann, Tomas Kocisky, Edward Grefenstette, Lasse Espeholt, Will Kay, Mustafa Suleyman, and Phil Blunsom. Teaching machines to read and comprehend. Advances in neural information processing systems, 28, 2015.

Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, Ilya Sutskever, et al. Language models are unsupervised multitask learners. OpenAI blog, 1(8):9, 2019.
