CHANGELOG

v5.3.2 (2025-01-17)

Fix

  • fix: fixing jumprelu threshold when folding dec norm (#404) (06b6669)

v5.3.1 (2025-01-14)

Chore

  • chore: Fixed bug in the neuronpedia_integration cell in notebook (#402)

  • corrected basic loading tutorial

  • fix formatting


Co-authored-by: David Chanin <chanindav@gmail.com> (4935b83)

Fix

  • fix: removing deleted saebench SAEs from pretrained_saes.yaml (#403) (65e1986)

v5.3.0 (2024-12-29)

Feature

  • feat: Replace assert statements with exception code (#400)

  • replaces assert statements with exception code

  • replaces assert statements with exception code in less obvious cases

  • removes unnecessary if and else statements (324be25)
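
As an illustration of the kind of change this entry describes, here is a minimal sketch of swapping an `assert` for an explicit exception. The function name `check_context_size` and its parameter are hypothetical and do not come from the library; the point is only that an explicit `raise` still runs when Python is started with `-O`, which strips `assert` statements.

```python
def check_context_size(context_size: int) -> int:
    """Validate a hypothetical config value with an explicit exception."""
    # Before: `assert context_size > 0`, which is skipped under `python -O`.
    if context_size <= 0:
        raise ValueError(f"context_size must be positive, got {context_size}")
    return context_size
```

Callers can then catch `ValueError` specifically instead of a generic `AssertionError`.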

v5.2.1 (2024-12-15)

Unknown

  • Merge pull request #398 from jbloomAus/np_yaml

fix: width for llamascope 32x was incorrect (0617dba)

  • Update Llama Scope NP ids (26899cd)

v5.2.0 (2024-12-06)

Chore

  • chore: fix tokenizer typing for bos_token_id (#399) (b3b67d6)

  • chore: Replace isort black and flake8 with ruff (#393)

  • replaces in cache_activations_runner.py

  • replaces isort, black, and flake8 with Ruff

  • adds SIM lint rule

  • fixes for CI check

  • adds RET lint rule

  • adds LOG lint rule

  • fixes RET error

  • resolves conflicts

  • applies make format

  • adds T20 rule

  • replaces extend-select with select

  • resolves conflicts

  • fixes lint errors

  • update .vscode/settings.json

  • Revert "update .vscode/settings.json"

This reverts commit 1bb5497d7495f7fb0843bc4eb885ba90cf6b4f47.

  • updates .vscode/settings.json

  • adds newline (52dbff9)

Feature

  • feat: Save estimated norm scaling factor during checkpointing (#395)

  • refactor saving

  • save estimated_norm_scaling_factor

  • use new constant names elsewhere

  • estimate norm scaling factor in ActivationsStore init

  • fix tests

  • add test

  • tweaks

  • safetensors path

  • remove scaling factor on fold

  • test scaling factor value

  • format

  • format

  • undo silly change

  • format

  • save fn protocol

  • make save fn static

  • test which checkpoints have estimated norm scaling factor

  • fix test

  • fmt (63a15a0)

Fix

  • fix: width for llamascope 32x was incorrect (355691f)

  • fix: force build (53180e0)

  • fix: typo in pretrained yaml (9db9e36)

Unknown

  • Merge pull request #397 from jbloomAus/np_yaml

fix: typo in pretrained yaml (19bcb2e)

v5.1.0 (2024-11-30)

Feature

  • feat: Replace print with controllable logging (#388)

  • replaces in pretrained_sae_loaders.py

  • replaces in load_model.py

  • replaces in neuronpedia_integration.py

  • replaces in tsea.py

  • replaces in pretrained_saes.py

  • replaces in cache_activations_runner.py

  • replaces in activations_store.py

  • replaces in training_sae.py

  • replaces in upload_saes_to_huggingface.py

  • replaces in sae_training_runner.py

  • replaces in config.py

  • fixes error for CI


Co-authored-by: David Chanin <chanindav@gmail.com> (2bcd646)
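
A minimal sketch of what "controllable logging" can look like with the standard `logging` module. The logger name `sae_lens_sketch` and the function `describe_load` are assumptions made for illustration; check the package source for the logger name it actually uses.

```python
import logging

# Hypothetical logger name, used only for this sketch.
logger = logging.getLogger("sae_lens_sketch")


def describe_load(release: str, sae_id: str) -> str:
    """Report a load through the logger instead of print()."""
    message = f"Loading {sae_id} from {release}"
    # Before: print(message), which callers could not switch off.
    logger.info(message)
    return message


# Callers decide how much output they want, for example hiding info messages:
logger.setLevel(logging.WARNING)
```

Unlike `print`, the message is dropped or routed according to the level and handlers the caller configures.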

v5.0.0 (2024-11-29)

Breaking

  • feat: Cleaned up CacheActionsRunnerConfig (#389)

BREAKING CHANGE: Superfluous config options have been removed

  • Cleaned up CacheActionsRunnerConfig

Previously, CacheActivationConfig had an inconsistent config for the sake of some interoperability with LanguageModelSAERunnerConfig. It was unclear which parameters were necessary and which were redundant.

Simplified to the required arguments:

  • dataset_path: Tokenized or untokenized dataset
  • total_training_tokens
  • model_name
  • model_batch_size
  • hook_name
  • final_hook_layer
  • d_in

I think this scheme captures everything you need when attempting to cache activations and makes it a lot easier to reason about.

Optional:

activation_save_path # defaults to "activations/{dataset}/{model}/{hook_name}"
shuffle=True
prepend_bos=True
streaming=True
seqpos_slice
buffer_size_gb=2 # Size of each buffer. Affects memory usage and saving freq
device="cuda" or "cpu"
dtype="float32"
autocast_lm=False
compile_llm=True
autocast_lm=False
compile_llm=True
hf_repo_id # Push to hf
model_kwargs # `run_with_cache`
model_from_pretrained_kwargs
  • Keep compatibility with old config
  • Renamed to keep values same where possible
  • Moved _from_saved_activations (private api for CachedActivationRunner) to cached_activation_runner.py
  • Use properties instead of __post_init__ (d81e286)
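
To make the "properties instead of __post_init__" point concrete, here is a minimal sketch of a config dataclass that derives its default save path on access. `CacheConfigSketch` and `resolved_save_path` are hypothetical names and this is not the library's config class; only the field names and the default path pattern are taken from the entry above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CacheConfigSketch:
    """Hypothetical stand-in for a cache-activations config."""

    dataset_path: str
    model_name: str
    hook_name: str
    total_training_tokens: int
    d_in: int
    buffer_size_gb: float = 2.0
    dtype: str = "float32"
    activation_save_path: Optional[str] = None  # None means "derive it"

    @property
    def resolved_save_path(self) -> str:
        # Computed on access, so it always reflects the current field values,
        # unlike a value computed once in __post_init__.
        if self.activation_save_path is not None:
            return self.activation_save_path
        return f"activations/{self.dataset_path}/{self.model_name}/{self.hook_name}"
```

One consequence of this design is that changing `model_name` after construction also changes the derived path, whereas a value set in `__post_init__` would go stale.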

v4.4.5 (2024-11-24)

Unknown

  • Merge pull request #387 from jbloomAus/np_yaml

fix: add missing neuronpedia yaml entries (deae2a7)

v4.4.4 (2024-11-24)

Unknown

  • Merge pull request #386 from jbloomAus/np_yaml

fix: add missing neuronpedia yaml entries (e35f998)

v4.4.3 (2024-11-24)

Unknown

  • Merge pull request #385 from jbloomAus/np_yaml

fix: add missing neuronpedia yaml entry (7ac5253)

v4.4.2 (2024-11-24)

Fix

  • fix: update neuronpedia yaml entries (465c958)

Unknown

  • Merge pull request #383 from jbloomAus/np_yaml

fix: update neuronpedia yaml entries (93b3dd2)

v4.4.1 (2024-11-19)

Fix

  • fix: remove typeguard dependency (#380) (c555d9b)

v4.4.0 (2024-11-19)

Feature

  • feat: Topk SAE training (#370)

  • feat: topk training

  • adding tests

  • adding docs for training topk saes

  • fixing typing

  • more tests

  • adding topk to hidden pre test

  • changes from CR

  • temporarily adding typeguard so tests will pass (aa8f42b)
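
For readers unfamiliar with the term, a top-k activation keeps the k largest encoder pre-activations and zeroes the rest. The sketch below shows that idea on plain Python lists; `topk_activation` is a hypothetical helper, not the library's implementation, which operates on tensors and may differ in details such as applying a ReLU afterwards.

```python
from typing import List


def topk_activation(pre_acts: List[float], k: int) -> List[float]:
    """Keep the k largest pre-activations and zero out the rest."""
    if k <= 0:
        return [0.0] * len(pre_acts)
    # Indices sorted by value, largest first; ties keep their original order.
    order = sorted(range(len(pre_acts)), key=lambda i: pre_acts[i], reverse=True)
    keep = set(order[:k])
    return [value if i in keep else 0.0 for i, value in enumerate(pre_acts)]
```

With this scheme, sparsity is fixed by construction at k active features per input rather than encouraged through a penalty term.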

v4.3.5 (2024-11-18)

Chore

  • chore: adding test that all config params pass to sae (#379) (2a43b68)

Fix

  • fix: Force build for the pretrained_saes.yaml update (8752dcc)

Unknown

  • Merge pull request #378 from jbloomAus/add-new-saebench-saes

Added new SAEBench gemma 2 2b SAEs (637b27b)

v4.3.4 (2024-11-14)

Fix

  • fix: hotfix scale decoder norm is not passed to training sae (#377)

  • fix: hotfix scale decoder norm is not passed to training sae

  • remove default params from TrainingSAEConfig (38876b4)

v4.3.3 (2024-11-12)

Fix

  • fix: fixing jumprelu encode and save/load (#373)

  • fix: jumprelu encode and save/load

  • fixing tests

  • changes from CR (17506ac)

v4.3.2 (2024-11-12)

Chore

  • chore: fixing whitespace so docs render as list not paragraph (#374) (156ddc9)

  • chore: add codecov.yaml and exclude legacy files (#372) (aa98caf)

Fix

  • fix: add neuronpedia ids for llamascope (23b4912)

Unknown

  • Merge pull request #375 from jbloomAus/add_np_llamascope

fix: add neuronpedia ids for llamascope (60542fa)

  • Merge pull request #371 from jbloomAus/fix-llamascope-details

fixed llamascope sae names and loader (fecfe5d)

  • fixed llamascope sae names and loader (8f6bcb0)

  • Merge pull request #369 from Hzfinfdu/main

Add Llama Scope SAEs & improvements to evaluating ce scores. (a1546e6)

  • fix format for PR (1443b58)

  • feature(evals): mask ignore_tokens in replacement hooks for evaluation (ae67eaa)

v4.3.1 (2024-11-10)

Fix

  • fix: fixing type errors after bad merge (4a08d0d)

  • fix: only scale sparsity by dec norm if specified in the config (#365) (ceb2d3f)

v4.3.0 (2024-11-10)

Chore

  • chore: updating training docs with tips / jumprelu (#366)

  • chore: updating training docs with tips / jumprelu

  • fixing missing space char (f739500)

Feature

  • feat: Support arbitrary huggingface causal LM models (#226)

  • adding load_model helper for huggingface causal LM models

  • polishing huggingface integration

  • adding more tests

  • updating docs

  • tweaking docstrings

  • perf fix: don't calculate loss by default

  • better handling of HF tuple outputs

  • fixing test

  • changes from CR

  • fixing default model params for huggingface models

  • move hf model to device on load (044d4be)

Performance

  • perf: faster cleanup of datasets when caching activations (#367)

Previously I used dataset.save_to_disk to write the final dataset, but this can be slow. Instead, I manually move the shards into the standard hf format, which avoids resaving the entire dataset (a3663b7)

v4.2.0 (2024-11-09)

Chore

  • chore: adding 'Load this SAE' popup to docs table (#362) (1866aa7)

  • chore: more flexible training losses (#357)

  • return and log a dict from train step

  • updating trainer loss pbar

  • avoid unnecessary gpu sync

  • fixing tests

  • adding logging for unnormalized l1 loss (0c1179c)

Feature

  • feat: adding a CLI training runner (#359) (998c277)

Unknown

  • add support for Llama Scope SAEs (aaf2f29)

v4.1.1 (2024-11-06)

Chore

  • chore: Update training_a_sparse_autoencoder.ipynb (#358)

Changed "She lived in a big, happy little girl." to "She lived in a big, happy little town." (b8703fe)

Fix

  • fix: load the same config from_pretrained and get_sae_config (#361)

  • fix: load the same config from_pretrained and get_sae_config

  • merge neuronpedia_id into get_sae_config

  • fixing test (8e09458)

v4.1.0 (2024-11-03)

Feature

  • feat: Support training JumpReLU SAEs (#352)

  • adds JumpReLU logic to TrainingSAE

  • adds unit tests for JumpReLU

  • changes classes to match tutorial

  • replaces bandwidth constant with param

  • re-adds JumpReLU logic to TrainingSAE

  • adds TrainingSAE.save_model()

  • changes threshold to match paper

  • add tests for TrainingSAE when architecture is jumprelu

  • adds test for SAE.load_from_pretrained() for JumpReLU

  • removes code causing test to fail

  • renames initial_threshold to threshold

  • removes setattr()

  • adds test for TrainingSAE.save_model()

  • renames threshold to jumprelu_init_threshold

  • adds jumprelu_bandwidth

  • removes default value for jumprelu_init_threshold downstream

  • replaces zero tensor with None in Step.backward()

  • adds jumprelu to architecture type (0b56d03)
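
As a reminder of what the activation computes, JumpReLU passes a pre-activation through unchanged when it exceeds a per-feature threshold and outputs zero otherwise. The forward-only sketch below uses plain Python lists; `jumprelu` is a hypothetical helper and not the code added in this release. The bandwidth parameter mentioned above presumably affects only how threshold gradients are estimated during training, which this sketch does not cover.

```python
from typing import List


def jumprelu(pre_acts: List[float], thresholds: List[float]) -> List[float]:
    """Forward pass only: z if z > threshold else 0, applied per feature."""
    if len(pre_acts) != len(thresholds):
        raise ValueError("pre_acts and thresholds must have the same length")
    return [z if z > t else 0.0 for z, t in zip(pre_acts, thresholds)]
```

Note the difference from a shifted ReLU: a value above the threshold is kept as is, not reduced by the threshold.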

v4.0.10 (2024-10-30)

Fix

  • fix: normalize decoder bias in fold_norm_scaling_factor (#355)

  • WIP: fix fold_norm_scaling

  • fixing test (6951e74)
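
Why the decoder bias matters when folding can be seen in a one-dimensional sketch. It assumes an SAE that subtracts the decoder bias from its input before encoding and that was trained on inputs multiplied by a constant `scale`; both function names are hypothetical and the library's tensor code may differ. Under those assumptions, folding the scale into the weights reproduces the unscaled reconstruction only if the decoder bias is divided by `scale` as well.

```python
def sae_forward(x, w_enc, b_enc, w_dec, b_dec):
    """One-feature, one-dimensional SAE: ReLU encoder, linear decoder."""
    feature = max(0.0, w_enc * (x - b_dec) + b_enc)
    return w_dec * feature + b_dec


def fold_scaling(w_enc, b_enc, w_dec, b_dec, scale):
    """Fold an input scale into the parameters so raw inputs can be used."""
    # The encoder absorbs the scale; the decoder and its bias undo it.
    return w_enc * scale, b_enc, w_dec / scale, b_dec / scale
```

Running the original parameters on `scale * x` and dividing the output by `scale` should then agree with running the folded parameters on `x` directly.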

v4.0.9 (2024-10-24)

Fix

  • fix: typo in layer 12 YAML (d634c8b)

Unknown

  • Merge pull request #349 from jbloomAus/np_id_fix_2

fix: use the correct layer for new gemma scope SAE sparsities (4c32de0)

v4.0.8 (2024-10-24)

Fix

  • fix: use the correct layer for new gemma scope SAE sparsities (a78b93e)

Unknown

  • Merge pull request #348 from jbloomAus/np_id_fix

fix: use the correct layer for new gemma scope SAE sparsities (1f6823a)

v4.0.7 (2024-10-23)

Fix

  • fix: Test JumpReLU/Gated SAE and fix sae forward with error term (#328)

  • chore: adding tests and a slight refactoring for SAE forward methods

  • refactoring forward methods using a helper to avoid firing hooks

  • rewording intermediate var

  • use process_sae_in helper in training sae encode

  • testing that sae.forward() with error term works with hooks

  • cleaning up more unneeded device=cpu in tests (ae345b6)

v4.0.6 (2024-10-23)

Chore

  • chore: Add tests for evals (#346)

  • add unit tests for untested functions

  • adds test to increase coverage

  • fixes typo (06594f9)

Fix

  • fix: pass device through to SAEConfigLoadOptions properly (#347) (531b1c7)

v4.0.5 (2024-10-22)

Fix

  • fix: last NP id fix, hopefully (a470460)

Unknown

  • Merge pull request #345 from jbloomAus/last_np_id_fix

fix: last NP id fix, hopefully (d5a7906)

v4.0.4 (2024-10-22)

Fix

  • fix: np ids should contain model id (da5c622)

Unknown

  • Merge pull request #344 from jbloomAus/fix_np_ids_again

fix: np ids should contain model id (88d9a0b)

v4.0.3 (2024-10-22)

Fix

  • fix: fix duplicate np ids (bcaf802)

  • fix: yaml was missing some gemmascope np ids, update np id formats (3f43590)

Unknown

  • Merge pull request #343 from jbloomAus/fix_duplicate_np_ids

fix: fix duplicate np ids (4060d07)

  • Merge pull request #342 from jbloomAus/fix_yaml_missing_gemmascope_and_np_ids

fix: yaml was missing some gemmascope np ids, update np id formats (db2fa5f)

v4.0.2 (2024-10-22)

Fix

  • fix: previous saebench yaml fixes were incomplete for pythia-70m-deduped (b0adf2d)

Unknown

  • Merge pull request #341 from jbloomAus/fix_pythia70md_again

fix: previous saebench yaml fixes were incomplete for pythia-70m-deduped (72e3ef4)

v4.0.1 (2024-10-20)

Chore

  • chore: reduce test space usage in CI (#336)

  • chore: reduce test space usage in CI

  • busting caches

  • try reducing sizes further

  • try using smaller datasets where possible

  • tokenizing a super tiny dataset for tests (36e1d86)

Fix

  • fix: changes dtype default value in read_sae_from_disk() (#340) (5820585)

Unknown

  • Merge pull request #339 from jbloomAus/fix/saeb-bench-model-names

updated SAE Bench pythia model names (and loader device cfg) (2057455)

  • Merge pull request #324 from jbloomAus/improving-evals

chore: Misc basic evals improvements (eg: consistent activation heuristic, cli args) (10f4773)

  • Updated tests to be correct (d1b4f5d)

  • Organized basic eval metrics and eliminated NaNs (f6be1a6)

  • format with updated env (97622b5)

  • format (31b2e9d)

  • fix other test from rebase (530a426)

  • fix tests (92019ac)

  • set type to int for ctx lens (e4b5be6)

  • update evaluating SAEs tutorial (d2bebbc)

  • moving SAE to correct device (b881b05)

  • Added dataset_trust_remote_code arg (53dbde6)

  • Added trust_remote_code arg (b738924)

  • Updated eval config explanations (d439927)

  • Added updated plots for feature metrics (7a4ce2d)

  • Initial draft of evals tutorial (42309c8)

  • first pass evals notebook (8005ff9)

  • add verbose mode (15f1b59)

  • add more cli args (bc17fa5)

  • fix featurewise weight based metric type (d504a96)

  • add featurewise weight based metrics (ed365c4)

  • fix during training eval config (333d71c)

  • add feature density histogram to evals + consistent activation heuristic (9341398)

  • keep track of tokens used separately (c168c2b)

  • use evals code in CI (87601ba)

  • add cossim and relative reconstruction bias (c028072)

  • add sae_lens version (a3123c8)

  • remove redundant string in keys (a2dd2e0)

  • updated SAE Bench pythia model names (and loader device cfg) (2078eac)

v4.0.0 (2024-10-15)

Breaking

  • feat: Use hf datasets for activation store (#321)

BREAKING CHANGE: use huggingface for cached activations

  • refactored load activations into new function

  • activation store

  • cache activation runner

  • formatting and get total_size

  • doing tests

  • cleaner load buffer

  • cleaner load dataset

  • cleanup cache activation runner

  • add comments

  • failing test

  • update

  • fixed! set shuffle param in get_buffer

  • fixed linting

  • added more tests

  • refactor tests & cleanup

  • format config.py

  • added hook name mismatch test

  • set deprecated to -1

  • fix tempshards test

  • update test name

  • add benchmark: safetensors vs dataset

  • added stop iteration at end of dataset

  • don't double save

  • add push to hub

  • fix save

  • formatting

  • comments

  • removed unnecessary write

  • cleanup pushing to hub, same as PretokenizeRunnerConfig

  • use num_buffers by default (rather than 64)

  • update comment

  • shuffle and save to disk

  • cleanup error checking

  • added cfg info

  • delete to iterable

  • formatting

  • delete deprecated params

  • set format of dataset

  • fix tests

  • delete shuffle args

  • fix test

  • made dynamic dataset creation shorter

  • removed print statements

  • showcase hf_repo_id in docs


Co-authored-by: Tom Pollak <tompollak100@gmail.com> Co-authored-by: David Chanin <chanindav@gmail.com> (ff335f0)

Feature

  • feat: support othellogpt in SAELens (#317)

  • support seqpos slicing

  • add basic tests, ensure it's in the SAE config

  • format

  • fix tests

  • fix tests 2

  • fix: Changing the activations store to handle context sizes smaller than dataset lengths for tokenized datasets.

  • fix: Found bug which allowed for negative context lengths. Removed the bug

  • Update pytest to test new logic for context size of tokenized dataset

  • Reformat code to pass CI tests

  • Add warning for when context_size is smaller than the dataset context_size

  • feat: adding support for start and end position offsets for token sequences

  • Add start_pos_offset and end_pos_offset to the SAERunnerConfig

  • Add tests for start_pos_offset and end_pos_offset in the LanguageModelSAERunnerConfig

  • feat: start and end position offset support for SAELens.

  • Add test for CacheActivationsRunnerConfig with start and end pos offset

  • Test cache activation runner with valid start and end pos offset

  • feat: Enabling loading of start and end pos offset from saes. Adding tests for this

  • fix: Renaming variables and a test

  • adds test for position offsets for saes

  • reformats files with black

  • Add start and end pos offset to the base sae dict

  • fix test for sae training runner config with position offsets

  • add a benchmark test to train an SAE on OthelloGPT

  • Remove double import from typing

  • change dead_feature_window to int

  • remove print statements from test file

  • Rebase on seqpos tuple implementation and remove start/end pos offset

  • Reword docstring for seqpos to be clearer.

  • Added script to train an SAE on othelloGPT


Co-authored-by: callummcdougall <cal.s.mcdougall@gmail.com> Co-authored-by: jbloomAus <jbloomaus@gmail.com> Co-authored-by: liuman <zhenninghimme@gmail.com> (7047f87)

  • feat: add get_sae_config() function (#331)

  • extracts code to get_connor_rob_hook_z_config()

  • extracts code into get_dictionary_learning_config_1()

  • extract repeated lines to above conditions

  • fixes incorrect function name

  • extracts code in generate_sae_table.py to function

  • removes unnecessary update()

  • replaces calls to specific loaders with get_sae_config()

  • replaces **kwargs with dataclass

  • refactors attribute access

  • renames SAEConfigParams to SAEConfigLoadOptions

  • gets rid of indent

  • replaces repo_id, folder_name with release, sae_id

  • extracts to get_conversion_loader_name()

  • extracts if-else to dict

  • move blocks to sensible place

  • extracts to get_repo_id_and_folder_name()

  • adds tests for get_repo_id_and_folder_name()

  • adds tests for get_sae_config()

  • removes mocking

  • fixes test

  • removes unused import (d451b1d)

Fix

  • fix: force new build (26fead6)

  • fix: add neuronpedia links for gemmascope 32plus (1087f19)

Unknown

  • Merge pull request #332 from jbloomAus/pretrained_yaml_gs_32plus

fix: add neuronpedia links for gemmascope 32plus (42ba557)

  • Add Curt to citation (#329) (24b8560)

v3.23.4 (2024-10-10)

Fix

  • fix: add-neuronpedia-ids-correct-gemma-2-2b-model-name (#327) (6ed1400)

v3.23.3 (2024-10-08)

Fix

  • fix: properly manage pyproject.toml version (#325) (432df87)

v3.23.2 (2024-10-07)

Chore

  • chore: adds black-jupyter to dependencies (#318)

  • adds black-jupyter to dependencies

  • formats notebooks (ed5d791)

Fix

  • fix: hook_sae_acts_post for Gated models should be post-masking (#322)

  • first commit

  • formatting (5e70edc)

v3.23.1 (2024-10-03)

Chore

  • chore: Change print warning messages to warnings.warn messages in activations-store (#314) (606f464)

Fix

  • fix: Correctly load SAE Bench TopK SAEs (#308) (4fb5bbe)

v3.23.0 (2024-10-01)

Chore

  • chore: deletes print() in unit tests (#306) (7d9fe10)

Feature

  • feat: allow smaller context size of a tokenized dataset (#310)

  • fix: Changing the activations store to handle context sizes smaller than dataset lengths for tokenized datasets.

  • fix: Found bug which allowed for negative context lengths. Removed the bug

  • Update pytest to test new logic for context size of tokenized dataset

  • Reformat code to pass CI tests

  • Add warning for when context_size is smaller than the dataset context_size


Co-authored-by: liuman <zhenninghimme@gmail.com> (f04c0f9)

Fix

  • fix: Add entity argument to wandb.init (#309) (305e576)

v3.22.2 (2024-09-27)

Fix

  • fix: force new build for adding llama-3-8b-it (ed23343)

Unknown

  • Merge pull request #307 from jbloomAus/feature/llama-3-8b-it

Feature/llama 3 8b it (f786712)

v3.22.1 (2024-09-26)

Chore

  • chore: delete dashboard_runner.py (#303) (df0aba7)

Fix

  • fix: fixing canrager SAEs in SAEs table docs (#304) (54cfc67)

v3.22.0 (2024-09-25)

Feature

  • feat: Add value error if both d sae and expansion factor set (#301)

  • adds ValueError if both d_sae and expansion_factor set

  • renames class

  • removes commented out line (999ffe8)
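
The check described here can be sketched as follows. `resolve_d_sae` is a hypothetical helper, not the library's config code; it assumes the SAE width is either given directly as `d_sae` or derived as `d_in * expansion_factor`, and raising when neither is set is a choice made for this sketch only.

```python
from typing import Optional


def resolve_d_sae(
    d_in: int,
    d_sae: Optional[int] = None,
    expansion_factor: Optional[int] = None,
) -> int:
    """Return the SAE width from exactly one of the two options."""
    if d_sae is not None and expansion_factor is not None:
        raise ValueError("Set either d_sae or expansion_factor, not both.")
    if d_sae is not None:
        return d_sae
    if expansion_factor is None:
        # Sketch-only behaviour: require one of the two to be given.
        raise ValueError("Set one of d_sae or expansion_factor.")
    return d_in * expansion_factor
```

Rejecting the ambiguous case up front avoids silently ignoring one of two conflicting settings.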

v3.21.1 (2024-09-23)

Fix

  • fix: log-spaced-checkpoints-instead (#300) (cdc64c1)

v3.21.0 (2024-09-23)

Feature

  • feat: Add experimental Gemma Scope embedding SAEs (#299)

  • add experimental embedding gemmascope SAEs

  • format and lint (bb9ebbc)

Unknown

  • Fix gated forward functions (#295)

  • support seqpos slicing

  • fix forward functions for gated

  • remove seqpos changes

  • fix formatting (remove my changes)

  • format


Co-authored-by: jbloomAus <jbloomaus@gmail.com> (a708220)

v3.20.5 (2024-09-20)

Fix

  • fix: removing missing layer 11, 16k, l0=79 sae (#293)

Thanks! (e20e21f)

v3.20.4 (2024-09-18)

Chore

  • chore: Update README.md (#292) (d44c7c2)

Fix

  • fix: Fix imports from huggingface_hub.utils.errors package (#296)

  • Fix imports from huggingface_hub.utils.errors package

  • Load huggingface error classes from huggingface_hub.utils (9d8ba77)

v3.20.3 (2024-09-13)

Fix

  • fix: Improve error message for Gemma Scope non-canonical ID not found (#288)

  • Update sae.py as a nicer Gemma Scope error encouraging canonical

  • Update sae.py

  • Update sae.py

  • format


Co-authored-by: jbloomAus <jbloomaus@gmail.com> (9d34598)

v3.20.2 (2024-09-13)

Fix

  • fix: Update README.md (#290) (1d1ac1e)

v3.20.1 (2024-09-13)

Fix

  • fix: neuronpedia oai v5 sae ids (ffec7ed)

Unknown

  • Merge pull request #291 from jbloomAus/fix_np_oai_ids

fix: neuronpedia oai v5 sae ids (253191e)

v3.20.0 (2024-09-12)

Feature

  • feat: Add SAE Bench SAEs (#285)

  • add ignore tokens in evals

  • remove accidental hard coding

  • fix mse

  • extract sae filtering code

  • add sae_bench saes

  • use from pretrained no processing by default

  • use open web text by default

  • add estimate of number of SAEs print statements

  • add unit tests

  • type fix (680c52b)

v3.19.4 (2024-09-05)

Fix

  • fix: add OAI mid SAEs for neuronpedia (a9cb852)

  • fix: Gemma Scope 9b IT ids for Neuronpedia (dcafbff)

Unknown

  • Merge pull request #282 from jbloomAus/oai_mid_fix

fix: add OAI mid SAEs for neuronpedia (643f0c5)

  • Merge pull request #281 from jbloomAus/fix_np_gemmascope_ids

fix: Gemma Scope 9b IT ids for Neuronpedia (e354918)

v3.19.3 (2024-09-04)

Fix

  • fix: more gemma scope canonical ids + a few canonical ids were off. (#280)

  • fix model name in config for it models

  • add 9b non-standard sizes

  • fix att 131k canonical that were off

  • fix mlp 131k canonical that were off (aa3e733)

v3.19.2 (2024-09-04)

Fix

  • fix: centre writing weights defaults (#279)

  • add model_from_pretrained_kwargs to SAE config

  • default to using no processing when training SAEs

  • add centre writing weights true as config override for some SAEs

  • add warning about from pretrained kwargs

  • fix saving of config by trainer

  • fix: test (7c0d1f7)

v3.19.1 (2024-09-04)

Chore

  • chore: Update usage of Neuronpedia explanations export (#267)

Co-authored-by: David Chanin <chanindav@gmail.com> (f100aed)

Fix

  • fix: reset hooks before saes in tutorial (#278) (2c225fd)

Unknown

  • updating howpublished url in docs (#270) (25d9ba4)

v3.19.0 (2024-09-03)

Chore

  • chore: Cleanup basic tutorial (#271)

  • saelens: remove unnecessary html output saving, to save some space

  • saelens: update top comment on basic tutorial (396e66e)

  • chore: update get neuronpedia quicklist function in logit lens tutorial (#274) (5b819b5)

  • chore: Corrected outdated code to call API (#269)

Co-authored-by: dhuynh95 <daniel.huynh@mithrilsecurity.io> (7b19adc)

  • chore: updating mkdocs deps (#268)

  • chore: updating mkdocs deps

  • adding type: ignore to wandb calls (75d142f)

Feature

  • feat: only log ghost grad if you've enabled it (#272)

  • saelens: improve log output to only include ghost grad logs if you're using them

  • sae: update ghostgrad log tests (da05f08)

v3.18.2 (2024-08-25)

Fix

  • fix: gemma scope saes yml. 16k for Gemma 2 9b was missing entries. (#266)

  • add missing saes, 16k was missing for 9b att and mlp

  • remove file name not needed (86c04ac)

v3.18.1 (2024-08-23)

Chore

  • chore: adding more metadata to pyproject.toml for PyPI (#263) (5c2d391)

Fix

  • fix: modify duplicate neuronpedia ids in config.yml, add test. (#265)

  • fix duplicate ids

  • fix test that had mistake (0555178)

v3.18.0 (2024-08-22)

Feature

  • feat: updated pretrained yml gemmascope and neuronpedia ids (#264) (a3cb00d)

v3.17.1 (2024-08-18)

Fix

  • fix: fix memory crash when batching huge samples (#262) (f0bec81)

v3.17.0 (2024-08-16)

Feature

  • feat: add end-2-end saes from Braun et al to yaml (#261) (1d4eac1)

v3.16.0 (2024-08-15)

Feature

  • feat: make canonical saes for attn (#259) (ed2437b)

v3.15.0 (2024-08-14)

Chore

  • chore: updating slack link in docs (#255) (5c7595a)

Feature

  • feat: support uploading and loading arbitrary huggingface SAEs (#258) (5994827)

Unknown

  • Remove duplicate link (#256) (c40f1c5)

  • Update index.md (#257)

removes comment asking for table creation and links to it (1e185b3)

  • Merge pull request #244 from jbloomAus/add_pythia_70m_saes

Added pythia-70m SAEs to yaml (022f1de)

  • Merge branch 'main' into add_pythia_70m_saes (32901f2)

v3.14.0 (2024-08-05)

Feature

  • feat: GemmaScope SAEs + fix gemma-scope in docs (#254) (3da4cee)

Unknown

  • More complete set of Gemma Scope SAEs (#252)

  • commit for posterity

  • ignore pt files in home

  • add canonical saes

  • improve gemma 2 loader

  • better error msg on wrong id

  • handle config better

  • handle hook z weirdness better

  • add joseph / curt script

  • add gemma scope saes

  • format

  • make linter happy (68de42c)

  • Updated dashes (7c7a271)

  • Changed gemma repo to google (fa483f0)

  • Fixed pretrained_saes.yaml Gemma 2 paths (920b77e)

  • Gemma2 2b saes (#251)

  • Added support for Gemma 2

  • minor fixes

  • format

  • remove faulty error raise


Co-authored-by: jbloomAus <jbloomaus@gmail.com> (df273c4)

v3.13.1 (2024-07-31)

Fix

  • fix: update automated-interpretability dep to use newly released version (#247)

  • fix: update automated-interpretability dep to use newly released version

  • fixing / ignore optim typing errors (93b2ebe)

Unknown

  • Tutorial 2.0 (#250)

  • tutorial 2.0 draft

  • minor changes

  • Various additions to tutorial

  • Added ablation

  • better intro text

  • improve content further

  • fix steering

  • fix ablation to be true ablation

  • current tutorial


Co-authored-by: curt-tigges <ct@curttigges.com> (fe27b7c)

  • Fix typo in readme (#249) (fe987f1)

  • Merge pull request #242 from jbloomAus/add_openai_gpt2_small_saes

Added OpenAI TopK SAEs to pretrained yaml (2c1cbc4)

  • Added pythia-70m SAEs to yaml (25fb167)

  • Neuronpedia API key is now in header, not in body (#243) (caacef1)

  • Merge pull request #237 from jbloomAus/use_error_term_param

Use error term param (ac86d10)

  • Update pyproject.toml (4b032ab)

  • Added OpenAI TopK SAEs to pretrained yaml (7463e9f)

v3.13.0 (2024-07-18)

Feature

  • feat: validate that pretokenized dataset tokenizer matches model tokenizer (#215)

Co-authored-by: Joseph Bloom <69127271+jbloomAus@users.noreply.github.com> (c73b811)

Unknown

  • add more bootleg gemma saes (#240)

  • add more bootleg gemma saes

  • removed unused import (22a0841)

v3.12.5 (2024-07-18)

Fix

  • fix: fixing bug with cfg loading for fine-tuning (#241) (5a88d2c)

Unknown

  • Update deploy_docs.yml

Removed the Debug Info step that was causing issues. (71fd509)

v3.12.4 (2024-07-17)

Fix

  • fix: Trainer eval config will now respect trainer config params (#238)

  • Trainer eval config will now respect trainer config params

  • Corrected toml version (5375505)

Unknown

  • Neuronpedia Autointerp/Explanation Improvements (#239)

  • Neuronpedia autointerp API improvements: new API, new flags for save to disk and test key, fix bug with scoring disabled

  • Ignore C901 (ba7d218)

  • Fixed toml file (8211cac)

  • Ensured that even detached SAEs are returned to former state (90ac661)

  • Added use_error_term functionality to run_with_x functions (1531c1f)

  • Added use_error_term to hooked sae transformer (d172e79)

  • Trainer will now fold and log estimated norm scaling factor (#229)

  • Trainer will now fold and log estimated norm scaling factor after doing fit

  • Updated tutorials to use SAEDashboard

  • fix: sae hook location (#235)

  • 3.12.2

Automatically generated by python-semantic-release

  • fix: sae to method (#236)

  • 3.12.3

Automatically generated by python-semantic-release

  • Trainer will now fold and log estimated norm scaling factor after doing fit

  • Added functionality to load and fold in precomputed scaling factors from the YAML directory

  • Fixed toml


Co-authored-by: Joseph Bloom <69127271+jbloomAus@users.noreply.github.com> Co-authored-by: github-actions <github-actions@github.com> (8d38d96)

v3.12.3 (2024-07-15)

Fix

  • fix: sae to method (#236) (4df78ea)

v3.12.2 (2024-07-15)

Fix

  • fix: sae hook location (#235) (94ba11c)

v3.12.1 (2024-07-11)

Fix

  • fix: force release of dtype_fix (bfe7feb)

Unknown

  • Merge pull request #225 from jbloomAus/dtype_fix

fix: load_from_pretrained should not require a dtype nor default to float32 (71d9da8)

  • TrainingSAE should: 1) respect device override and 2) not default to float32 dtype, and instead default to the SAE's dtype (a4a1c46)

  • load_from_pretrained should not require a dtype nor default to float32 (a485dc0)

  • Fix SAE failing to upload to wandb due to artifact name. (#224)

  • Fix SAE artifact name.

  • format


Co-authored-by: Joseph Bloom <jbloomaus@gmail.com> (6ae4849)

v3.12.0 (2024-07-09)

Feature

  • feat: use TransformerLens 2 (#214)

  • Updated pyproject.toml to use TL ^2.0, and to use fork of sae-vis that also uses TL ^2.0

  • Removed reliance on sae-vis

  • Removed neuronpedia tutorial

  • Added error handling for view operation

  • Corrected formatting (526e736)

Unknown

  • Fix/allow device override (#221)

  • Forced load_from_pretrained to respect device and dtype params

  • Removed test file (697dd5f)

  • Fixed hooks for single head SAEs (#219)

  • included zero-ablation-hook for single-head SAEs

  • fixed a typo in single_head_replacement_hook (3bb4f73)

v3.11.2 (2024-07-08)

Fix

  • fix: rename encode_fn to encode and encode to encode_standard (#218) (8c09ec1)

v3.11.1 (2024-07-08)

Fix

  • fix: avoid bfloat16 errors in training gated saes (#217) (1e48f86)

Unknown

  • Update README.md (9adba61)

  • Update deploy_docs.yml

Modified this file to install dependencies (using caching for efficiency). (e90d5c1)

  • Adding type hint (5da6a13)

  • Actually doing merge (c362e81)

  • Merge remote-tracking branch 'upstream/main' into Evals (52780c0)

  • Making changes in response to comments (cf4ebcd)

v3.11.0 (2024-07-04)

Feature

  • feat: make pretrained sae directory docs page (#213)

  • make pretrained sae directory docs page

  • type issue weirdness

  • type issue weirdness (b8a99ab)

v3.10.0 (2024-07-04)

Feature

  • feat: make activations_store restart the dataset when it runs out (#207)

  • make activations_store restart the dataset when it runs out

  • remove misleading comments

  • allow StopIteration to bubble up where appropriate

  • add test to ensure that StopIteration is raised

  • formatting

  • more formatting

  • format tweak so we can re-try ci

  • add deps back (91f4850)

  • feat: allow models to be passed in as overrides (#210) (dd95996)
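The restart behaviour described in #207 can be illustrated with a small generator. This is a minimal sketch, not the ActivationsStore implementation: `cycle_dataset` and `max_epochs` are hypothetical names, and the dataset is assumed to be re-iterable (a list, not a one-shot generator).

```python
from typing import Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")


def cycle_dataset(dataset: Iterable[T], max_epochs: Optional[int] = None) -> Iterator[T]:
    """Yield items from `dataset`, starting over whenever it is exhausted.

    With max_epochs=None a non-empty dataset is repeated indefinitely.
    Otherwise the generator ends after that many full passes, and the
    consumer sees StopIteration as usual.
    """
    epoch = 0
    while max_epochs is None or epoch < max_epochs:
        yielded_any = False
        for item in dataset:
            yielded_any = True
            yield item
        if not yielded_any:
            # An empty dataset would loop forever, so end the iteration instead.
            return
        epoch += 1


rows = ["a", "b", "c"]
stream = cycle_dataset(rows)
first_seven = [next(stream) for _ in range(7)]  # wraps around twice
two_passes = list(cycle_dataset(rows, max_epochs=2))
```

Bounding the number of passes keeps a way for StopIteration to reach callers that rely on it, which is the distinction the sub-items above draw.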

Fix

  • fix: Activation store factor unscaling fold fix (#212)

  • add unscaling to evals

  • fix act norm unscaling missing

  • improved variance explained, still off for that prompt

  • format

  • why suddenly a typingerror and only in CI? (1db84b5)

v3.9.2 (2024-07-03)

Fix

  • fix: Gated SAE Note Loading (#211)

  • fix: add tests, make pass

  • not in (b083feb)

Unknown

  • SAETrainingRunner takes optional HFDataset (#206)

  • SAETrainingRunner takes optional HFDataset

  • more explicit errors when the buffer is too large for the dataset

  • format

  • add warnings when a new dataset is added

  • replace default dataset with empty string

  • remove valueerror (2c8fb6a)

v3.9.1 (2024-07-01)

Fix

  • fix: pin typing-extensions version (#205) (3f0e4fe)

v3.9.0 (2024-07-01)

Feature

  • feat: OpenAI TopK SAEs for residual stream of GPT2 Small (#201)

  • streamlit app

  • feat: basic top-k support + oai gpt2small saes

  • fix merge mistake (06c4302)

Unknown

  • prevent context size mismatch error (#200) (76389ac)

  • point gpt2 dataset path to apollo-research/monology-pile (#199) (d3eb427)

v3.8.0 (2024-06-29)

Feature

  • feat: harmonize activation store and pretokenize runner (#181)

  • feat: harmonize activation store and pretokenize runner

  • reverting SAE cfg back to prepend_bos

  • adding a benchmark test

  • adding another test

  • adding list of tokenized datasets to docs

  • adding a warning message about lack of pre-tokenization, and linking to SAELens docs

  • fixing tests after apollo deleted sae- dataset versions

  • Update training_saes.md (2e6a3c3)

v3.7.0 (2024-06-25)

Feature

  • feat: new saes for gemma-2b-it and feature splitting on gpt2-small-layer-8 (#195) (5cfe382)

v3.6.0 (2024-06-25)

Feature

  • feat: Support Gated-SAEs (#188)

  • Initial draft of encoder

  • Second draft of Gated SAE implementation

  • Added SFN loss implementation

  • Latest modification of SFN loss training setup

  • fix missing config use

  • dont have special sfn loss

  • add hooks and reshape

  • sae error term not working, WIP

  • make tests pass

  • add benchmark for gated


Co-authored-by: Joseph Bloom <jbloomaus@gmail.com> (232c39c)

Unknown

  • fix hook z loader (#194) (cb30996)

v3.5.0 (2024-06-20)

Unknown

  • Performance improvements + using multiple GPUs. (#189)

  • fix: no grads when filling cache

  • trainer should put activations on sae device

  • hack to allow sae device to be specific gpu when model is on multiple devices

  • add some tests (not in CI) which check multiple GPU performance

  • make formatter typer happy

  • make sure SAE calls move data between devices as needed (400474e)

v3.4.1 (2024-06-17)

Fix

  • fix: allow setting trust_remote_code for new huggingface version (#187)

  • fix: allow setting trust_remote_code for new huggingface version

  • default to True, not None


Co-authored-by: jbloomAus <jbloomaus@gmail.com> (33a612d)

v3.4.0 (2024-06-14)

Feature

  • feat: Adding Mistral SAEs (#178)

Note: normalize_activations is now a string and should be one of 'none', 'expected_average_only_in' (Anthropic April Update, not yet folded), or 'constant_norm_rescale' (Anthropic Feb update).

  • Adding code to load mistral saes

  • Black formatting

  • Removing library changes that allowed forward pass normalization

  • feat: support feb update style norm scaling for mistral saes

  • remove accidental inclusion


Co-authored-by: jbloomAus <jbloomaus@gmail.com> (227d208)
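The note in #178 lists three accepted string values for normalize_activations. A minimal sketch of a check for them follows; `check_normalize_activations` is a hypothetical helper written for illustration and is not part of the library.

```python
# The three values named in the release note for #178.
ALLOWED_NORMALIZE_ACTIVATIONS = (
    "none",
    "expected_average_only_in",
    "constant_norm_rescale",
)


def check_normalize_activations(value: object) -> str:
    """Return `value` unchanged if it is one of the documented strings."""
    if not isinstance(value, str):
        # Non-string values (for example a boolean flag) are rejected outright.
        raise TypeError(
            f"normalize_activations must be a string, got {type(value).__name__}"
        )
    if value not in ALLOWED_NORMALIZE_ACTIVATIONS:
        raise ValueError(
            f"normalize_activations must be one of {ALLOWED_NORMALIZE_ACTIVATIONS}, got {value!r}"
        )
    return value


mode = check_normalize_activations("constant_norm_rescale")
```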

Unknown

  • Update README.md Slack Link Expired (this one shouldn't expire) (209696a)

  • add expected perf for pretrained (#179)

Co-authored-by: jbloom-md <joseph@massdynamics.com> (10bd9c5)

  • fix progress bar updates (#171) (4d92975)

v3.3.0 (2024-06-10)

Feature

  • feat: updating docs and standardizing PretokenizeRunner export (#176) (03f071b)

v3.2.3 (2024-06-05)

Fix

  • fix: allow tutorial packages for colab install to use latest version (#173) (f73cb73)

Unknown

  • fix pip install in HookedSAETransformer Demo (#172) (5d0faed)

v3.2.2 (2024-06-02)

Fix

  • fix: removing truncation in activations store data loading (#62) (43c93e2)

v3.2.1 (2024-06-02)

Fix

  • fix: moving non-essential deps to dev (#121) (1a2cde0)

v3.2.0 (2024-05-30)

Feature

  • feat: activation norm scaling factor folding (#170)

  • feat: add convenience function for folding scaling factor

  • keep playing around with benchmark (773e308)

v3.1.1 (2024-05-29)

Fix

  • fix: share config defaulting between hf and local loading (#169) (7df479c)

v3.1.0 (2024-05-29)

Feature

  • feat: add w_dec_norm folding (#167)

  • feat: add w_dec_norm folding

  • format (f1908a3)
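Folding decoder norms into the encoder (#167) rests on ReLU being positively homogeneous: scaling a feature's encoder column and bias by the norm of its decoder row, while dividing that row by the same norm, leaves the reconstruction unchanged. The sketch below shows this on a toy ReLU autoencoder in plain Python. The function names, weight layout, and numbers are assumptions for illustration, not the library's code, and every decoder row is assumed to have a non-zero norm.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def relu_sae(x: Vector, w_enc: Matrix, b_enc: Vector, w_dec: Matrix, b_dec: Vector) -> Vector:
    """Toy ReLU autoencoder. w_enc is [d_in][d_sae]; w_dec is [d_sae][d_in]."""
    d_sae = len(w_dec)
    feats = [
        max(0.0, sum(x[i] * w_enc[i][j] for i in range(len(x))) + b_enc[j])
        for j in range(d_sae)
    ]
    return [
        sum(feats[j] * w_dec[j][k] for j in range(d_sae)) + b_dec[k]
        for k in range(len(b_dec))
    ]


def fold_w_dec_norm(w_enc: Matrix, b_enc: Vector, w_dec: Matrix) -> Tuple[Matrix, Vector, Matrix]:
    """Give every decoder row unit norm and move the norm into the encoder."""
    norms = [math.sqrt(sum(v * v for v in row)) for row in w_dec]  # assumed non-zero
    new_w_dec = [[v / n for v in row] for row, n in zip(w_dec, norms)]
    new_w_enc = [[row[j] * norms[j] for j in range(len(norms))] for row in w_enc]
    new_b_enc = [b * n for b, n in zip(b_enc, norms)]
    return new_w_enc, new_b_enc, new_w_dec


w_enc = [[1.0, -0.5, 2.0], [0.5, 1.5, -1.0]]
b_enc = [0.1, -0.2, 0.3]
w_dec = [[3.0, 4.0], [0.0, 2.0], [1.0, 1.0]]
b_dec = [0.5, -0.5]
x = [1.0, 2.0]

before = relu_sae(x, w_enc, b_enc, w_dec, b_dec)
f_w_enc, f_b_enc, f_w_dec = fold_w_dec_norm(w_enc, b_enc, w_dec)
after = relu_sae(x, f_w_enc, f_b_enc, f_w_dec, b_dec)
folded_row_norms = [math.sqrt(sum(v * v for v in row)) for row in f_w_dec]
```

The reconstruction is preserved, but the feature activations themselves are rescaled by the decoder norms, so any threshold applied to activations has to be rescaled too.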

Unknown

  • Fixed typo in Hooked_SAE_Transformer_Demo.ipynb preventing Open in Colab badge from working (#166)

Minor typo in file name was preventing Hooked_SAE_Transformer_Demo.ipynb "Open in Colab" badge from working. (4850b16)

  • Fix hook z training reshape bug (#165)

  • remove file duplicate

  • fix: hook-z evals working, and reshaping mode more explicit (0550ae3)

v3.0.0 (2024-05-28)

Breaking

  • feat: refactor SAE code

BREAKING CHANGE: renamed and re-implemented paths (3c67666)

Unknown

  • major: trigger release

BREAKING CHANGE: https://python-semantic-release.readthedocs.io/en/latest/commit-parsing.html#commit-parser-angular

BREAKING CHANGE: (fac8533)

  • major: trigger release

BREAKING CHANGE: trigger release (apparently we need a newline) (90ed2c2)

  • BREAKING CHANGE: Quality of Life Refactor of SAE Lens adding SAE Analysis with HookedSAETransformer and some other breaking changes. (#162)

  • move HookedSAETransformer from TL

  • add tests

  • move runners one level up

  • fix docs name

  • trainer clean up

  • create training sae, not fully separate yet

  • remove accidentally committed notebook

  • commit working code in the middle of refactor, more work to do

  • don't use act layers plural

  • make tutorial not use the activation store

  • moved this file

  • move import of toy model runner

  • saes need to store at least enough information to run them

  • further refactor and add tests

  • finish act store device rebase

  • fix config type not caught by test

  • partial progress, not yet handling error term for hooked sae transformer

  • bring tests in line with trainer doing more work

  • revert some of the simplification to preserve various features, ghost grads, noising

  • hooked sae transformer is working

  • homogenize configs

  • re-enable sae compilation

  • remove old file that doesn't belong

  • include normalize activations in base sae config

  • make sure tutorial works

  • don't forget to update pbar

  • rename sparse autoencoder to sae for brevity

  • move non-training specific modules out of training

  • rename to remove _point

  • first steps towards better docs

  • final cleanup

  • have ci use same test coverage total as make check-ci

  • clean up docs a bit


Co-authored-by: ckkissane <67170576+ckkissane@users.noreply.github.com> (e4eaccc)

  • Move activation store to cpu (#159)

  • add act store device to config

  • fix serialisation issue with device

  • fix accidental hardcoding of a device

  • test activations get moved correctly

  • fix issue with test cacher that shared state

  • add split store & model test + fix failure

  • clarify comment

  • formatting fixes (eb9489a)

  • Refactor training (#158)

  • turn training runner into a class

  • make a trainer class

  • further refactor

  • update runner call

  • update docs (72179c8)

  • Enable autocast for LM activation creation (#157)

  • add LM autocasting

  • add script to test autocast performance

  • format fix

  • update autocast demo script (cf94845)

  • gemma 2b sae resid post 12. fix ghost grad print (2a676b2)

  • don't hardcode hook (a10283d)

  • add mlp out SAEs to from pretrained (ee9291e)

  • remove resuming ability, keep resume config but complain if true (#156) (64e4dcd)

  • Add notebook to transfer W&B models to HF (#154)

hard to check this works quickly but assuming it does. (91239c1)

  • Remove sae parallel training, simplify code (#155)

  • remove sae parallel training, simplify code

  • remove unused import

  • remove accidental inclusion of file

(not tagging this as breaking since we're doing a new major release this week and I don't want to keep bumping the major version) (f445fdf)

  • Update pretrained_saes.yaml (37fb150)

  • Ansible: update incorrect EC2 quota request link (432c7e1)

  • Merge pull request #153 from jbloomAus/ansible_dev

Ansible: dev only mode (51d2175)

  • Ansible: dev only mode (027460f)

  • feature: add gemma-2b bootleg saes (#152) (b9b7e32)

v2.1.3 (2024-05-15)

Fix

  • fix: Fix normalisation (#150)

  • fix GPT2 sweep settings to use correct dataset

  • add gpt2 small block sweep to check norm

  • larger buffer + more evals

  • fix activation rescaling so normalisation works

  • formatting fixes (9ce0fe4)

Unknown

  • Fix checkpointing of training state that includes a compiled SAE (#143)

  • Adds state_dict to L1Scheduler

  • investigating test failure

  • fix: Fix issues with resumption testing (#144)

  • fix always-true comparison in train context testing

  • set default warmup steps to zero

  • remove unused type attribute from L1Scheduler

  • update training tests to use real context builder

  • add docstring for build_train_ctx

  • 2.1.2

Automatically generated by python-semantic-release


Co-authored-by: github-actions <github-actions@github.com> (2f8c4e1)

  • fix GPT2 sweep settings to use correct dataset (#147)

  • fix GPT2 sweep settings to use correct dataset

  • add gpt2 small block sweep to check norm

  • larger buffer + more evals


Co-authored-by: Joseph Bloom <69127271+jbloomAus@users.noreply.github.com> (448d911)

  • Pretokenize runner (#148)

  • feat: adding a pretokenize runner

  • rewriting pretokenization based on feedback (f864178)

  • Fix config files for Ansible (ec70cea)

  • Pin Ansible config example to a specific version, update docs (#142)

  • Pin Ansible config example to a specific version, update docs

  • Allow running cache acts or train sae separately. Update README

  • Update readme (41785ae)

v2.1.2 (2024-05-13)

Fix

  • fix: Fix issues with resumption testing (#144)

  • fix always-true comparison in train context testing

  • set default warmup steps to zero

  • remove unused type attribute from L1Scheduler

  • update training tests to use real context builder

  • add docstring for build_train_ctx (085d04f)

v2.1.1 (2024-05-13)

Fix

  • fix: hardcoded mps device in ckrk attn saes (#141) (eba3f4e)

Unknown

  • feature: run saelens on AWS with one command (#138)

  • Ansible playbook for automating caching activations and training saes

  • Add automation

  • Fix example config

  • Fix bugs with ansible mounting s3

  • Reorg, more automation, Ubuntu instead of Amazon Linux

  • More automation

  • Train SAE automation

  • Train SAEs and readme

  • fix gitignore

  • Fix automation config bugs, clean up paths

  • Fix shutdown time, logs (13de52a)

  • Gpt 2 sweep (#140)

  • sweep settings for gpt2-small

  • get model string right

  • fix some comments that don't apply now

  • formatting fix (4cb270b)

  • Remove cuda cache emptying in evals.py (#139) (bdef2cf)

v2.1.0 (2024-05-11)

Chore

  • chore: remove use_deterministic_algorithms=True since it causes cuda errors (#137) (1a3bedb)

Feature

  • feat: Hooked toy model (#134)

  • adds initial re-implementation of toy models

  • removes instance dimension from toy models

  • fixing up minor nits and adding more tests


Co-authored-by: David Chanin <chanindav@gmail.com> (03aa25c)

v2.0.0 (2024-05-10)

Breaking

  • feat: rename batch sizes to give informative units (#133)

BREAKING CHANGE: renamed batch sizing config params

  • renaming batch sizes to give units

  • changes in notebooks

  • missed one!


Co-authored-by: David Chanin <chanindav@gmail.com> (cc78e27)

Chore

  • chore: tools to make tests more deterministic (#132) (2071d09)

  • chore: Make tutorial notebooks work in Google Colab (#120)

Co-authored-by: David Chanin <chanindav@gmail.com> (007141e)

v1.8.0 (2024-05-09)

Chore

  • chore: closing " in docs (#130) (5154d29)

Feature

  • feat: Add model_from_pretrained_kwargs as config parameter (#122)

  • add model_from_pretrained_kwargs config parameter to allow full control over model used to extract activations from. Update tests to cover new cases

  • tweaking test style


Co-authored-by: David Chanin <chanindav@gmail.com> (094b1e8)

v1.7.0 (2024-05-08)

Feature

  • feat: Add torch compile (#129)

  • Surface # of eval batches and # of eval sequences

  • fix formatting

  • config changes

  • add compilation to lm_runner.py

  • remove accidental print statement

  • formatting fix (5c41336)

  • feat: Change eval batch size (#128)

  • Surface # of eval batches and # of eval sequences

  • fix formatting

  • fix print statement accidentally left in (758a50b)

v1.6.1 (2024-05-07)

Fix

  • fix: Revert "feat: Add kl eval (#124)" (#127)

This reverts commit c1d9cbe8627f27f4d5384ed4c9438c3ad350d412. (1a0619c)

v1.6.0 (2024-05-07)

Feature

  • feat: Add bf16 autocast (#126)

  • add bf16 autocast and gradient scaling

  • simplify autocast setup

  • remove completed TODO

  • add autocast dtype selection (generally keep bf16)

  • formatting fix

  • remove autocast dtype (8e28bfb)

v1.5.0 (2024-05-07)

Feature

  • feat: Add kl eval (#124)

  • add kl divergence to evals.py

  • fix linter (c1d9cbe)

Unknown

  • major: How we train saes replication (#123)

  • l1 scheduler, clip grad norm

  • add provisional ability to normalize activations

  • notebook

  • change heuristic norm init to constant, report b_e and W_dec norms (fix tests later)

  • fix mse calculation

  • add benchmark test

  • update heuristic init to 0.1

  • make tests pass device issue

  • continue rebase

  • use better args in benchmark

  • remove stack in get activations

  • broken! improve CA runner

  • get cache activation runner working and add some tests

  • add training steps to path

  • avoid ghost grad tensor casting

  • enable download of full dataset if desired

  • add benchmark for cache activation runner

  • add updated tutorial

  • format


Co-authored-by: Johnny Lin <hijohnnylin@gmail.com> (5f46329)

v1.4.0 (2024-05-05)

Feature

  • feat: Store state to allow resuming a run (#106)

  • first pass of saving

  • added runner resume code

  • added auto detect most recent checkpoint code

  • make linter happy (and one small bug)

  • black code formatting

  • isort

  • help pyright

  • black reformatting:

  • activations store flake

  • pyright typing

  • black code formatting

  • added test for saving and loading

  • bigger training set

  • black code

  • move to pickle

  • use pickle because safetensors doesn't support all the stuff needed for optimizer and scheduler state

  • added resume test

  • added wandb_id for resuming

  • use wandb id for checkpoint

  • moved loaded to device and minor fixes to resuming


Co-authored-by: David Chanin <chanindav@gmail.com> (4d12e7a)

Unknown

  • Fix: sparsity norm calculated at incorrect dimension. (#119)

  • Fix: sparsity norm calculated at incorrect dimension.

For L1 this does not affect anything, since it essentially takes abs() and averages everything. For L2 it is problematic, because L2 involves a sum and a sqrt. Unexpected behavior occurs when x has shape (batch, sen_length, hidden_dim).

  • Added tests.

  • Changed sparsity calculation to handle 3d inputs. (ce95fb2)
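The effect described in #119 can be reproduced without any tensor library. The sketch below assumes the intended reduction is over the hidden axis and that the faulty one ran over axis 1, which is the sequence axis for 3D input. Both function names are hypothetical.

```python
from typing import List

Tensor3D = List[List[List[float]]]  # indexed as [batch][position][hidden]


def norm_over_hidden(x: Tensor3D, p: float) -> List[List[float]]:
    """Lp norm of each activation vector: reduces the last (hidden) axis."""
    return [[sum(abs(v) ** p for v in vec) ** (1.0 / p) for vec in seq] for seq in x]


def norm_over_positions(x: Tensor3D, p: float) -> List[List[float]]:
    """Lp norm across axis 1, which is the sequence axis for 3D input."""
    return [
        [sum(abs(vec[h]) ** p for vec in seq) ** (1.0 / p) for h in range(len(seq[0]))]
        for seq in x
    ]


def total(values: List[List[float]]) -> float:
    return sum(sum(row) for row in values)


# One batch element, two positions, hidden size two.
acts = [[[3.0, 4.0], [0.0, 0.0]]]

l1_hidden = total(norm_over_hidden(acts, 1))        # |3| + |4| + 0 + 0
l1_positions = total(norm_over_positions(acts, 1))  # same four terms, regrouped
l2_hidden = total(norm_over_hidden(acts, 2))        # sqrt(9 + 16) + 0
l2_positions = total(norm_over_positions(acts, 2))  # sqrt(9) + sqrt(16)
```

The L1 totals agree because the absolute values are simply summed in a different order, while the L2 totals differ (5 versus 7 here) because the square root is applied to different groupings.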

v1.3.0 (2024-05-03)

Feature

  • feat: add activation bins for neuronpedia outputs, and allow customizing quantiles (#113) (05d650d)

  • feat: Update for Neuronpedia auto-interp (#112)

  • cleanup Neuronpedia autointerp code

  • Fix logic bug with OpenAI key


Co-authored-by: Joseph Bloom <69127271+jbloomAus@users.noreply.github.com> (033283d)

  • feat: SparseAutoencoder.from_pretrained() similar to transformer lens (#111)

  • add partial work so David can continue

  • feat: adding a SparseAutoencoder.from_pretrained() function


Co-authored-by: jbloomaus <jbloomaus@gmail.com> (617d416)

Fix

  • fix: replace list_files_info with list_repo_tree (#117) (676062c)

  • fix: Improved activation initialization, fix using argument to pass in API key (#116) (7047bcc)

v1.2.0 (2024-04-29)

Feature

  • feat: breaks up SAE.forward() into encode() and decode() (#107)

  • breaks up SAE.forward() into encode() and decode()

  • cleans up return typing of encode by splitting into a hidden and public function (7b4311b)
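Splitting forward() into encode() and decode() (#107) lets callers inspect or edit feature activations between the two steps. The toy class below is a sketch of that shape only: `ToySAE`, its constructor arguments, and its weight layout are assumptions for illustration and do not mirror the library's class.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


class ToySAE:
    """Minimal ReLU autoencoder with separate encode and decode steps."""

    def __init__(self, w_enc: Matrix, b_enc: Vector, w_dec: Matrix, b_dec: Vector) -> None:
        self.w_enc = w_enc  # [d_in][d_sae]
        self.b_enc = b_enc  # [d_sae]
        self.w_dec = w_dec  # [d_sae][d_in]
        self.b_dec = b_dec  # [d_in]

    def encode(self, x: Vector) -> Vector:
        """Map an input vector to non-negative feature activations."""
        return [
            max(0.0, sum(x[i] * self.w_enc[i][j] for i in range(len(x))) + self.b_enc[j])
            for j in range(len(self.b_enc))
        ]

    def decode(self, feats: Vector) -> Vector:
        """Map feature activations back to a reconstruction of the input."""
        return [
            sum(feats[j] * self.w_dec[j][k] for j in range(len(feats))) + self.b_dec[k]
            for k in range(len(self.b_dec))
        ]

    def forward(self, x: Vector) -> Vector:
        return self.decode(self.encode(x))


sae = ToySAE(
    w_enc=[[1.0, -1.0], [0.0, 2.0]],
    b_enc=[0.0, 1.0],
    w_dec=[[1.0, 0.0], [0.0, 0.5]],
    b_dec=[0.0, 0.0],
)
x = [2.0, 1.0]
feats = sae.encode(x)
reconstruction = sae.forward(x)

# Zero out feature 0 before decoding, which forward() alone would not allow.
ablated = sae.decode([0.0] + feats[1:])
```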

v1.1.0 (2024-04-29)

Feature

  • feat: API for generating autointerp + scoring for neuronpedia (#108)

  • API for generating autointerp for neuronpedia

  • Undo pytest vscode setting change

  • Fix autointerp import

  • Use pypi import for automated-interpretability (7c43c4c)

v1.0.0 (2024-04-27)

Breaking

  • chore: empty commit to bump release

BREAKING CHANGE: v1 release (2615a3e)

Chore

  • chore: fix outdated lr_scheduler_name in docs (#109)

  • chore: fix outdated lr_scheduler_name in docs

  • add tutorial hparams (7cba332)

Unknown

  • BREAKING CHANGE: 1.0.0 release

BREAKING CHANGE: 1.0.0 release (c23098f)

  • Neuronpedia: allow resuming upload (#102) (0184671)

v0.7.0 (2024-04-24)

Feature

  • feat: make a neuronpedia list with features via api call (#101) (23e680d)

Unknown

  • Merge pull request #100 from jbloomAus/np_improvements

Improvements to Neuronpedia Runner (5118f7f)

  • neuronpedia: save run settings to json file to avoid errors when resuming later. automatically skip batch files that already exist (4b5412b)

  • skip batch file if it already exists (7d0e396)

  • neuronpedia: include log sparsity threshold in skipped_indexes.json (5c967e7)

v0.6.0 (2024-04-21)

Chore

  • chore: enabling python 3.12 checks for CI (25526ea)

  • chore: setting up precommit to be consistent with CI (18e706d)

Feature

  • feat: Added tanh-relu activation fn and input noise options (#77)

  • Still need to pip-install from GitHub hufy implementation.

  • Added support for tanh_sae.

  • Added notebook for loading the tanh_sae

  • tweaking config options to be more declarative / composable

  • testing adding noise to SAE forward pass

  • updating notebook


Co-authored-by: David Chanin <chanindav@gmail.com> (551e94d)

Unknown

  • Update proposal.md (6d45b33)

  • Merge pull request #96 from jbloomAus/github-templates

add templates for PR's / issues (241a201)

  • add templates for PR's / issues (74ff597)

  • Merge pull request #95 from jbloomAus/load-state-dict-not-strict

Make load_state_dict use strict=False (4a9e274)

  • fix accidental bug (c22fbbd)

  • fix load pretrained legacy with state dict change (b5e97f8)

  • Make load_state_dict use strict=False (fdf7fe9)

  • Merge pull request #94 from jbloomAus/update-pre-commit

chore: setting up precommit to be consistent with CI (6a056b7)

  • Merge pull request #87 from evanhanders/old_to_new

Adds function that converts old .pt pretrained SAEs to new folder format (1cb1725)

  • Merge pull request #93 from jbloomAus/py-312-ci

chore: enabling python 3.12 checks for CI (87be422)

v0.5.1 (2024-04-19)

Chore

  • chore: re-enabling isort in CI (#86) (9c44731)

Fix

  • fix: pin pyzmq==26.0.1 temporarily (0094021)

  • fix: typing issue, temporary (25cebf1)

Unknown

  • v0.5.1 (0ac218b)

  • fixes string vs path typing errors (94f1fc1)

  • removes unused import (06406b0)

  • updates formatting for alignment with repo standards. (5e1f342)

  • consolidates with SAE class load_legacy function & adds test (0f85ded)

  • adds old->new file conversion function (fda2b57)

  • Merge pull request #91 from jbloomAus/decoder-fine-tuning

Decoder fine tuning (1fc652c)

  • par update (2bb5975)

  • Merge pull request #89 from jbloomAus/fix_np

Enhance + Fix Neuronpedia generation / upload (38d507c)

  • minor changes (bc766e4)

  • reformat run.ipynb (822882c)

  • get decoder fine tuning working (11a71e1)

  • format (040676d)

  • Merge pull request #88 from jbloomAus/get_feature_from_neuronpedia

FEAT: Add API for getting Neuronpedia feature (1666a68)

  • Fix resuming from batch (145a407)

  • Use original repo for sae_vis (1a7d636)

  • Use correct model name for np runner (138d5d4)

  • Merge main, remove eindex (6578436)

  • Add API for getting Neuronpedia feature (e78207d)

v0.5.0 (2024-04-17)

Feature

  • feat: Mamba support vs mamba-lens (#79)

  • mamba support

  • added init

  • added optional model kwargs

  • Support transformers and mamba

  • forgot one model kwargs

  • failed opts

  • tokens input

  • hack to fix tokens, will look into fixing mambalens

  • fixed checkpoint

  • added sae group

  • removed some comments and fixed merge error

  • removed unneeded params since that issue is fixed in mambalens now

  • Unneeded input param

  • removed debug checkpointing and eval

  • added refs to hookedrootmodule

  • feed linter

  • added example and fixed loading

  • made layer for eval change

  • fix linter issues

  • adding mamba-lens as optional dep, and fixing typing/linting

  • adding a test for loading mamba model

  • adding mamba-lens to dev for CI

  • updating min mamba-lens version

  • updating mamba-lens version


Co-authored-by: David Chanin <chanindav@gmail.com> (eea7db4)

Unknown

  • update readme (440df7b)

  • update readme (3694fd2)

  • Fix upload skipped/dead features (932f380)

  • Use python typer instead of shell script for neuronpedia jobs (b611e72)

  • Merge branch 'main' into fix_np (cc6cb6a)

  • convert sparsity to log sparsity if needed (8d7d404)

v0.4.0 (2024-04-16)

Feature

  • feat: support orthogonal decoder init and no pre-decoder bias (ac606a3)

Fix

  • fix: sae dict bug (484163e)

  • fix: session loader wasn't working (a928d7e)

Unknown

  • enable setting adam pars in config (1e53ede)

  • fix sae dict loader and format (c558849)

  • default orthogonal init false (a8b0113)

  • Formatting (1e3d53e)

  • Eindex required by sae_vis (f769e7a)

  • Upload dead feature stubs (9067380)

  • Make feature sparsity an argument (8230570)

  • Fix buffer (dde2481)

  • Merge branch 'main' into fix_np (6658392)

  • notebook update (feca408)

  • Merge branch 'main' into fix_np (f8fb3ef)

  • Final fixes (e87788d)

  • Don't use buffer, fix anomalies (2c9ca64)

v0.3.0 (2024-04-15)

Feature

  • feat: add basic tutorial for training saes (1847280)

v0.2.2 (2024-04-15)

Fix

  • fix: dense batch dim mse norm optional (8018bc9)

Unknown

  • format (c359c27)

  • make dense_batch_mse_normalization optional (c41774e)

  • Runner is fixed, faster, cleaned up, and now gives whole sequences instead of buffer. (3837884)

  • Merge branch 'main' into fix_np (3ed30cf)

  • add warning in run script (9a772ca)

  • update sae loading code (356a8ef)

  • add device override to session loader (96b1e12)

  • update readme (5cd5652)

v0.2.1 (2024-04-13)

Fix

  • fix: neuronpedia quicklist (6769466)

v0.2.0 (2024-04-13)

Chore

  • chore: improving CI speed (9e3863c)

  • chore: updating README.md with pip install instructions and PyPI badge (682db80)

Feature

  • feat: overhaul saving and loading (004e8f6)

Unknown

  • Use legacy loader, add back histograms, logits. Fix anomaly characters. (ebbb622)

  • Merge branch 'main' into fix_np (586e088)

  • Merge pull request #80 from wllgrnt/will-update-tutorial

bugfix - minimum viable updates to tutorial notebook (e51016b)

  • minimum viable fixes to evaluation notebook (b907567)

  • Merge pull request #76 from jbloomAus/faster-ci

perf: improving CI speed (8b00000)

  • try partial cache restore (392f982)

  • Merge branch 'main' into faster-ci (89e1568)

  • Merge pull request #78 from jbloomAus/fix-artifact-saving-loading

Fix artifact saving loading (8784c74)

  • remove duplicate code (6ed6af5)

  • set device in load from pretrained (b4e12cd)

  • fix typing issue which required ignore (a5df8b0)

  • remove print statement (295e0e4)

  • remove load with session option (74926e1)

  • fix broken test (16935ef)

  • avoid tqdm repeating during training (1d70af8)

  • avoid division by 0 (2c7c6d8)

  • remove old notebook (e1ad1aa)

  • use-sae-dict-not-group (27f8003)

  • formatting (827abd0)

  • improve artifact loading storage, tutorial forthcoming (604f102)

  • add safetensors to project (0da48b0)

  • Don't precompute background colors and tick values (271dbf0)

  • Merge pull request #71 from weissercn/main

Addressing notebook issues (8417505)

  • Merge pull request #70 from jbloomAus/update-readme-install

chore: updating README.md with pip install instructions and PyPI badge (4d7d1e7)

  • FIX: Add back correlated neurons, frac_nonzero (d532b82)

  • linting (1db0b5a)

  • fixed graph name (ace4813)

  • changed key for df_enrichment_scores, so it can be run (f0a9d0b)

  • fixed space in notebook 2 (2278419)

  • fixed space in notebook 2 (24a6696)

  • fixed space in notebook (d2f8c8e)

  • fixed pickle backwards compatibility in tutorial (3a97a04)

v0.1.0 (2024-04-06)

Fix

  • fix: removing paths-ignore from action to avoid blocking releases (28ff797)

  • fix: updating saevis version to use pypi (dbd96a2)

Unknown

  • Merge pull request #69 from chanind/remove-ci-ignore

fix: removing paths-ignore from action to avoid blocking releases (179cea1)

  • Update README.md (1720ce8)

  • Merge pull request #68 from chanind/updating-sae-vis

fix: hotfix updating saevis version to use pypi (a13cee3)

v0.0.0 (2024-04-06)

Chore

  • chore: adding more tests to ActivationsStore + light refactoring (cc9899c)

  • chore: running isort to fix imports (53853b9)

  • chore: setting up pyright type checking and fixing typing errors (351995c)

  • chore: enable full flake8 default rules list (19886e2)

  • chore: using poetry for dependency management (465e003)

  • chore: removing .DS_Store files (32f09b6)

Unknown

  • Merge pull request #66 from chanind/pypi

feat: setting up sae_lens package and auto-deploy with semantic-release (34633e8)

  • Merge branch 'main' into pypi (3ce7f99)

  • Merge pull request #60 from chanind/improve-config-typing

fixing config typing (b8fba4f)

  • setting up sae_lens package and auto-deploy with semantic-release (ba41f32)

  • fixing config typing

switch to using explicit params for ActivationsStore config instead of RunnerConfig base class (9be3445)

  • Merge pull request #65 from chanind/fix-forgotten-scheduler-opts

passing accidentally overlooked scheduler opts (773bc02)

  • passing accidentally overlooked scheduler opts (ad089b7)

  • Merge pull request #64 from chanind/lr-decay

adding lr_decay_steps and refactoring get_scheduler (c960d99)

  • adding lr_decay_steps and refactoring get_scheduler (fd5448c)

  • Merge pull request #53 from hijohnnylin/neuronpedia_runner

Generate and upload Neuronpedia artifacts (0b94f84)

  • format (792c7cb)

  • ignore type incorrectness in imported package (5fe83a9)

  • Merge pull request #63 from chanind/remove-eindex

removing unused eindex dependency (1ce44d7)

  • removing unused eindex dependency (7cf991b)

  • Safe to_str_tokens, fix memory issues (901b888)

  • Allow starting neuronpedia generation at a specific batch number (85d8f57)

  • FIX: Linting 'do not use except' (ce3d40c)

  • Fix vocab: Ċ should be line break. Also set left and right buffers (205b1c1)

  • Merge (b159010)

  • Update Neuronpedia Runner (885de27)

  • Merge pull request #58 from canrager/main

Make prepend BOS optional: Default True (48a07f9)

  • make tests pass with use_bos flag (618d4bb)

  • Merge pull request #59 from chanind/fix-docs-deploy

attempting to fix docs deploy (cfafbe7)

  • Adding tests to get_scheduler (13c8085)

  • Merge pull request #56 from chanind/sae-tests

minor refactoring to SAE and adding tests (2c425ca)

  • minor refactoring to SAE and adding tests (92a98dd)

  • adding tests to get_scheduler (3b7e173)

  • Generate and upload Neuronpedia artifacts (b52e0e2)

  • Merge pull request #54 from jbloomAus/hook_z_suppourt

notional support, needs more thorough testing (277f35b)

  • Merge pull request #55 from chanind/contributing-docs

adding a contribution guide to docs (8ac8f05)

  • adding a contribution guide to docs (693c5b3)

  • notional support, needs more thorough testing (9585022)

  • Generate and upload Neuronpedia artifacts (4540268)

  • Merge pull request #52 from hijohnnylin/fix_db_runner_assert

FIX: Don't check wandb assert if not using wandb (5c48811)

  • FIX: Don't check wandb assert if not using wandb (1adefda)

  • add docs badge (f623ed1)

  • try to get correct deployment (777dd6c)

  • Merge pull request #51 from jbloomAus/mkdocs

Add Docs to the project. (d2ebbd7)

  • mkdocs, test (9f14250)

  • code cov (2ae6224)

  • Merge pull request #48 from chanind/fix-sae-vis-version

Pin sae_vis to previous working version (3f8a30b)

  • fix suffix issue (209ba13)

  • pin sae_vis to previous working version (ae0002a)

  • don't ignore changes to .github (35fdeec)

  • add cov report (971d497)

  • Merge pull request #40 from chanind/refactor-train-sae

Refactor train SAE and adding unit tests (5aa0b11)

  • Merge branch 'main' into refactor-train-sae (0acdcb3)

  • Merge pull request #41 from jbloomAus/move_to_sae_vis

Move to sae vis (bcb9a52)

  • flake8 can ignore imports, we're using isort anyway (6b7ae72)

  • format (af680e2)

  • fix mps bug (e7b238f)

  • more tests (01978e6)

  • wip (4c03b3d)

  • more tests (7c1cb6b)

  • testing that sparsity counts get updated correctly (5b5d653)

  • adding some unit tests to _train_step() (dbf3f01)

  • Merge branch 'main' into refactor-train-sae (2d5ec98)

  • Update README.md (d148b6a)

  • Merge pull request #20 from chanind/activations_store_tests

chore: adding more tests to ActivationsStore + light refactoring (69dcf8e)

  • Merge branch 'main' into activations_store_tests (4896d0a)

  • refactoring train_sae_on_language_model.py into smaller functions (e75a15d)

  • support apollo pretokenized datasets (e814054)

  • handle saes saved before groups (5acd89b)

  • typechecker (fa6cc49)

  • fix geom median bug (8d4a080)

  • remove references to old code (861151f)

  • remove old geom median code (05e0aca)

  • Merge pull request #22 from themachinefan/faster_geometric_median

Faster geometric median. (341c49a)

  • makefile check type and types of geometric median (736bf83)

  • Merge pull request #21 from schmatz/fix-dashboard-image

Fix broken dashboard image on README (eb90cc9)

  • Merge pull request #24 from neelnanda-io/add-post-link

Added link to AF post (39f8d3d)

  • Added link to AF post (f0da9ea)

  • formatting (0168612)

  • use device, don't use cuda if not there (20334cb)

  • format (ce49658)

  • fix tsea typing (449d90f)

  • faster geometric median. Run geometric_median.py to test. (92cad26)

  • Fix dashboard image (6358862)

  • fix incorrect code used to avoid typing issue (ed0b0ea)

  • add nltk (bc7e276)

  • ignore various typing issues (6972c00)

  • add babe package (481069e)

  • make formatter happy (612c7c7)

  • share scatter so can link (9f88dc3)

  • add_analysis_files_for_post (e75323c)

  • don't block on isort linting (3949a46)

  • formatting (951a320)

  • Update README.md (b2478c1)

  • Merge pull request #18 from chanind/type-checking

chore: setting up pyright type checking and fixing typing errors (bd5fc43)

  • Merge branch 'main' into type-checking (57c4582)

  • Merge pull request #17 from Benw8888/sae_group_pr

SAE Group for sweeps PR (3e78bce)

  • Merge pull request #1 from chanind/sae_group_pr_isort_fix

chore: running isort to fix imports (dd24413)

  • black format (0ffcf21)

  • fixed expansion factor sweep (749b8cf)

  • remove tqdm from data loader, too noisy (de3b1a1)

  • fix tests (b3054b1)

  • don't calculate geom median unless you need to (d31bc31)

  • add to method (b3f6dc6)

  • flake8 and black (ed8345a)

  • flake8 linter changes (8e41e59)

  • Merge branch 'main' into sae_group_pr (082c813)

  • Delete evaluating.ipynb (d3cafa3)

  • Delete activation_storing.py (fa82992)

  • Delete lp_sae_training.py (0d1e1c9)

  • implemented SAE groups (66facfe)

  • Merge pull request #16 from chanind/flake-default-rules

chore: enable full flake8 default rules list (ad84706)

  • implemented sweeping via config list (80f61fa)

  • Merge pull request #13 from chanind/poetry

chore: using poetry for dependency management (496f7b4)

  • progress on implementing multi-sae support (2ba2131)

  • Merge pull request #11 from lucyfarnik/fix-caching-shuffle-edge-case

Fixed edge case in activation cache shuffling (3727b5d)

  • Merge pull request #12 from lucyfarnik/add-run-name-to-config

Added run name to config (c2e05c4)

  • Added run name to config (ab2aabd)

  • Fixed edge case in activation cache shuffling (18fd4a1)

  • Merge pull request #9 from chanind/rm-ds-store

chore: removing .DS_Store files (37771ce)

  • improve readme (f3fe937)

  • fix_evals_bad_rebase (22e415d)

  • evals changes, incomplete (736c40e)

  • make tutorial independent of artefact and delete old artefact (6754e65)

  • fix MSE in ghost grad (44f7988)

  • Merge pull request #5 from jbloomAus/clean_up_repo

Add CI/CD, black formatting, pre-commit with flake8 linting. Fix some bugs. (01ccb92)

  • clean up run examples (9d46bdd)

  • move where we save the final artifact (f445fac)

  • fix activations store inefficiency (07d38a0)

  • black format and linting (479765b)

  • dummy file change (912a748)

  • try adding this branch listed specifically (7fd0e0c)

  • yml not yaml (9f3f1c8)

  • add ci (91aca91)

  • get unit tests working (ade2976)

  • make unit tests pass, add make file (08b2c92)

  • add pytest-cov to requirements.txt (ce526df)

  • separate research from main repo (32b668c)

  • remove comma and set default store batch size lower (9761b9a)

  • notebook for Johny (39a18f2)

  • best practices ghost grads fix (f554b16)

  • Update README.md

improved the hyperpars (2d4caf6)

  • dashboard runner (a511223)

  • readme update (c303c55)

  • still hadn't fixed the issue, now fixed (a36ee21)

  • fix mean of loss which broke in last commit (b4546db)

  • generate dashboards (35fa631)

  • Merge pull request #3 from jbloomAus/ghost_grads_dev

Ghost grads dev (4d150c2)

  • save final log sparsity (98e4f1b)

  • start saving log sparsity (4d6df6f)

  • get ghost grads working (e863ed7)

  • add notebook/changes for ghost-grad (not working yet) (73053c1)

  • idk, probs good (0407ad9)

  • bunch of shit (1ec8f97)

  • Merge branch 'main' of github.com:jbloomAus/mats_sae_training (a22d856)

  • Reverse engineering the "not only... but" feature (74d4fb8)

  • Merge pull request #2 from slavachalnev/no_reinit

Allow sampling method to be None (4c5fed8)

  • Allow sampling method to be None (166799d)

  • research/week_15th_jan/gpt2_small_resid_pre_3.ipynb (52a1da7)

  • add arg for dead neuron calc (ffb75fb)

  • notebooks for lucy (0319d89)

  • add args for b_dec_init (82da877)

  • add geom median as submodule instead (4c0d001)

  • add geom median to req (4c8ac9d)

  • add-geometric-mean-b_dec-init (d5853f8)

  • reset feature sparsity calculation (4c7f6f2)

  • anthropic sampling (048d267)

  • get anthropic resampling working (ca74543)

  • add ability to finetune existing autoencoder (c1208eb)

  • run notebook (879ad27)

  • switch to batch size independent loss metrics (0623d39)

  • track mean sparsity (75f1547)

  • don't stop early (44078a6)

  • name runs better (5041748)

  • improve-eval-metrics-for-attn (00d9b65)

  • add hook q (b061ee3)

  • add copy suppression notebook (1dc893a)

  • fix check in neuron sampling (809becd)

  • Merge pull request #1 from jbloomAus/activations_on_disk

Activations on disk (e5f198e)

  • merge into main (94ed3e6)

  • notebook (b5344a3)

  • various research notebooks (be63fce)

  • Added activations caching to run.ipynb (054cf6d)

  • Added activations dir to gitignore (c4a31ae)

  • Saving and loading activations from disk (309e2de)

  • Fixed typo that threw out half of activations (5f73918)

  • minor speed improvement (f7ea316)

  • add notebook with different example runs (c0eac0a)

  • add ability to train on attn heads (18cfaad)

  • add gzip for pt artefacts (9614a23)

  • add_example_feature_dashboard (e90e54d)

  • get_shit_done (ce73042)

  • commit_various_things_in_progress (3843c39)

  • add sae visualizer and tutorial (6f4030c)

  • make it possible to load sae trained on cuda onto mps (3298b75)

  • reduce hist freq, don't cap re-init (debcf0f)

  • add loader import to readme (b63f14e)

  • Update README.md (88f086b)

  • improve-resampling (a3072c2)

  • add readme (e9b8e56)

  • fixl0_plus_other_stuff (2f162f0)

  • add checkpoints (4cacbfc)

  • improve_model_saving_loading (f6697c6)

  • stuff (19d278a)

  • Added support for non-tokenized datasets (afcc239)

  • notebook_for_keith (d06e09b)

  • fix resampling bug (2b43980)

  • test pars (f601362)

  • further-lm-improvements (63048eb)

  • get_lm_working_well (eba5f79)

  • basic-lm-training-currently-broken (7396b8b)

  • set_up_lm_runner (d1095af)

  • fix old test, may remove (b407aab)

  • happy with hyperpars on benchmark (836298a)

  • improve metrics (f52c7bb)

  • make toy model runner (4851dd1)

  • various-changes-toy-model-test (a61b75f)

  • Added activation store and activation gathering (a85f24d)

  • First successful run on toy models (4927145)

  • halfway-to-toy-models (feeb411)

  • Initial commit (7a94b0e)