Textual Inversion

AUTOMATIC1111 edited this page Oct 2, 2022 · 27 revisions

What is Textual Inversion?

Textual Inversion lets you train a tiny part of the neural network (a few new word embeddings, while the rest of the model stays frozen) on your own pictures, and then use the results when generating new ones.

The result of training is a .pt or a .bin file (the former is the format used by the original author, the latter is used by the diffusers library).
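To make the difference between the two file layouts concrete, here is a minimal sketch of pulling the learned vectors out of either one. It assumes the file has already been deserialized (in practice with `torch.load`), and it stands tensors in with nested Python lists so it runs without dependencies. The key names (`string_to_param` with a `*` placeholder for the original format, a single `{token: vector}` entry for diffusers) are how I understand the two training scripts to save their output; `extract_vectors` is a hypothetical helper, not part of the web UI.

```python
from typing import Any


def extract_vectors(data: dict[str, Any]) -> tuple[str, list[list[float]]]:
    """Return (key, vectors) from an already-deserialized embedding file.

    Assumed layouts (tensors shown as nested lists):
      original textual inversion (.pt): {"string_to_param": {"*": [[...], ...]}, ...}
      diffusers (.bin): {"<placeholder-token>": [...]}  (a single entry)
    """
    if "string_to_param" in data:
        # Original format: take the first (usually the only) learned parameter.
        key, vectors = next(iter(data["string_to_param"].items()))
    elif len(data) == 1:
        # Diffusers format: one placeholder token mapped to its vector(s).
        key, vectors = next(iter(data.items()))
    else:
        raise ValueError("unrecognized embedding layout")

    # A flat vector means a single-vector embedding; wrap it so the result is always 2-D.
    if vectors and not isinstance(vectors[0], list):
        vectors = [vectors]
    return key, vectors
```

In the original format the number of rows corresponds to the vectors-per-token setting, so an embedding trained with 8 vectors per token would come back as 8 rows.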

See the original site for more details about what textual inversion is: https://textual-inversion.github.io/.

Using pre-trained embeddings

Put the embedding into the embeddings directory and use its filename (without the extension) in the prompt. You don't have to restart the program for this to work.
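Here is a minimal sketch of that lookup, assuming (as the example prompts on this page suggest) that an embedding's name is its filename without the extension. The helpers `find_embeddings` and `embeddings_in_prompt` are hypothetical, and matching is a plain case-sensitive substring search; the web UI itself works on tokenized prompts, so treat this as an illustration only.

```python
from pathlib import Path

# Assumption: only these two extensions count as embedding files.
EMBEDDING_EXTENSIONS = {".pt", ".bin"}


def find_embeddings(directory: Path) -> dict[str, Path]:
    """Map each embedding's name (its file stem) to its path."""
    return {
        path.stem: path
        for path in sorted(directory.iterdir())
        if path.is_file() and path.suffix in EMBEDDING_EXTENSIONS
    }


def embeddings_in_prompt(prompt: str, embeddings: dict[str, Path]) -> list[str]:
    """Return the names of known embeddings that appear in the prompt (simple substring match)."""
    return [name for name in embeddings if name in prompt]
```

Because `find_embeddings` re-reads the directory every time it is called, a newly added file is picked up on the next call without restarting anything. With `usada pekora.pt` and `mignon.pt` in the directory, the prompt `portrait of usada pekora, mignon` would match both names.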

As an example, here is an embedding of Usada Pekora I trained on the WD1.2 model, on 53 pictures (119 augmented) for 19500 steps, with the 8 vectors per token setting.

Pictures it generates:

[image: grid-0037]

portrait of usada pekora
Steps: 20, Sampler: Euler a, CFG scale: 7, Seed: 4077357776, Size: 512x512, Model hash: 45dee52b

You can combine multiple embeddings in one prompt:

[image: grid-0038]

portrait of usada pekora, mignon
Steps: 20, Sampler: Euler a, CFG scale: 7, Seed: 4077357776, Size: 512x512, Model hash: 45dee52b
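To show what combining embeddings means in terms of prompt positions, here is a toy sketch in which each embedding name is replaced by one slot per learned vector. It assumes every ordinary word is a single token (real tokenizers split words into sub-word pieces), and the vector count for `mignon` is hypothetical since it isn't given above; `expand_prompt` is an illustrative helper, not web UI code.

```python
def expand_prompt(prompt: str, vectors_per_embedding: dict[str, int]) -> list[str]:
    """Toy model: replace each embedding name with one slot per learned vector.

    Assumption: every ordinary word occupies exactly one slot.
    """
    words = prompt.replace(",", " ").split()
    # Try multi-word names (like "usada pekora") before shorter ones.
    names = sorted(vectors_per_embedding, key=lambda n: -len(n.split()))
    slots: list[str] = []
    i = 0
    while i < len(words):
        for name in names:
            parts = name.split()
            if words[i:i + len(parts)] == parts:
                slots.extend(f"<{name}:{k}>" for k in range(vectors_per_embedding[name]))
                i += len(parts)
                break
        else:
            # Not an embedding name: keep the word as a single slot.
            slots.append(words[i])
            i += 1
    return slots


# "mignon" having 4 vectors is a made-up value for illustration.
print(len(expand_prompt("portrait of usada pekora, mignon", {"usada pekora": 8, "mignon": 4})))
```

In this toy model the prompt above takes 14 slots: 2 for the plain words, 8 for the 8-vector Usada Pekora embedding, and 4 for the second one. A higher vectors-per-token setting therefore uses up more of the limited prompt length.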

Be very careful about which model you use with your embeddings: they work well with the model you used during training, and not so well on different models. For example, here is the above embedding used with the vanilla Stable Diffusion 1.4 model:

[image: grid-0036]

portrait of usada pekora
Steps: 20, Sampler: Euler a, CFG scale: 7, Seed: 4077357776, Size: 512x512, Model hash: 7460a6fa
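The "Model hash" field is how you can tell the grids apart: 45dee52b for the model the embedding was trained on, 7460a6fa for vanilla 1.4. Below is a minimal sketch of a check that warns when the current checkpoint differs from the one you trained on. It assumes the legacy short-hash scheme used by web UI versions of this period (SHA-256 over 64 KiB read from offset 1 MiB of the checkpoint, truncated to 8 hex digits); treat that as an assumption, and note that `short_model_hash` and `check_model` are hypothetical helpers, with the training-time hash coming from your own notes.

```python
import hashlib
from pathlib import Path


def short_model_hash(checkpoint: Path) -> str:
    """Assumed legacy scheme: SHA-256 of 64 KiB starting at offset 1 MiB, first 8 hex digits."""
    with checkpoint.open("rb") as f:
        f.seek(0x100000)
        return hashlib.sha256(f.read(0x10000)).hexdigest()[:8]


def check_model(checkpoint: Path, trained_on_hash: str) -> bool:
    """Return True if the checkpoint matches the hash recorded when the embedding was trained."""
    current = short_model_hash(checkpoint)
    if current != trained_on_hash:
        print(f"warning: embedding was trained on {trained_on_hash}, current model is {current}")
        return False
    return True
```

For the embedding above you would call something like `check_model(path_to_checkpoint, "45dee52b")`, which would print a warning when pointed at the vanilla 1.4 checkpoint.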

Training embeddings

Textual inversion tab

Experimental support for training embeddings in the user interface.

create a new empty embedding, select a directory with images, and train the embedding on it
the feature is very raw; use at your own risk
I was able to reproduce results I got with other repos when training anime artists as styles, after a few tens of thousands of steps
works with half-precision floats, but needs experimentation to see whether results are just as good
if you have enough memory, it is safer to run with --no-half --precision full
no preprocessing is done on images (except for resizing to 512x512), not even flipping
you can interrupt and resume training without any loss of data (except for the AdamW optimizer state, but none of the existing repos seem to save it anyway, so the general opinion is that it is not important)
no support for batch sizes or gradient accumulation
running this with the --lowvram and --medvram flags is not expected to work
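The core idea behind the workflow above (start from an empty embedding, optimize only it against your images, keep the model frozen, and be able to resume) can be shown with a toy sketch. Everything here is a stand-in chosen for illustration: the frozen "model" is a fixed 2x2 linear map, the loss is squared error to a fixed target, and optimization is plain gradient descent, whereas the real thing uses AdamW on the diffusion model's denoising loss. `train`, `FROZEN_W`, and `TARGET` are hypothetical names.

```python
import json
from pathlib import Path

FROZEN_W = [[2.0, 0.0], [1.0, 1.0]]  # stand-in for the frozen network weights (never updated)
TARGET = [4.0, 3.0]                   # stand-in for "what your training pictures look like"


def forward(e: list[float]) -> list[float]:
    """Frozen model: a fixed linear map applied to the embedding."""
    return [sum(w * x for w, x in zip(row, e)) for row in FROZEN_W]


def loss(e: list[float]) -> float:
    return sum((y - t) ** 2 for y, t in zip(forward(e), TARGET))


def grad(e: list[float]) -> list[float]:
    # d/de_j of sum_i (W_i . e - t_i)^2 is sum_i 2 * (W_i . e - t_i) * W_ij
    residual = [y - t for y, t in zip(forward(e), TARGET)]
    return [sum(2 * r * row[j] for r, row in zip(residual, FROZEN_W)) for j in range(len(e))]


def train(state_file: Path, total_steps: int, lr: float = 0.05) -> list[float]:
    """Train, or resume training, only the embedding; save embedding and step after every step."""
    if state_file.exists():
        state = json.loads(state_file.read_text())
    else:
        state = {"step": 0, "embedding": [0.0, 0.0]}  # a new, empty embedding
    e = state["embedding"]
    for step in range(state["step"], total_steps):
        e = [x - lr * g for x, g in zip(e, grad(e))]
        # Only the embedding and step count are saved. Plain gradient descent has no
        # optimizer state, so resuming here is exact; with AdamW its moments would be lost.
        state_file.write_text(json.dumps({"step": step + 1, "embedding": e}))
    return e
```

Stopping after 100 steps and calling `train` again with `total_steps=200` gives exactly the same embedding as an uninterrupted 200-step run, and the result approaches `[2.0, 1.0]`, the embedding for which the frozen map reproduces the target.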

Third party repos

I successfully trained embeddings using these repositories:

Other options are to train on Colab notebooks and/or with the diffusers library, which I know nothing about.

Finding embeddings online

huggingface concepts library - a lot of different embeddings, but mostly useless.
16777216c - NSFW, anime artist styles by a mysterious stranger.
cattoroboto - some anime embeddings by anon.
viper1 - NSFW, furry girls.
anon's embeddings - NSFW, anime artists.
rentry - a page with links to embeddings from many sources.

This is the Stable Diffusion web UI wiki.