fix(docs): explain converting Textual Inversions, using layer tokens, and prompt range syntax (#179)
This commit is contained in:
parent 30b08c6d6d
commit 78005812f3

@@ -14,6 +14,9 @@ weights, to the directories used by `diffusers` and on to the ONNX models used b
- [Figuring out which script produced the LoRA weights](#figuring-out-which-script-produced-the-lora-weights)
- [LoRA weights from cloneofsimo/lora](#lora-weights-from-cloneofsimolora)
- [LoRA weights from kohya-ss/sd-scripts](#lora-weights-from-kohya-sssd-scripts)
- [Converting Textual Inversion embeddings](#converting-textual-inversion-embeddings)
- [Figuring out what token a Textual Inversion uses](#figuring-out-what-token-a-textual-inversion-uses)
- [Figuring out how many layers a Textual Inversion uses](#figuring-out-how-many-layers-a-textual-inversion-uses)

## Conversion steps for each type of model

@@ -25,11 +28,14 @@ You can start from a diffusers directory, HuggingFace Hub repository, or an SD c
3. diffusers directory or LoRA weights from `cloneofsimo/lora` to...
4. ONNX models

Textual Inversions can be converted directly to ONNX by merging them with the base model.

One current disadvantage of using ONNX is that LoRA weights must be merged with the base model before being converted,
so the final output is roughly the size of the base model. Hopefully this can be reduced in the future
(https://github.com/ssube/onnx-web/issues/213).

If you are using the Auto1111 web UI or another tool, you may not need to convert the models to ONNX. In that case,
you will not have an `extras.json` file and should skip the last step.

## Converting diffusers models

@@ -233,3 +239,51 @@ Make sure to set the `format` key and that it matches the format you used to exp
Based on docs in:

- https://github.com/kohya-ss/sd-scripts/blob/main/train_network_README-ja.md#%E3%83%9E%E3%83%BC%E3%82%B8%E3%82%B9%E3%82%AF%E3%83%AA%E3%83%97%E3%83%88%E3%81%AB%E3%81%A4%E3%81%84%E3%81%A6

## Converting Textual Inversion embeddings

You can convert Textual Inversion embeddings by merging their weights and tokens into a copy of their base model,
which is directly supported by the conversion script in `onnx-web` with no additional steps.

Textual Inversions may have more than one set of weights, which can be used and controlled separately. Some Textual
Inversions provide their own token, but you can set a custom token for any of them.

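The merge can be sketched with toy tensors: the text encoder's token-embedding table grows by one row per inversion layer, and the tokenizer gains one numbered token per row. This is a hand-written illustration of the mechanism, not the actual `onnx-web` conversion code; the sizes, the `goblin` token name, and the variable names are all assumptions for the example.

```python
import torch

# Toy stand-ins for the real objects: the text encoder's token-embedding
# table and a Textual Inversion with 3 layers (all sizes are assumptions).
vocab_size, dim = 49408, 768
embedding_table = torch.zeros(vocab_size, dim)
ti_weights = torch.randn(3, dim)

# One numbered token per layer, following the `base-N` pattern.
layer_tokens = [f"goblin-{i}" for i in range(ti_weights.shape[0])]

# Merging appends one row per layer to the embedding table and assigns each
# new token the next free id, keeping tokenizer and text encoder in sync.
merged_table = torch.cat([embedding_table, ti_weights], dim=0)
token_ids = {tok: vocab_size + i for i, tok in enumerate(layer_tokens)}
```

The same idea scales to any number of layers, which is why multi-layer inversions add many tokens at once.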
### Figuring out what token a Textual Inversion uses

The base token, without any layer numbers, should be printed to the logs with the string `found embedding for token`:

```none
[2023-03-08 04:54:00,234] INFO: MainProcess MainThread onnx_web.convert.diffusion.textual_inversion: found embedding for token <concept>: torch.Size([768])
[2023-03-08 04:54:01,624] INFO: MainProcess MainThread onnx_web.convert.diffusion.textual_inversion: added 1 tokens
[2023-03-08 04:54:01,814] INFO: MainProcess MainThread onnx_web.convert.diffusion.textual_inversion: saving tokenizer for textual inversion
[2023-03-08 04:54:01,859] INFO: MainProcess MainThread onnx_web.convert.diffusion.textual_inversion: saving text encoder for textual inversion
```

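If you would rather not scan the logs by hand, the base token can be pulled out of a saved log file with a small regular expression. This is a sketch that assumes the single-token `found embedding for token <name>:` message format, not a tool shipped with onnx-web.

```python
import re

def find_embedding_tokens(log_text: str) -> list[str]:
    """Extract base tokens from `found embedding for token <name>: ...` lines.

    Matches the single-token form; multi-layer token lists are logged
    differently and are not handled here.
    """
    return re.findall(r"found embedding for token (\S+):", log_text)

line = ("[2023-03-08 04:54:00,234] INFO: MainProcess MainThread "
        "onnx_web.convert.diffusion.textual_inversion: "
        "found embedding for token <concept>: torch.Size([768])")
```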
If you have set a custom token, that will be shown instead. If more than one token has been added, they will be
numbered following the pattern `base-N`, starting with 0.

### Figuring out how many layers a Textual Inversion uses

Textual Inversions produced by [the Stable Conceptualizer notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/stable_conceptualizer_inference.ipynb)
only have a single layer, while many others have more than one.

The number of layers is shown in the server logs when the model is converted:

```none
[2023-03-08 04:54:00,234] INFO: MainProcess MainThread onnx_web.convert.diffusion.textual_inversion: found embedding for token <concept>: torch.Size([768])
[2023-03-08 04:54:01,624] INFO: MainProcess MainThread onnx_web.convert.diffusion.textual_inversion: added 1 tokens
[2023-03-08 04:54:01,814] INFO: MainProcess MainThread onnx_web.convert.diffusion.textual_inversion: saving tokenizer for textual inversion
[2023-03-08 04:54:01,859] INFO: MainProcess MainThread onnx_web.convert.diffusion.textual_inversion: saving text encoder for textual inversion
...
[2023-03-08 04:58:06,378] INFO: MainProcess MainThread onnx_web.convert.diffusion.textual_inversion: generating 74 layer tokens
[2023-03-08 04:58:06,379] INFO: MainProcess MainThread onnx_web.convert.diffusion.textual_inversion: found embedding for token ['goblin-0', 'goblin-1', 'goblin-2', 'goblin-3', 'goblin-4', 'goblin-5', 'goblin-6', 'goblin-7', 'goblin-8', 'goblin-9', 'goblin-10', 'goblin-11', 'goblin-12', 'goblin-13', 'goblin-14', 'goblin-15', 'goblin-16', 'goblin-17', 'goblin-18', 'goblin-19', 'goblin-20', 'goblin-21', 'goblin-22', 'goblin-23', 'goblin-24', 'goblin-25', 'goblin-26', 'goblin-27', 'goblin-28', 'goblin-29', 'goblin-30', 'goblin-31', 'goblin-32', 'goblin-33', 'goblin-34', 'goblin-35', 'goblin-36', 'goblin-37', 'goblin-38', 'goblin-39', 'goblin-40', 'goblin-41', 'goblin-42', 'goblin-43', 'goblin-44', 'goblin-45', 'goblin-46', 'goblin-47', 'goblin-48', 'goblin-49', 'goblin-50', 'goblin-51', 'goblin-52', 'goblin-53', 'goblin-54', 'goblin-55', 'goblin-56', 'goblin-57', 'goblin-58', 'goblin-59', 'goblin-60', 'goblin-61', 'goblin-62', 'goblin-63', 'goblin-64', 'goblin-65', 'goblin-66', 'goblin-67', 'goblin-68', 'goblin-69', 'goblin-70', 'goblin-71', 'goblin-72', 'goblin-73'] (*): torch.Size([74, 768])
[2023-03-08 04:58:07,685] INFO: MainProcess MainThread onnx_web.convert.diffusion.textual_inversion: added 74 tokens
[2023-03-08 04:58:07,874] DEBUG: MainProcess MainThread onnx_web.convert.diffusion.textual_inversion: embedding torch.Size([768]) vector for layer goblin-0
[2023-03-08 04:58:07,874] DEBUG: MainProcess MainThread onnx_web.convert.diffusion.textual_inversion: embedding torch.Size([768]) vector for layer goblin-1
[2023-03-08 04:58:07,874] DEBUG: MainProcess MainThread onnx_web.convert.diffusion.textual_inversion: embedding torch.Size([768]) vector for layer goblin-2
[2023-03-08 04:58:07,875] DEBUG: MainProcess MainThread onnx_web.convert.diffusion.textual_inversion: embedding torch.Size([768]) vector for layer goblin-3
```

Figuring out the number of layers after the model has been converted currently requires the original tensor file
(https://github.com/ssube/onnx-web/issues/212).

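Given the original tensor file, a short script can report the layer count without running a conversion. This is a sketch, not part of onnx-web: it assumes the common A1111-style `.pt` layout with weights under a `string_to_param` key, and real files vary between training scripts.

```python
import torch

def count_layers(path: str) -> int:
    """Count the embedding layers in a Textual Inversion tensor file.

    Assumes an A1111-style checkpoint with weights under 'string_to_param';
    other trainers use different key names, so adjust the lookup as needed.
    """
    data = torch.load(path, map_location="cpu")
    params = data.get("string_to_param", data)
    tensor = next(iter(params.values()))
    # a [layers, 768] tensor has one row per layer; a bare [768] vector is one layer
    return tensor.shape[0] if tensor.dim() > 1 else 1
```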

@@ -31,6 +31,7 @@ Please see [the server admin guide](server-admin.md) for details on how to confi
- [Model sources](#model-sources)
- [Downloading models from Civitai](#downloading-models-from-civitai)
- [Using a custom VAE](#using-a-custom-vae)
- [Using and controlling Textual Inversions](#using-and-controlling-textual-inversions)
- [Tabs](#tabs)
- [Txt2img tab](#txt2img-tab)
- [Scheduler parameter](#scheduler-parameter)

@@ -300,6 +301,20 @@ Some common VAE models include:

- https://huggingface.co/stabilityai/sd-vae-ft-mse
- https://huggingface.co/stabilityai/sd-vae-ft-mse-original

### Using and controlling Textual Inversions

You can use a Textual Inversion along with a diffusion model by including one or more of the inversion's tokens in
your prompt. Some Textual Inversions only have a single layer, while others have 75 or more.

You can provide more than one of the numbered layer tokens using the `base-{X,Y}` range syntax in your prompt. This
follows Python range rules, so `X` is inclusive and `Y` is not. The range `autumn-{0,5}` will be expanded into the
tokens `autumn-0 autumn-1 autumn-2 autumn-3 autumn-4`. You can use the layer tokens individually, out of order, and
repeat some layers or omit them entirely. You can provide a step as the third parameter, which will skip layers:
`even-layers-{0,100,2}` will be expanded into
`even-layers-0 even-layers-2 even-layers-4 even-layers-6 ... even-layers-98`.

The range syntax does not currently work when the Long Prompt Weighting pipeline is enabled.

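The expansion rules can be sketched in a few lines of Python. This is an illustration of the documented syntax, not onnx-web's actual prompt parser:

```python
import re

def expand_ranges(prompt: str) -> str:
    """Expand `base-{X,Y}` and `base-{X,Y,Z}` into numbered layer tokens,
    following Python range rules: X inclusive, Y exclusive, optional step Z."""
    def replace(match: re.Match) -> str:
        base = match.group(1)
        args = [int(n) for n in match.group(2).split(",")]
        # range(*args) handles both the two- and three-argument forms
        return " ".join(f"{base}-{i}" for i in range(*args))

    return re.sub(r"([\w-]+)-\{(\d+(?:,\d+){1,2})\}", replace, prompt)
```

For example, `expand_ranges("autumn-{0,5}")` produces the five tokens listed above, and a step of 2 keeps only every other layer.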
## Tabs

### Txt2img tab