
fix(docs): explain converting Textual Inversions, using layer tokens, and prompt range syntax (#179)

Sean Sube 2023-03-07 23:14:29 -06:00
parent 30b08c6d6d
commit 78005812f3
Signed by: ssube
GPG Key ID: 3EED7B957D362AF1
2 changed files with 72 additions and 3 deletions


@@ -14,6 +14,9 @@ weights, to the directories used by `diffusers` and on to the ONNX models used b
- [Figuring out which script produced the LoRA weights](#figuring-out-which-script-produced-the-lora-weights)
- [LoRA weights from cloneofsimo/lora](#lora-weights-from-cloneofsimolora)
- [LoRA weights from kohya-ss/sd-scripts](#lora-weights-from-kohya-sssd-scripts)
- [Converting Textual Inversion embeddings](#converting-textual-inversion-embeddings)
- [Figuring out what token a Textual Inversion uses](#figuring-out-what-token-a-textual-inversion-uses)
- [Figuring out how many layers a Textual Inversion uses](#figuring-out-how-many-layers-a-textual-inversion-uses)
## Conversion steps for each type of model
@@ -25,11 +28,14 @@ You can start from a diffusers directory, HuggingFace Hub repository, or an SD c
3. diffusers directory or LoRA weights from `cloneofsimo/lora` to...
4. ONNX models
Textual Inversions can be converted directly to ONNX by merging them with the base model.

One current disadvantage of using ONNX is that LoRA weights must be merged with the base model before being converted,
so the final output is roughly the size of the base model. Hopefully this can be reduced in the future
(https://github.com/ssube/onnx-web/issues/213).
If you are using the Auto1111 web UI or another tool, you may not need to convert the models to ONNX. In that case,
you will not have an `extras.json` file and should skip the last step.
## Converting diffusers models
@@ -233,3 +239,51 @@ Make sure to set the `format` key and that it matches the format you used to exp
Based on docs in:

- https://github.com/kohya-ss/sd-scripts/blob/main/train_network_README-ja.md#%E3%83%9E%E3%83%BC%E3%82%B8%E3%82%B9%E3%82%AF%E3%83%AA%E3%83%97%E3%83%88%E3%81%AB%E3%81%A4%E3%81%84%E3%81%A6
## Converting Textual Inversion embeddings
You can convert Textual Inversion embeddings by merging their weights and tokens into a copy of their base model.
The conversion script in `onnx-web` supports this directly, with no additional steps needed.
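
To make the merge concrete, here is a minimal sketch of the general technique using the `transformers` API; the base model name, embedding filename, and single-layer format are assumptions, and the actual conversion script handles more formats and details:

```python
# Sketch: merge a single-layer Textual Inversion into the base model's
# tokenizer and text encoder. Assumes a diffusers-style SD 1.x base and
# an embedding saved as {token: tensor([768])} (Stable Conceptualizer).
import torch
from transformers import CLIPTextModel, CLIPTokenizer

base = "runwayml/stable-diffusion-v1-5"  # assumed base model
tokenizer = CLIPTokenizer.from_pretrained(base, subfolder="tokenizer")
encoder = CLIPTextModel.from_pretrained(base, subfolder="text_encoder")

learned = torch.load("learned_embeds.bin", map_location="cpu")
token, weights = next(iter(learned.items()))

tokenizer.add_tokens(token)
encoder.resize_token_embeddings(len(tokenizer))
token_id = tokenizer.convert_tokens_to_ids(token)
encoder.get_input_embeddings().weight.data[token_id] = weights

tokenizer.save_pretrained("converted/tokenizer")
encoder.save_pretrained("converted/text_encoder")
```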
Textual Inversions may have more than one set of weights, which can be used and controlled separately. Some Textual
Inversions provide their own token, but you can set a custom token for any of them.
### Figuring out what token a Textual Inversion uses
The base token, without any layer numbers, should be printed to the logs with the string `found embedding for token`:
```none
[2023-03-08 04:54:00,234] INFO: MainProcess MainThread onnx_web.convert.diffusion.textual_inversion: found embedding for token <concept>: torch.Size([768])
[2023-03-08 04:54:01,624] INFO: MainProcess MainThread onnx_web.convert.diffusion.textual_inversion: added 1 tokens
[2023-03-08 04:54:01,814] INFO: MainProcess MainThread onnx_web.convert.diffusion.textual_inversion: saving tokenizer for textual inversion
[2023-03-08 04:54:01,859] INFO: MainProcess MainThread onnx_web.convert.diffusion.textual_inversion: saving text encoder for textual inversion
```
If you have set a custom token, that will be shown instead. If more than one token has been added, they will be
numbered following the pattern `base-N`, starting with 0.
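
For example, a base token named `goblin` with 74 layers yields `goblin-0` through `goblin-73`. A short sketch of the numbering (the names here match the log example in the next section):

```python
# Sketch: how numbered layer tokens are derived from the base token.
base = "goblin"
layer_tokens = [f"{base}-{i}" for i in range(74)]
# ['goblin-0', 'goblin-1', ..., 'goblin-73']
```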
### Figuring out how many layers a Textual Inversion uses
Textual Inversions produced by [the Stable Conceptualizer notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/stable_conceptualizer_inference.ipynb)
only have a single layer, while many others have more than one.
The number of layers is shown in the server logs when the model is converted:
```none
[2023-03-08 04:54:00,234] INFO: MainProcess MainThread onnx_web.convert.diffusion.textual_inversion: found embedding for token <concept>: torch.Size([768])
[2023-03-08 04:54:01,624] INFO: MainProcess MainThread onnx_web.convert.diffusion.textual_inversion: added 1 tokens
[2023-03-08 04:54:01,814] INFO: MainProcess MainThread onnx_web.convert.diffusion.textual_inversion: saving tokenizer for textual inversion
[2023-03-08 04:54:01,859] INFO: MainProcess MainThread onnx_web.convert.diffusion.textual_inversion: saving text encoder for textual inversion
...
[2023-03-08 04:58:06,378] INFO: MainProcess MainThread onnx_web.convert.diffusion.textual_inversion: generating 74 layer tokens
[2023-03-08 04:58:06,379] INFO: MainProcess MainThread onnx_web.convert.diffusion.textual_inversion: found embedding for token ['goblin-0', 'goblin-1', 'goblin-2', 'goblin-3', 'goblin-4', 'goblin-5', 'goblin-6', 'goblin-7', 'goblin-8', 'goblin-9', 'goblin-10', 'goblin-11', 'goblin-12', 'goblin-13', 'goblin-14', 'goblin-15', 'goblin-16', 'goblin-17', 'goblin-18', 'goblin-19', 'goblin-20', 'goblin-21', 'goblin-22', 'goblin-23', 'goblin-24', 'goblin-25', 'goblin-26', 'goblin-27', 'goblin-28', 'goblin-29', 'goblin-30', 'goblin-31', 'goblin-32', 'goblin-33', 'goblin-34', 'goblin-35', 'goblin-36', 'goblin-37', 'goblin-38', 'goblin-39', 'goblin-40', 'goblin-41', 'goblin-42', 'goblin-43', 'goblin-44', 'goblin-45', 'goblin-46', 'goblin-47', 'goblin-48', 'goblin-49', 'goblin-50', 'goblin-51', 'goblin-52', 'goblin-53', 'goblin-54', 'goblin-55', 'goblin-56', 'goblin-57', 'goblin-58', 'goblin-59', 'goblin-60', 'goblin-61', 'goblin-62', 'goblin-63', 'goblin-64', 'goblin-65', 'goblin-66', 'goblin-67', 'goblin-68', 'goblin-69', 'goblin-70', 'goblin-71', 'goblin-72', 'goblin-73'] (*): torch.Size([74, 768])
[2023-03-08 04:58:07,685] INFO: MainProcess MainThread onnx_web.convert.diffusion.textual_inversion: added 74 tokens
[2023-03-08 04:58:07,874] DEBUG: MainProcess MainThread onnx_web.convert.diffusion.textual_inversion: embedding torch.Size([768]) vector for layer goblin-0
[2023-03-08 04:58:07,874] DEBUG: MainProcess MainThread onnx_web.convert.diffusion.textual_inversion: embedding torch.Size([768]) vector for layer goblin-1
[2023-03-08 04:58:07,874] DEBUG: MainProcess MainThread onnx_web.convert.diffusion.textual_inversion: embedding torch.Size([768]) vector for layer goblin-2
[2023-03-08 04:58:07,875] DEBUG: MainProcess MainThread onnx_web.convert.diffusion.textual_inversion: embedding torch.Size([768]) vector for layer goblin-3
```
Figuring out the number of layers after the model has been converted currently requires the original tensor file
(https://github.com/ssube/onnx-web/issues/212).
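
If you still have that file, a short sketch like this can count the layers; the `string_to_param` key (the Auto1111 embedding format) and the plain `{token: tensor}` layout (the Stable Conceptualizer format) are assumptions about how the file was saved:

```python
# Sketch: count the layers in an original Textual Inversion tensor file.
# Assumes an Auto1111-style file (with a `string_to_param` dict) or a
# Stable Conceptualizer-style {token: tensor} dict.
import torch

data = torch.load("goblin.pt", map_location="cpu")
params = data.get("string_to_param", data)
for token, weights in params.items():
    # a [768] tensor is a single layer; an [N, 768] tensor has N layers
    layers = weights.shape[0] if weights.ndim > 1 else 1
    print(f"{token}: {layers} layer(s), shape {tuple(weights.shape)}")
```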


@@ -31,6 +31,7 @@ Please see [the server admin guide](server-admin.md) for details on how to confi
- [Model sources](#model-sources)
- [Downloading models from Civitai](#downloading-models-from-civitai)
- [Using a custom VAE](#using-a-custom-vae)
- [Using and controlling Textual Inversions](#using-and-controlling-textual-inversions)
- [Tabs](#tabs)
- [Txt2img tab](#txt2img-tab)
- [Scheduler parameter](#scheduler-parameter)
@@ -300,6 +301,20 @@ Some common VAE models include:
- https://huggingface.co/stabilityai/sd-vae-ft-mse
- https://huggingface.co/stabilityai/sd-vae-ft-mse-original
### Using and controlling Textual Inversions
You can use a Textual Inversion along with a diffusion model by including one or more of its tokens in your prompt.
Some Textual Inversions have only a single layer, while others have 75 or more.
You can provide more than one of the numbered layer tokens using the `base-{X,Y}` range syntax in your prompt. This
follows Python range rules, so `X` is inclusive and `Y` is not: the range `autumn-{0,5}` expands into the tokens
`autumn-0 autumn-1 autumn-2 autumn-3 autumn-4`. You can use the layer tokens individually, out of order, and
repeat some layers or omit them entirely. You can also provide a step as the third parameter, which will skip layers:
`even-layers-{0,100,2}` expands into
`even-layers-0 even-layers-2 even-layers-4 even-layers-6 ... even-layers-98`.
The range syntax does not currently work when the Long Prompt Weighting pipeline is enabled.
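
To make the expansion rules concrete, here is a small sketch that mimics them; the regex and function name are illustrative, not the actual `onnx-web` parser:

```python
# Sketch: expand base-{X,Y} and base-{X,Y,S} tokens using Python range
# rules (X inclusive, Y exclusive, optional step S). Illustrative only.
import re

RANGE = re.compile(r"([\w-]+)-\{(\d+),(\d+)(?:,(\d+))?\}")

def expand_ranges(prompt: str) -> str:
    def expand(match: re.Match) -> str:
        base, start, end, step = match.groups()
        layers = range(int(start), int(end), int(step or 1))
        return " ".join(f"{base}-{i}" for i in layers)
    return RANGE.sub(expand, prompt)

print(expand_ranges("a watercolor landscape, autumn-{0,5}"))
# a watercolor landscape, autumn-0 autumn-1 autumn-2 autumn-3 autumn-4
```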
## Tabs
### Txt2img tab