chore(docs): update conversion guide for prompt tokens and networks
This commit is contained in:
parent
fe498b16f0
commit
e19e36ae22
|
@ -1,7 +1,10 @@
|
|||
# Converting Models
|
||||
|
||||
This guide describes the process for converting models from various formats, including Dreambooth checkpoints and LoRA
|
||||
weights, to the directories used by `diffusers` and on to the ONNX models used by `onnx-web`.
|
||||
This guide describes the process for converting models and additional networks to the directories used by `diffusers`
|
||||
and on to the ONNX models used by `onnx-web`.
|
||||
|
||||
Using the `extras.json` file, you can convert SD and diffusers models to ONNX, and blend them with LoRA weights and
|
||||
Textual Inversion embeddings.
|
||||
|
||||
## Contents
|
||||
|
||||
|
@ -15,8 +18,7 @@ weights, to the directories used by `diffusers` and on to the ONNX models used b
|
|||
- [LoRA weights from cloneofsimo/lora](#lora-weights-from-cloneofsimolora)
|
||||
- [LoRA weights from kohya-ss/sd-scripts](#lora-weights-from-kohya-sssd-scripts)
|
||||
- [Converting Textual Inversion embeddings](#converting-textual-inversion-embeddings)
|
||||
- [Figuring out what token a Textual Inversion uses](#figuring-out-what-token-a-textual-inversion-uses)
|
||||
- [Figuring out how many layers a Textual Inversion uses](#figuring-out-how-many-layers-a-textual-inversion-uses)
|
||||
- [Figuring out how many layers are in a Textual Inversion](#figuring-out-how-many-layers-are-in-a-textual-inversion)
|
||||
|
||||
## Conversion steps for each type of model
|
||||
|
||||
|
@ -28,11 +30,9 @@ You can start from a diffusers directory, HuggingFace Hub repository, or an SD c
|
|||
3. diffusers directory or LoRA weights from `cloneofsimo/lora` to...
|
||||
4. ONNX models
|
||||
|
||||
Textual inversions can be converted directly to ONNX by merging them with the base model.
|
||||
|
||||
One current disadvantage of using ONNX is that LoRA weights must be merged with the base model before being converted,
|
||||
so the final output is roughly the size of the base model. Hopefully this can be reduced in the future
|
||||
(https://github.com/ssube/onnx-web/issues/213).
|
||||
LoRAs and Textual inversions can be temporarily blended with an ONNX model while the server is running using prompt
|
||||
tokens or permanently blended during model conversion using the `extras.json` file. LoRA and Textual Inversion models
|
||||
do not need to be converted to ONNX to be used with prompt tokens.
|
||||
|
||||
If you are using the Auto1111 web UI or another tool, you may not need to convert the models to ONNX. In that case,
|
||||
you will not have an `extras.json` file and should skip the last step.
|
||||
|
@ -110,8 +110,11 @@ Based on docs and code in:
|
|||
|
||||
## Converting LoRA weights
|
||||
|
||||
You can merge one or more sets of LoRA weights into their base models, and then use your `extras.json` file to
|
||||
convert them into usable ONNX models.
|
||||
You can merge one or more sets of LoRA weights into their base models using your `extras.json` file, which is directly
|
||||
supported by the conversion script in `onnx-web` with no additional steps.
|
||||
|
||||
This is not required to use LoRA weights in the prompt, but it can save memory and enable better caching for
|
||||
commonly-used model combinations.
|
||||
|
||||
LoRA weights produced by the `cloneofsimo/lora` repository can be converted to a diffusers directory and from there
|
||||
on to ONNX, while LoRA weights produced by the `kohya-ss/sd-scripts` repository must be converted to an SD checkpoint,
|
||||
|
@ -245,24 +248,13 @@ Based on docs in:
|
|||
You can convert Textual Inversion embeddings by merging their weights and tokens into a copy of their base model,
|
||||
which is directly supported by the conversion script in `onnx-web` with no additional steps.
|
||||
|
||||
Textual Inversions may have more than one set of weights, which can be used and controlled separately. Some Textual
|
||||
Inversions provide their own token, but you can set a custom token for any of them.
|
||||
This is not required to use LoRA weights in the prompt, but it can save memory and enable better caching for
|
||||
commonly-used model combinations.
|
||||
|
||||
### Figuring out what token a Textual Inversion uses
|
||||
Some Textual Inversions may have more than one set of weights, which can be used and controlled separately. Some
|
||||
Textual Inversions may provide their own token, but you can always use the filename to activate them in `onnx-web`.
|
||||
|
||||
The base token, without any layer numbers, should be printed to the logs with the string `found embedding for token`:
|
||||
|
||||
```none
|
||||
[2023-03-08 04:54:00,234] INFO: MainProcess MainThread onnx_web.convert.diffusion.textual_inversion: found embedding for token <concept>: torch.Size([768])
|
||||
[2023-03-08 04:54:01,624] INFO: MainProcess MainThread onnx_web.convert.diffusion.textual_inversion: added 1 tokens
|
||||
[2023-03-08 04:54:01,814] INFO: MainProcess MainThread onnx_web.convert.diffusion.textual_inversion: saving tokenizer for Textual Inversion
|
||||
[2023-03-08 04:54:01,859] INFO: MainProcess MainThread onnx_web.convert.diffusion.textual_inversion: saving text encoder for Textual Inversion
|
||||
```
|
||||
|
||||
If you have set a custom token, that will be shown instead. If more than one token has been added, they will be
|
||||
numbered following the pattern `base-N`, starting with 0.
|
||||
|
||||
### Figuring out how many layers a Textual Inversion uses
|
||||
### Figuring out how many layers are in a Textual Inversion
|
||||
|
||||
Textual Inversions produced by [the Stable Conceptualizer notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/stable_conceptualizer_inference.ipynb)
|
||||
only have a single layer, while many others have more than one.
|
||||
|
@ -285,5 +277,5 @@ lin-7', 'goblin-8', 'goblin-9', 'goblin-10', 'goblin-11', 'goblin-12', 'goblin-1
|
|||
[2023-03-08 04:58:07,875] DEBUG: MainProcess MainThread onnx_web.convert.diffusion.textual_inversion: embedding torch.Size([768]) vector for layer goblin-3
|
||||
```
|
||||
|
||||
Figuring out the number of layers after the model has been converted currently requires the original tensor file
|
||||
(https://github.com/ssube/onnx-web/issues/212).
|
||||
You do not need to know how many layers a Textual Inversion has to use the base token, `goblin` or `goblin-all` in this
|
||||
example, but it does allow you to control the layers individually.
|
||||
|
|
Loading…
Reference in New Issue