fix(docs): explain converting Textual Inversions, using layer tokens, and prompt range syntax (#179)
This commit is contained in:
parent 30b08c6d6d
commit 78005812f3

@@ -14,6 +14,9 @@ weights, to the directories used by `diffusers` and on to the ONNX models used b
- [Figuring out which script produced the LoRA weights](#figuring-out-which-script-produced-the-lora-weights)
- [LoRA weights from cloneofsimo/lora](#lora-weights-from-cloneofsimolora)
- [LoRA weights from kohya-ss/sd-scripts](#lora-weights-from-kohya-sssd-scripts)
- [Converting Textual Inversion embeddings](#converting-textual-inversion-embeddings)
- [Figuring out what token a Textual Inversion uses](#figuring-out-what-token-a-textual-inversion-uses)
- [Figuring out how many layers a Textual Inversion uses](#figuring-out-how-many-layers-a-textual-inversion-uses)

## Conversion steps for each type of model

@@ -25,11 +28,14 @@ You can start from a diffusers directory, HuggingFace Hub repository, or an SD c
3. diffusers directory or LoRA weights from `cloneofsimo/lora` to...
4. ONNX models

Textual Inversions can be converted directly to ONNX by merging them with the base model.

One current disadvantage of using ONNX is that LoRA weights must be merged with the base model before being converted,
so the final output is roughly the size of the base model. Hopefully this can be reduced in the future
(https://github.com/ssube/onnx-web/issues/213).

If you are using the Auto1111 web UI or another tool, you may not need to convert the models to ONNX. In that case,
you will not have an `extras.json` file and should skip the last step.

## Converting diffusers models

@@ -233,3 +239,51 @@ Make sure to set the `format` key and that it matches the format you used to exp
Based on docs in:

- https://github.com/kohya-ss/sd-scripts/blob/main/train_network_README-ja.md#%E3%83%9E%E3%83%BC%E3%82%B8%E3%82%B9%E3%82%AF%E3%83%AA%E3%83%97%E3%83%88%E3%81%AB%E3%81%A4%E3%81%84%E3%81%A6

## Converting Textual Inversion embeddings

You can convert Textual Inversion embeddings by merging their weights and tokens into a copy of their base model,
which is directly supported by the conversion script in `onnx-web` with no additional steps.

Textual Inversions may have more than one set of weights, which can be used and controlled separately. Some Textual
Inversions provide their own token, but you can set a custom token for any of them.

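The merge can be sketched with toy tensors: the text encoder's token-embedding table grows by one row per inversion layer, and the tokenizer gains one numbered token per row. This is a hand-written illustration of the mechanism, not the actual `onnx-web` conversion code; the sizes, the `goblin` token name, and the variable names are all assumptions for the example.

```python
import torch

# Toy stand-ins for the real objects: the text encoder's token-embedding
# table and a Textual Inversion with 3 layers (all sizes are assumptions).
vocab_size, dim = 49408, 768
embedding_table = torch.zeros(vocab_size, dim)
ti_weights = torch.randn(3, dim)

# One numbered token per layer, following the `base-N` pattern.
layer_tokens = [f"goblin-{i}" for i in range(ti_weights.shape[0])]

# Merging appends one row per layer to the embedding table and assigns each
# new token the next free id, keeping tokenizer and text encoder in sync.
merged_table = torch.cat([embedding_table, ti_weights], dim=0)
token_ids = {tok: vocab_size + i for i, tok in enumerate(layer_tokens)}
```

The same idea scales to any number of layers, which is why multi-layer inversions add many tokens at once.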
### Figuring out what token a Textual Inversion uses

The base token, without any layer numbers, should be printed to the logs with the string `found embedding for token`:

```none
[2023-03-08 04:54:00,234] INFO: MainProcess MainThread onnx_web.convert.diffusion.textual_inversion: found embedding for token <concept>: torch.Size([768])
[2023-03-08 04:54:01,624] INFO: MainProcess MainThread onnx_web.convert.diffusion.textual_inversion: added 1 tokens
[2023-03-08 04:54:01,814] INFO: MainProcess MainThread onnx_web.convert.diffusion.textual_inversion: saving tokenizer for textual inversion
[2023-03-08 04:54:01,859] INFO: MainProcess MainThread onnx_web.convert.diffusion.textual_inversion: saving text encoder for textual inversion
```

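If you would rather not scan the logs by hand, the base token can be pulled out of a saved log file with a small regular expression. This is a sketch that assumes the single-token `found embedding for token <name>:` message format, not a tool shipped with onnx-web.

```python
import re

def find_embedding_tokens(log_text: str) -> list[str]:
    """Extract base tokens from `found embedding for token <name>: ...` lines.

    Matches the single-token form; multi-layer token lists are logged
    differently and are not handled here.
    """
    return re.findall(r"found embedding for token (\S+):", log_text)

line = ("[2023-03-08 04:54:00,234] INFO: MainProcess MainThread "
        "onnx_web.convert.diffusion.textual_inversion: "
        "found embedding for token <concept>: torch.Size([768])")
```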
If you have set a custom token, that will be shown instead. If more than one token has been added, they will be
numbered following the pattern `base-N`, starting with 0.

### Figuring out how many layers a Textual Inversion uses

Textual Inversions produced by [the Stable Conceptualizer notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/stable_conceptualizer_inference.ipynb)
only have a single layer, while many others have more than one.

The number of layers is shown in the server logs when the model is converted:

```none
[2023-03-08 04:54:00,234] INFO: MainProcess MainThread onnx_web.convert.diffusion.textual_inversion: found embedding for token <concept>: torch.Size([768])
[2023-03-08 04:54:01,624] INFO: MainProcess MainThread onnx_web.convert.diffusion.textual_inversion: added 1 tokens
[2023-03-08 04:54:01,814] INFO: MainProcess MainThread onnx_web.convert.diffusion.textual_inversion: saving tokenizer for textual inversion
[2023-03-08 04:54:01,859] INFO: MainProcess MainThread onnx_web.convert.diffusion.textual_inversion: saving text encoder for textual inversion
...
[2023-03-08 04:58:06,378] INFO: MainProcess MainThread onnx_web.convert.diffusion.textual_inversion: generating 74 layer tokens
[2023-03-08 04:58:06,379] INFO: MainProcess MainThread onnx_web.convert.diffusion.textual_inversion: found embedding for token ['goblin-0', 'goblin-1', 'goblin-2', 'goblin-3', 'goblin-4', 'goblin-5', 'goblin-6', 'goblin-7', 'goblin-8', 'goblin-9', 'goblin-10', 'goblin-11', 'goblin-12', 'goblin-13', 'goblin-14', 'goblin-15', 'goblin-16', 'goblin-17', 'goblin-18', 'goblin-19', 'goblin-20', 'goblin-21', 'goblin-22', 'goblin-23', 'goblin-24', 'goblin-25', 'goblin-26', 'goblin-27', 'goblin-28', 'goblin-29', 'goblin-30', 'goblin-31', 'goblin-32', 'goblin-33', 'goblin-34', 'goblin-35', 'goblin-36', 'goblin-37', 'goblin-38', 'goblin-39', 'goblin-40', 'goblin-41', 'goblin-42', 'goblin-43', 'goblin-44', 'goblin-45', 'goblin-46', 'goblin-47', 'goblin-48', 'goblin-49', 'goblin-50', 'goblin-51', 'goblin-52', 'goblin-53', 'goblin-54', 'goblin-55', 'goblin-56', 'goblin-57', 'goblin-58', 'goblin-59', 'goblin-60', 'goblin-61', 'goblin-62', 'goblin-63', 'goblin-64', 'goblin-65', 'goblin-66', 'goblin-67', 'goblin-68', 'goblin-69', 'goblin-70', 'goblin-71', 'goblin-72', 'goblin-73'] (*): torch.Size([74, 768])
[2023-03-08 04:58:07,685] INFO: MainProcess MainThread onnx_web.convert.diffusion.textual_inversion: added 74 tokens
[2023-03-08 04:58:07,874] DEBUG: MainProcess MainThread onnx_web.convert.diffusion.textual_inversion: embedding torch.Size([768]) vector for layer goblin-0
[2023-03-08 04:58:07,874] DEBUG: MainProcess MainThread onnx_web.convert.diffusion.textual_inversion: embedding torch.Size([768]) vector for layer goblin-1
[2023-03-08 04:58:07,874] DEBUG: MainProcess MainThread onnx_web.convert.diffusion.textual_inversion: embedding torch.Size([768]) vector for layer goblin-2
[2023-03-08 04:58:07,875] DEBUG: MainProcess MainThread onnx_web.convert.diffusion.textual_inversion: embedding torch.Size([768]) vector for layer goblin-3
```

Figuring out the number of layers after the model has been converted currently requires the original tensor file
(https://github.com/ssube/onnx-web/issues/212).

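Given the original tensor file, a short script can report the layer count without running a conversion. This is a sketch, not part of onnx-web: it assumes the common A1111-style `.pt` layout with weights under a `string_to_param` key, and real files vary between training scripts.

```python
import torch

def count_layers(path: str) -> int:
    """Count the embedding layers in a Textual Inversion tensor file.

    Assumes an A1111-style checkpoint with weights under 'string_to_param';
    other trainers use different key names, so adjust the lookup as needed.
    """
    data = torch.load(path, map_location="cpu")
    params = data.get("string_to_param", data)
    tensor = next(iter(params.values()))
    # a [layers, 768] tensor has one row per layer; a bare [768] vector is one layer
    return tensor.shape[0] if tensor.dim() > 1 else 1
```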

@@ -31,6 +31,7 @@ Please see [the server admin guide](server-admin.md) for details on how to confi
- [Model sources](#model-sources)
- [Downloading models from Civitai](#downloading-models-from-civitai)
- [Using a custom VAE](#using-a-custom-vae)
- [Using and controlling Textual Inversions](#using-and-controlling-textual-inversions)
- [Tabs](#tabs)
- [Txt2img tab](#txt2img-tab)
- [Scheduler parameter](#scheduler-parameter)

@@ -300,6 +301,20 @@ Some common VAE models include:

- https://huggingface.co/stabilityai/sd-vae-ft-mse
- https://huggingface.co/stabilityai/sd-vae-ft-mse-original

### Using and controlling Textual Inversions

You can use a Textual Inversion along with a diffusion model by including one or more of the inversion's tokens in
your prompt. Some Textual Inversions only have a single layer, while others have 75 or more.

You can provide more than one of the numbered layer tokens using the `base-{X,Y}` range syntax in your prompt. This
follows Python range rules, so `X` is inclusive and `Y` is not. The range `autumn-{0,5}` will be expanded into the
tokens `autumn-0 autumn-1 autumn-2 autumn-3 autumn-4`. You can use the layer tokens individually, out of order, and
repeat some layers or omit them entirely. You can provide a step as the third parameter, which will skip layers:
`even-layers-{0,100,2}` will be expanded into
`even-layers-0 even-layers-2 even-layers-4 even-layers-6 ... even-layers-98`.

The range syntax does not currently work when the Long Prompt Weighting pipeline is enabled.

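The expansion rules can be sketched in a few lines of Python. This is an illustration of the documented syntax, not onnx-web's actual prompt parser:

```python
import re

def expand_ranges(prompt: str) -> str:
    """Expand `base-{X,Y}` and `base-{X,Y,Z}` into numbered layer tokens,
    following Python range rules: X inclusive, Y exclusive, optional step Z."""
    def replace(match: re.Match) -> str:
        base = match.group(1)
        args = [int(n) for n in match.group(2).split(",")]
        # range(*args) handles both the two- and three-argument forms
        return " ".join(f"{base}-{i}" for i in range(*args))

    return re.sub(r"([\w-]+)-\{(\d+(?:,\d+){1,2})\}", replace, prompt)
```

For example, `expand_ranges("autumn-{0,5}")` produces the five tokens listed above, and a step of 2 keeps only every other layer.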
## Tabs

### Txt2img tab