
fix(api): combine names for ONNX fp16 optimization

Sean Sube 2023-03-27 08:55:01 -05:00
parent 73e9cf8b66
commit c2f8fb1d31
Signed by: ssube
GPG Key ID: 3EED7B957D362AF1
3 changed files with 19 additions and 15 deletions

View File

@@ -478,7 +478,7 @@ def main() -> int:
     logger.info("CLI arguments: %s", args)
     ctx = ConversionContext.from_environ()
-    ctx.half = args.half or "onnx-internal-fp16" in ctx.optimizations
+    ctx.half = args.half or "onnx-fp16" in ctx.optimizations
     ctx.opset = args.opset
     ctx.token = args.token
     logger.info("converting models in %s using %s", ctx.model_path, ctx.training_device)

View File

@@ -102,9 +102,7 @@ Others:
 - `onnx-deterministic-compute`
   - enable ONNX deterministic compute
 - `onnx-fp16`
-  - force 16-bit floating point values when running pipelines
-  - use with https://github.com/microsoft/onnxruntime/tree/main/onnxruntime/python/tools/transformers/models/stable_diffusion#optimize-onnx-pipeline
-    and the `--float16` flag
+  - convert model nodes to 16-bit floating point values internally while leaving 32-bit inputs
 - `onnx-graph-*`
   - `onnx-graph-disable`
     - disable all ONNX graph optimizations
@@ -112,9 +110,6 @@ Others:
     - enable basic ONNX graph optimizations
   - `onnx-graph-all`
     - enable all ONNX graph optimizations
-- `onnx-internal-fp16`
-  - convert internal model nodes to 16-bit floating point values
-  - does not reduce disk space as much as `onnx-fp16` or `torch-fp16`, but does not incur as many extra conversions
 - `onnx-low-memory`
   - disable ONNX features that allocate more memory than is strictly required or keep memory after use
 - `torch-*`
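
The internal fp16 conversion that the updated `onnx-fp16` entry describes matches what the onnxconverter-common package provides; the following is a sketch of the technique under that assumption, not necessarily the exact code path onnx-web uses:

import onnx
from onnxconverter_common import float16

model = onnx.load("unet/model.onnx")  # path is illustrative

# keep_io_types=True converts internal nodes to fp16 while keeping
# the graph inputs and outputs as 32-bit floats
model_fp16 = float16.convert_float_to_float16(model, keep_io_types=True)

onnx.save(model_fp16, "unet/model.onnx")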

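The `onnx-graph-*` and `onnx-low-memory` flags line up with ONNX Runtime session options; a sketch of how such flags could map onto that API (the mapping itself is an assumption, not taken from onnx-web):

import onnxruntime as ort

# assumed mapping from optimization flags to ONNX Runtime levels
LEVELS = {
    "onnx-graph-disable": ort.GraphOptimizationLevel.ORT_DISABLE_ALL,
    "onnx-graph-basic": ort.GraphOptimizationLevel.ORT_ENABLE_BASIC,
    "onnx-graph-all": ort.GraphOptimizationLevel.ORT_ENABLE_ALL,
}

sess_options = ort.SessionOptions()
sess_options.graph_optimization_level = LEVELS["onnx-graph-all"]

# onnx-low-memory-style settings: disable ORT memory caching features
sess_options.enable_mem_pattern = False
sess_options.enable_cpu_mem_arena = False

# path is illustrative
session = ort.InferenceSession("unet/model.onnx", sess_options=sess_options)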
View File

@@ -725,20 +725,29 @@ Some common VAE models include:
 ### Optimizing models for lower memory usage
 
 Running Stable Diffusion with ONNX acceleration uses more memory by default than some other methods, but there are a
-number of optimizations that you can apply to reduce the memory usage.
+number of [server optimizations](server-admin.md#pipeline-optimizations) that you can apply to reduce the memory usage:
+
+- `diffusers-attention-slicing`
+- `onnx-fp16`
+- `onnx-graph-all`
+- `onnx-low-memory`
+- `torch-fp16`
 
 At least 12GB of VRAM is recommended for running all of the models in the extras file, but `onnx-web` should work on
 most 8GB cards and may work on some 6GB cards. 4GB is not supported yet, but [it should be
 possible](https://github.com/ssube/onnx-web/issues/241#issuecomment-1475341043).
 
-- `diffusers-attention-slicing`
-- `onnx-fp16`
-- `onnx-internal-fp16`
-- `onnx-graph-all`
-- `onnx-low-memory`
-- `torch-fp16`
-
-TODO: memory at different optimization levels
+Based on somewhat limited testing, the disk size and memory usage at each optimization level are approximately:
+
+| Optimizations               | Disk Size | Memory Usage (1 image @ 512x512) | Supported Platforms |
+| --------------------------- | --------- | -------------------------------- | ------------------- |
+| none                        | 4.0G      | 11.5G                            | all                 |
+| `onnx-fp16`                 | 2.2G      | 9.9G                             | all                 |
+| ORT script                  | 4.0G      | 6.6G                             | CUDA only           |
+| ORT script with `--float16` | 2.1G      | 5.8G                             | CUDA only           |
+| `torch-fp16`                | 2.0G      | 5.9G                             | CUDA only           |
+
+- https://github.com/microsoft/onnxruntime/tree/main/onnxruntime/python/tools/transformers/models/stable_diffusion#cuda-optimizations-for-stable-diffusion
 
 ### Permanently blending additional networks
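
The "ORT script" rows in the table above refer to the ONNX Runtime Stable Diffusion optimization script linked under the table. A hypothetical invocation sketched from that README; the module path, flag names, and paths are assumptions and may differ between onnxruntime versions:

import subprocess

subprocess.run(
    [
        "python", "-m",
        "onnxruntime.transformers.models.stable_diffusion.optimize_pipeline",
        "--input", "./models/stable-diffusion",      # illustrative paths
        "--output", "./models/stable-diffusion-opt",
        "--float16",  # corresponds to the "ORT script with --float16" row
    ],
    check=True,
)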