fix(api): combine names for ONNX fp16 optimization
commit c2f8fb1d31
parent 73e9cf8b66
@@ -478,7 +478,7 @@ def main() -> int:
     logger.info("CLI arguments: %s", args)

     ctx = ConversionContext.from_environ()
-    ctx.half = args.half or "onnx-internal-fp16" in ctx.optimizations
+    ctx.half = args.half or "onnx-fp16" in ctx.optimizations
     ctx.opset = args.opset
     ctx.token = args.token
     logger.info("converting models in %s using %s", ctx.model_path, ctx.training_device)
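The one-line change above merges the two fp16 flag names into a single `onnx-fp16` optimization. A minimal sketch of that check, assuming a simplified stand-in for `ConversionContext` (the real class lives in onnx-web's conversion module and carries many more fields):

```python
from dataclasses import dataclass, field


@dataclass
class ConversionContext:
    # simplified stand-in for onnx-web's ConversionContext
    optimizations: list = field(default_factory=list)
    half: bool = False


def apply_half(ctx: ConversionContext, half_arg: bool) -> ConversionContext:
    # either the --half CLI flag or the onnx-fp16 server optimization
    # enables half precision during conversion
    ctx.half = half_arg or "onnx-fp16" in ctx.optimizations
    return ctx


ctx = apply_half(ConversionContext(optimizations=["onnx-fp16"]), half_arg=False)
print(ctx.half)  # True: the server optimization implies half precision
```

Keeping the check to a single flag name means server configs only have to spell the optimization one way.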
@@ -102,9 +102,7 @@ Others:
 - `onnx-deterministic-compute`
   - enable ONNX deterministic compute
 - `onnx-fp16`
-  - force 16-bit floating point values when running pipelines
-  - use with https://github.com/microsoft/onnxruntime/tree/main/onnxruntime/python/tools/transformers/models/stable_diffusion#optimize-onnx-pipeline
-    and the `--float16` flag
+  - convert model nodes to 16-bit floating point values internally while leaving 32-bit inputs
 - `onnx-graph-*`
 - `onnx-graph-disable`
   - disable all ONNX graph optimizations
@@ -112,9 +110,6 @@ Others:
   - enable basic ONNX graph optimizations
 - `onnx-graph-all`
   - enable all ONNX graph optimizations
-- `onnx-internal-fp16`
-  - convert internal model nodes to 16-bit floating point values
-  - does not reduce disk space as much as `onnx-fp16` or `torch-fp16`, but does not incur as many extra conversions
 - `onnx-low-memory`
   - disable ONNX features that allocate more memory than is strictly required or keep memory after use
 - `torch-*`
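Since `onnx-internal-fp16` is removed from the docs here, older server configs may still reference the dropped name. A hypothetical compatibility shim (not part of onnx-web) could normalize the removed name to the combined `onnx-fp16` flag:

```python
# Hypothetical shim: map the removed optimization name onto the
# combined flag so older config values keep working after the rename.
DEPRECATED_FLAGS = {
    "onnx-internal-fp16": "onnx-fp16",
}


def normalize_optimizations(flags):
    seen = []
    for flag in flags:
        flag = DEPRECATED_FLAGS.get(flag, flag)
        if flag not in seen:  # drop duplicates introduced by the mapping
            seen.append(flag)
    return seen


print(normalize_optimizations(["onnx-internal-fp16", "onnx-low-memory"]))
# ['onnx-fp16', 'onnx-low-memory']
```

Deduplicating after the mapping avoids enabling the same optimization twice when a config lists both the old and new names.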
@@ -725,20 +725,29 @@ Some common VAE models include:
 ### Optimizing models for lower memory usage

 Running Stable Diffusion with ONNX acceleration uses more memory by default than some other methods, but there are a
-number of optimizations that you can apply to reduce the memory usage.
+number of [server optimizations](server-admin.md#pipeline-optimizations) that you can apply to reduce the memory usage:
+
+- `diffusers-attention-slicing`
+- `onnx-fp16`
+- `onnx-graph-all`
+- `onnx-low-memory`
+- `torch-fp16`

 At least 12GB of VRAM is recommended for running all of the models in the extras file, but `onnx-web` should work on
 most 8GB cards and may work on some 6GB cards. 4GB is not supported yet, but [it should be
 possible](https://github.com/ssube/onnx-web/issues/241#issuecomment-1475341043).

-- `diffusers-attention-slicing`
-- `onnx-fp16`
-- `onnx-internal-fp16`
-- `onnx-graph-all`
-- `onnx-low-memory`
-- `torch-fp16`
-
-TODO: memory at different optimization levels
+Based on somewhat limited testing, the model size memory usage for each optimization level is approximately:
+
+| Optimizations               | Disk Size | Memory Usage - 1 @ 512x512 | Supported Platforms |
+| --------------------------- | --------- | -------------------------- | ------------------- |
+| none                        | 4.0G      | 11.5G                      | all                 |
+| `onnx-fp16`                 | 2.2G      | 9.9G                       | all                 |
+| ORT script                  | 4.0G      | 6.6G                       | CUDA only           |
+| ORT script with `--float16` | 2.1G      | 5.8G                       | CUDA only           |
+| `torch-fp16`                | 2.0G      | 5.9G                       | CUDA only           |
+
+- https://github.com/microsoft/onnxruntime/tree/main/onnxruntime/python/tools/transformers/models/stable_diffusion#cuda-optimizations-for-stable-diffusion

 ### Permanently blending additional networks
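A quick sanity check on the memory table added above: the relative savings of each configuration versus the unoptimized 11.5G baseline, computed from the table's own figures (rounded to whole percent):

```python
# Memory figures (GB) copied from the table in the hunk above.
BASELINE = 11.5

memory_usage = {
    "onnx-fp16": 9.9,
    "ORT script": 6.6,
    "ORT script with --float16": 5.8,
    "torch-fp16": 5.9,
}

for name, used in memory_usage.items():
    # fraction of baseline memory saved, as a whole percentage
    saved = round(100 * (1 - used / BASELINE))
    print(f"{name}: {saved}% less memory than baseline")
```

The spread (roughly 14% for `onnx-fp16` alone versus about half for the CUDA-only options) matches the doc's framing that the ORT script and `torch-fp16` are the stronger optimizations where they are supported.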