Server Administration
This is the server administration guide for ONNX web.
Please see the user guide for descriptions of the client and each of the parameters.
Configuration
Configuration is still very simple: models are loaded from a directory and parameters from a single JSON file. Some
additional configuration can be done through environment variables starting with ONNX_WEB.
Debug Mode
Setting the DEBUG variable to any value except false will enable debug mode, which will print garbage
collection details and save some extra images to disk.
The images are:
output/last-mask.png
- the last mask image submitted with an inpaint request
output/last-noise.png
- the last noise source generated for an inpaint request
output/last-source.png
- the last source image submitted with an img2img, inpaint, or upscale request
These extra images can be helpful when debugging inpainting, especially poorly blended edges or visible noise.
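When debugging, it can help to check which of these files the server has actually written. The following is a minimal sketch, not part of ONNX web: the helper name find_debug_images is hypothetical, and the default directory of output is assumed from the paths listed above, so pass a different directory if your output path is configured elsewhere.

```python
from pathlib import Path

# File names taken from the list above. The directory is a parameter because
# the output location may be configured differently on your server.
DEBUG_IMAGES = ("last-mask.png", "last-noise.png", "last-source.png")


def find_debug_images(output_dir="output"):
    """Return the debug images that currently exist in output_dir, keyed by name."""
    base = Path(output_dir)
    return {name: base / name for name in DEBUG_IMAGES if (base / name).is_file()}


if __name__ == "__main__":
    # Print the size of each debug image that was found.
    for name, path in find_debug_images().items():
        print(f"{name}: {path.stat().st_size} bytes")
```

An empty result after an inpaint request suggests that debug mode is not enabled or that the images are being written to a different directory.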
Environment Variables
Paths:
ONNX_WEB_BUNDLE_PATH
- path where client bundle files can be found
ONNX_WEB_MODEL_PATH
- path where models can be found
ONNX_WEB_OUTPUT_PATH
- path where output images should be saved
ONNX_WEB_PARAMS_PATH
- path to the directory where the params.json file can be found
Others:
ONNX_WEB_ANY_PLATFORM
- whether or not to include the any option in the platform list
ONNX_WEB_BLOCK_PLATFORMS
- comma-delimited list of platforms that should not be presented to users
- further filters the list of available platforms returned by ONNX runtime
- can be used to prevent CPU generation on shared servers
ONNX_WEB_CACHE_MODELS
- the number of recent models to keep in memory
- setting this to 0 will disable caching and free VRAM between images
ONNX_WEB_CORS_ORIGIN
- comma-delimited list of allowed origins for CORS headers
ONNX_WEB_DEFAULT_PLATFORM
- the default platform to show in the client
- overrides the params.json file
ONNX_WEB_NUM_WORKERS
- number of background workers for image pipelines
- this should be equal to or less than the number of available GPUs
ONNX_WEB_SHOW_PROGRESS
- show progress bars in the logs
- disabling this can reduce noise in server logs, especially when logging to a file
ONNX_WEB_OPTIMIZATIONS
- comma-delimited list of optimizations to enable
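Several of these variables hold comma-delimited lists. The sketch below shows one way such a value could be parsed and used to filter a platform list, as described for ONNX_WEB_BLOCK_PLATFORMS. It is an illustration under stated assumptions, not the server's own code: the function names are hypothetical, trimming whitespace around items is an assumption, and cpu and cuda are placeholder platform names.

```python
import os


def parse_list(value):
    """Split a comma-delimited setting into trimmed, non-empty items."""
    return [item.strip() for item in (value or "").split(",") if item.strip()]


def visible_platforms(available, environ=os.environ):
    """Drop any platform named in ONNX_WEB_BLOCK_PLATFORMS from the list."""
    blocked = set(parse_list(environ.get("ONNX_WEB_BLOCK_PLATFORMS")))
    return [name for name in available if name not in blocked]


if __name__ == "__main__":
    # "cpu" and "cuda" are placeholder platform names for this example.
    env = {"ONNX_WEB_BLOCK_PLATFORMS": "cpu"}
    print(visible_platforms(["cpu", "cuda"], env))
```

Passing the environment as a parameter keeps the filter easy to test without changing the real process environment.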
Pipeline Optimizations
diffusers-*
diffusers-attention-slicing
diffusers-cpu-offload-*
diffusers-cpu-offload-sequential
- not available for ONNX pipelines (most of them)
- https://huggingface.co/docs/diffusers/optimization/fp16#offloading-to-cpu-with-accelerate-for-memory-savings
diffusers-cpu-offload-model
- not available for ONNX pipelines (most of them)
- https://huggingface.co/docs/diffusers/optimization/fp16#model-offloading-for-fast-inference-and-memory-savings
diffusers-memory-efficient-attention
diffusers-vae-slicing
- not available for ONNX pipelines (most of them)
- https://huggingface.co/docs/diffusers/optimization/fp16#sliced-vae-decode-for-larger-batches
onnx-*
onnx-low-memory
- disable ONNX features that allocate more memory than is strictly required or keep memory after use
onnx-graph-*
onnx-graph-disable
- disable all ONNX graph optimizations
onnx-graph-basic
- enable basic ONNX graph optimizations
onnx-graph-all
- enable all ONNX graph optimizations
onnx-deterministic-compute
- enable ONNX deterministic compute
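The onnx-graph-* flags line up naturally with the members of onnxruntime.GraphOptimizationLevel. The sketch below shows one possible mapping from the ONNX_WEB_OPTIMIZATIONS value to a level name. It is a hedged illustration rather than the server's implementation: the helper names are hypothetical, and the order in which conflicting flags are checked (disable, then basic, then all) is an assumption.

```python
# Names of members of onnxruntime.GraphOptimizationLevel, kept as strings so
# this sketch runs without onnxruntime installed.
GRAPH_LEVELS = (
    ("onnx-graph-disable", "ORT_DISABLE_ALL"),
    ("onnx-graph-basic", "ORT_ENABLE_BASIC"),
    ("onnx-graph-all", "ORT_ENABLE_ALL"),
)


def parse_optimizations(value):
    """Split the comma-delimited optimization list into trimmed flag names."""
    return [item.strip() for item in value.split(",") if item.strip()]


def graph_level(optimizations):
    """Return the level name for the first matching flag, or None if none is set.

    Checking in the order disable, basic, all is an assumption of this sketch.
    """
    enabled = set(optimizations)
    for flag, level in GRAPH_LEVELS:
        if flag in enabled:
            return level
    return None


if __name__ == "__main__":
    flags = parse_optimizations("diffusers-attention-slicing,onnx-graph-basic")
    print(graph_level(flags))
```

With onnxruntime installed, the returned name can be resolved using getattr on onnxruntime.GraphOptimizationLevel and assigned to the graph_optimization_level attribute of a SessionOptions object.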
Server Parameters
You can limit the image parameters in user requests to a reasonable range using values in the params.json file.
The keys share the same name as the query string parameter, and the format for each numeric value is:
{
  "default": 50,
  "min": 1,
  "max": 100,
  "step": 1
}
Setting the step to a decimal value between 0 and 1 will allow decimal inputs. However, the client is hard-coded to
send 2 decimal places in the query and only some parameters are parsed as floats, so values below 0.01 will affect
the GUI but not the output images, and some controls effectively force a step of 1.
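To make the range limits and the two-decimal behaviour concrete, here is a minimal sketch of how a numeric entry in this format could be applied to a requested value. It is an illustration only: the NumericParam class and both helper functions are hypothetical names, and the server and client may handle edge cases differently.

```python
import json
from dataclasses import dataclass


@dataclass
class NumericParam:
    """One numeric entry from params.json, using the keys shown above."""
    default: float
    min: float
    max: float
    step: float


def clamp(param, value=None):
    """Use the default when no value is given, otherwise limit it to [min, max]."""
    if value is None:
        return param.default
    return max(param.min, min(param.max, value))


def format_query_value(value):
    """Format a value with 2 decimal places, as the client is described as doing."""
    return f"{value:.2f}"


if __name__ == "__main__":
    entry = json.loads('{"default": 50, "min": 1, "max": 100, "step": 1}')
    param = NumericParam(**entry)
    print(clamp(param, 250))          # limited to the configured max
    print(format_query_value(0.004))  # a step below 0.01 is lost in the query
```

The second call shows why very small steps have no effect on the output: after formatting to 2 decimal places, 0.004 is sent as 0.00.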
Containers
CPU
This is the simplest container to run and does not require any drivers or devices, but is also the slowest to generate images.
CUDA
Requires the CUDA container runtime and an 11.x driver on the host.
ROCm
Requires the ROCm driver on the host.
Run with podman using:
> podman run -it \
    --device=/dev/dri \
    --device=/dev/kfd \
    --group-add video \
    --security-opt seccomp=unconfined \
    -e ONNX_WEB_MODEL_PATH=/data/models \
    -e ONNX_WEB_OUTPUT_PATH=/data/outputs \
    -v /var/lib/onnx-web/models:/data/models:rw \
    -v /var/lib/onnx-web/outputs:/data/outputs:rw \
    -p 5000:5000 \
    docker.io/ssube/onnx-web-api:main-rocm-ubuntu
Rootless podman does not appear to work and will show a "root does not belong to group 'video'" error, which does
not make much sense on its own, but appears to refer to the user who launched the container.