# Server Administration
This is the server administration guide for ONNX web.
Please see [the user guide](user-guide.md) for descriptions of the client and each of the parameters.
## Contents
- [Server Administration](#server-administration)
  - [Contents](#contents)
  - [Configuration](#configuration)
    - [Debug Mode](#debug-mode)
    - [Environment Variables](#environment-variables)
    - [Pipeline Optimizations](#pipeline-optimizations)
    - [Server Parameters](#server-parameters)
  - [Containers](#containers)
    - [CPU](#cpu)
    - [CUDA](#cuda)
    - [ROCm](#rocm)
## Configuration
Configuration is still very simple, loading models from a directory and parameters from a single JSON file. Some
additional configuration can be done through environment variables starting with `ONNX_WEB`.
### Debug Mode
Setting the `DEBUG` variable to any value except `false` will enable debug mode, which will print garbage
collection details and save some extra images to disk.
The images are:
- `output/last-mask.png`
  - the last `mask` image submitted with an inpaint request
- `output/last-noise.png`
  - the last noise source generated for an inpaint request
- `output/last-source.png`
  - the last `source` image submitted with an img2img, inpaint, or upscale request
These extra images can be helpful when debugging inpainting, especially poorly blended edges or visible noise.
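For example, debug mode can be enabled for a single shell session by exporting the variable before launching the server. This is only a sketch; the launch script name is an assumption and may differ in your install:
```shell
# any value other than "false" enables debug mode
export DEBUG=TRUE
# start the server as usual; the script name here is an assumption
./launch.sh
```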
### Environment Variables
Paths:
- `ONNX_WEB_BUNDLE_PATH`
  - path where client bundle files can be found
- `ONNX_WEB_MODEL_PATH`
  - path where models can be found
- `ONNX_WEB_OUTPUT_PATH`
  - path where output images should be saved
- `ONNX_WEB_PARAMS_PATH`
  - path to the directory where the `params.json` file can be found
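A typical setup points the model and output paths at a larger data disk. The paths below are only illustrations, not defaults:
```shell
# example locations only; adjust for your own layout
export ONNX_WEB_MODEL_PATH=/data/models
export ONNX_WEB_OUTPUT_PATH=/data/outputs
# directories containing params.json and the client bundle (example paths)
export ONNX_WEB_PARAMS_PATH=/opt/onnx-web/api
export ONNX_WEB_BUNDLE_PATH=/opt/onnx-web/gui/out
```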
Others:
- `ONNX_WEB_ANY_PLATFORM`
  - whether or not to include the `any` option in the platform list
- `ONNX_WEB_BLOCK_PLATFORMS`
  - comma-delimited list of platforms that should not be presented to users
  - further filters the list of available platforms returned by ONNX runtime
  - can be used to prevent CPU generation on shared servers
- `ONNX_WEB_CACHE_MODELS`
  - the number of recent models to keep in memory
  - setting this to 0 will disable caching and free VRAM between images
- `ONNX_WEB_CORS_ORIGIN`
  - comma-delimited list of allowed origins for CORS headers
- `ONNX_WEB_DEFAULT_PLATFORM`
  - the default platform to show in the client
  - overrides the `params.json` file
- `ONNX_WEB_NUM_WORKERS`
  - number of background workers for image pipelines
  - this should be equal to or less than the number of available GPUs
- `ONNX_WEB_SHOW_PROGRESS`
  - show progress bars in the logs
  - disabling this can reduce noise in server logs, especially when logging to a file
- `ONNX_WEB_OPTIMIZATIONS`
  - comma-delimited list of optimizations to enable
- `ONNX_WEB_EXTRA_ARGS`
  - extra arguments to the launch script
  - set this to `--half` to convert models to fp16
- `ONNX_WEB_EXTRA_MODELS`
  - extra model files to be loaded
  - one or more filenames or paths to JSON or YAML files matching [the extras schema](../api/schemas/extras.yaml)
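As a sketch, a shared server might hide the CPU provider, keep a couple of models cached, and load an extra models file. The values and the `cpu` platform name below are illustrative assumptions, not recommendations:
```shell
# hide the generic "any" option and block the CPU platform (name assumed to be "cpu")
export ONNX_WEB_ANY_PLATFORM=false
export ONNX_WEB_BLOCK_PLATFORMS=cpu
# keep the two most recent models in memory
export ONNX_WEB_CACHE_MODELS=2
# one worker per available GPU
export ONNX_WEB_NUM_WORKERS=1
# load additional models from an extras file (example path)
export ONNX_WEB_EXTRA_MODELS=/data/extras.json
```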
### Pipeline Optimizations
- `diffusers-*`
  - `diffusers-attention-slicing`
    - https://huggingface.co/docs/diffusers/optimization/fp16#sliced-attention-for-additional-memory-savings
  - `diffusers-cpu-offload-*`
    - `diffusers-cpu-offload-sequential`
      - not available for ONNX pipelines (most of them)
      - https://huggingface.co/docs/diffusers/optimization/fp16#offloading-to-cpu-with-accelerate-for-memory-savings
    - `diffusers-cpu-offload-model`
      - not available for ONNX pipelines (most of them)
      - https://huggingface.co/docs/diffusers/optimization/fp16#model-offloading-for-fast-inference-and-memory-savings
  - `diffusers-memory-efficient-attention`
    - requires [the `xformers` library](https://huggingface.co/docs/diffusers/optimization/xformers)
    - https://huggingface.co/docs/diffusers/optimization/fp16#memory-efficient-attention
  - `diffusers-vae-slicing`
    - not available for ONNX pipelines (most of them)
    - https://huggingface.co/docs/diffusers/optimization/fp16#sliced-vae-decode-for-larger-batches
- `onnx-*`
  - `onnx-low-memory`
    - disable ONNX features that allocate more memory than is strictly required or keep memory after use
  - `onnx-graph-*`
    - `onnx-graph-disable`
      - disable all ONNX graph optimizations
    - `onnx-graph-basic`
      - enable basic ONNX graph optimizations
    - `onnx-graph-all`
      - enable all ONNX graph optimizations
  - `onnx-deterministic-compute`
    - enable ONNX deterministic compute
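Optimizations are passed as a single comma-delimited list in `ONNX_WEB_OPTIMIZATIONS`. For example, a memory-constrained GPU might combine attention slicing, the ONNX low-memory mode, and full graph optimization; whether this particular combination helps depends on your hardware and pipeline:
```shell
export ONNX_WEB_OPTIMIZATIONS=diffusers-attention-slicing,onnx-low-memory,onnx-graph-all
```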
### Server Parameters
You can limit the image parameters in user requests to a reasonable range using values in the `params.json` file.
Each key shares its name with the corresponding query string parameter, and the format for each numeric value is:
```json
{
"default": 50,
"min": 1,
"max": 100,
"step": 1
}
```
Setting the `step` to a decimal value between 0 and 1 will allow decimal inputs, but the client is hard-coded to send 2
decimal places in the query and only some parameters are parsed as floats. As a result, values below `0.01` will affect the GUI
but not the output images, and some controls effectively force a step of `1`.
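As a sketch, a parameter that accepts fractional values can be given a larger decimal step. The `cfg` key name and the numbers below are assumptions, and `jq` is used here only as one convenient way to patch the file:
```shell
# assumes jq is installed; "cfg" is a hypothetical parameter name
# a step of 0.5 stays well above the client's 2-decimal limit
jq '.cfg = {"default": 6.0, "min": 1.0, "max": 30.0, "step": 0.5}' params.json > params.json.new
mv params.json.new params.json
```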
## Containers
### CPU
This is the simplest container to run and does not require any drivers or devices, but is also the slowest to
generate images.
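A sketch of running the CPU container with podman, mirroring the ROCm example below; the image tag is an assumption based on the ROCm tag naming, so check the published tags before using it:
```shell
# no GPU devices or special runtimes are needed for the CPU image
# the image tag below is an assumption; verify it against the published tags
> podman run -it \
    -e ONNX_WEB_MODEL_PATH=/data/models \
    -e ONNX_WEB_OUTPUT_PATH=/data/outputs \
    -v /var/lib/onnx-web/models:/data/models:rw \
    -v /var/lib/onnx-web/outputs:/data/outputs:rw \
    -p 5000:5000 \
    docker.io/ssube/onnx-web-api:main-cpu-buster
```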
### CUDA
Requires CUDA container runtime and 11.x driver on the host.
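A sketch of the equivalent podman command for CUDA; it assumes the NVIDIA Container Toolkit with CDI support on the host, and both the device flag and the image tag are assumptions that may need adjusting:
```shell
# --device nvidia.com/gpu=all requires CDI support from the NVIDIA Container Toolkit;
# with Docker, --gpus all serves the same purpose
# the image tag below is an assumption; verify it against the published tags
> podman run -it \
    --device nvidia.com/gpu=all \
    -e ONNX_WEB_MODEL_PATH=/data/models \
    -e ONNX_WEB_OUTPUT_PATH=/data/outputs \
    -v /var/lib/onnx-web/models:/data/models:rw \
    -v /var/lib/onnx-web/outputs:/data/outputs:rw \
    -p 5000:5000 \
    docker.io/ssube/onnx-web-api:main-cuda-ubuntu
```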
### ROCm
Requires ROCm driver on the host.
Run with podman using:
```shell
> podman run -it \
    --device=/dev/dri \
    --device=/dev/kfd \
    --group-add video \
    --security-opt seccomp=unconfined \
    -e ONNX_WEB_MODEL_PATH=/data/models \
    -e ONNX_WEB_OUTPUT_PATH=/data/outputs \
    -v /var/lib/onnx-web/models:/data/models:rw \
    -v /var/lib/onnx-web/outputs:/data/outputs:rw \
    -p 5000:5000 \
    docker.io/ssube/onnx-web-api:main-rocm-ubuntu
```
Rootless podman does not appear to work and will show a `root does not belong to group 'video'` error, which does
not make much sense on its own but appears to refer to the user who launched the container.