Saturday 29 May 2021

How to use Nvidia Triton Inference Server?

Environment: Ubuntu 16/18, NVIDIA driver 440.33.01, NVIDIA Triton Inference Server 1.12.0

step: pull the Triton docker image
> docker pull

step: download the example models
> git clone
(triton server root: /home/ninja/server)
> git checkout r20.03
> cd /home/ninja/server/docs/examples
> ./
(to check the model configuration, go to /home/ninja/server/docs/examples/model_repository/*)
> cd /home/ninja/server/docs/examples
> mkdir -p ensemble_model_repository/preprocess_resnet50_ensemble/1
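Triton loads each model from its own directory in the repository. That directory holds a config.pbtxt and one numeric subdirectory per model version, which is why the ensemble step above creates `preprocess_resnet50_ensemble/1`. The Python sketch below lists what a repository actually contains. It assumes that layout, and `list_models` is a hypothetical helper, not part of Triton.

```python
import os


def list_models(repo_path):
    """Map each model directory in a Triton-style repository to whether it has
    a config.pbtxt and which numeric version subdirectories it contains.

    Assumed layout: <repo_path>/<model_name>/config.pbtxt and
    <repo_path>/<model_name>/<version>/ (e.g. .../resnet50_netdef/1/).
    """
    models = {}
    for name in sorted(os.listdir(repo_path)):
        model_dir = os.path.join(repo_path, name)
        if not os.path.isdir(model_dir):
            continue  # ignore stray files at the repository root
        # Version directories are the subdirectories whose names are integers.
        versions = sorted(
            int(entry)
            for entry in os.listdir(model_dir)
            if entry.isdecimal() and os.path.isdir(os.path.join(model_dir, entry))
        )
        models[name] = {
            "config": os.path.isfile(os.path.join(model_dir, "config.pbtxt")),
            "versions": versions,
        }
    return models
```

Usage: `list_models("/home/ninja/server/docs/examples/model_repository")`. A model whose `versions` list is empty has no version directory for the server to load.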

step: start the Triton server
> docker run --gpus=1 --rm -p8000:8000 -p8001:8001 -p8002:8002 -v/home/ninja/server/docs/examples/model_repository:/models trtserver --model-repository=/models


(CPU-only variant, without --gpus:)
> docker run --rm --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 -p8000:8000 -p8001:8001 -p8002:8002 -v/home/ninja/server/docs/examples/model_repository:/models trtserver --model-repository=/models

if the --gpus run fails with:
docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]].
then install the NVIDIA container toolkit and restart docker:
> sudo apt install nvidia-container-toolkit
> sudo systemctl restart docker

step: check the triton server status
> curl localhost:8000/api/status
(the output should include ready_state: SERVER_READY)
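Instead of re-running curl by hand, you can poll the same endpoint from a script until the server reports ready, for example before starting the client container. This sketch only looks for the string SERVER_READY in the response body, matching the ready_state line shown for this step. The function name and defaults are my own, not part of Triton.

```python
import http.client
import time
import urllib.request

# Talk to the server directly, ignoring any http_proxy settings.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def wait_until_ready(url="http://localhost:8000/api/status", timeout_s=60.0, interval_s=1.0):
    """Poll url until the response body contains SERVER_READY.

    Returns True as soon as the server reports ready, or False once
    timeout_s seconds have passed without a ready response.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with _opener.open(url, timeout=interval_s) as resp:
                body = resp.read().decode("utf-8", errors="replace")
                if "SERVER_READY" in body:
                    return True
        except (OSError, http.client.HTTPException):
            pass  # not reachable yet: refused, reset, timed out, or an HTTP error
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

Usage: `wait_until_ready()` returns True once `curl localhost:8000/api/status` would show SERVER_READY.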

step: pull and run the client example
> docker run -it --rm --net=host
(inside docker now)
> root@luke:/workspace#


step: run the model inference
> root@luke:/workspace# image_client -m resnet50_netdef -s INCEPTION /workspace/images/mug.jpg

or use the Python client

> cd /workspace/src/clients/python/api_v1/examples
> root@luke:/workspace/src/clients/python/api_v1/examples# python image_client.py -m resnet50_netdef -s INCEPTION /workspace/images/mug.jpg
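Here is what `-s INCEPTION` does, as far as I understand the example client (an assumption; check image_client.py in your checkout). Each 8-bit pixel value x is scaled to x / 128 - 1, which lands roughly in [-1, 1). The model input is declared FORMAT_NCHW, so the image is also reordered from height x width x channel to channel x height x width. The pure-Python sketch below covers only those two steps. Resizing to 224x224 is left out, and `inception_nchw` is a hypothetical name.

```python
def inception_nchw(pixels, height, width, channels=3):
    """pixels: flat list of 0..255 values in HWC order (row by row, RGB per pixel).
    Returns a flat list in CHW order, each value scaled as x / 128 - 1."""
    if len(pixels) != height * width * channels:
        raise ValueError("pixel count does not match height * width * channels")
    out = []
    for c in range(channels):  # all values of channel 0 first, then channel 1, ...
        for y in range(height):
            for x in range(width):
                value = pixels[(y * width + x) * channels + c]
                out.append(value / 128.0 - 1.0)
    return out
```

For a 1x2 RGB image with pixels (0, 128, 255) and (64, 192, 32), the result is `[-1.0, -0.5, 0.0, 0.5, 0.9921875, -0.75]`.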

or use gRPC instead of HTTP (the previous two examples use HTTP on port 8000; gRPC is on port 8001)

> root@luke:/workspace# image_client -i grpc -u localhost:8001 -m resnet50_netdef -s INCEPTION /workspace/images/mug.jpg

or use -c N to see the top N classification results

> root@luke:/workspace# image_client -m resnet50_netdef -s INCEPTION -c 5 /workspace/images/mug.jpg
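Conceptually, `-c 5` asks for the five highest scores in the 1000-element gpu_0/softmax output. Each score is paired with its class name from resnet50_labels.txt; both names come from the model's config.pbtxt. The sketch below shows that selection, assuming the scores and the label lines are in the same order. `top_k` is a hypothetical helper, not the client's own code.

```python
import heapq


def top_k(scores, labels, k=5):
    """Return up to k (index, label, score) triples with the highest scores,
    highest first. scores[i] is assumed to belong to labels[i]."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    # nlargest over the indices, ranked by each index's score.
    best = heapq.nlargest(k, range(len(scores)), key=scores.__getitem__)
    return [(i, labels[i], scores[i]) for i in best]
```

Usage: `labels = open("resnet50_labels.txt").read().splitlines()`, then `top_k(softmax_scores, labels, 5)`, where `softmax_scores` stands for the model's output as a list of floats.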

or use -b to send a batch of images in each inference request

> root@luke:/workspace# image_client -m resnet50_netdef -s INCEPTION -c 3 -b 2 /workspace/images/mug.jpg

or provide a directory instead of a single image to run inference on all images in that directory

> root@luke:/workspace# image_client -i grpc -u localhost:8001 -m densenet_onnx -c 5 -s INCEPTION /workspace/images/
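The `-b` and directory options work together. The client first gathers the images, then sends them in batches of the requested size. I believe that when the images run out before a batch is full (e.g. `-b 2` with only mug.jpg), the example client wraps around to the first images to fill the batch; treat that as an assumption. The sketch below implements that logic. `collect_images`, `make_batches`, and the extension filter are hypothetical.

```python
import os

IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}  # assumption: which files count as images


def collect_images(path):
    """A single file is used as-is; a directory contributes its image files, sorted."""
    if os.path.isdir(path):
        return sorted(
            os.path.join(path, name)
            for name in os.listdir(path)
            if os.path.splitext(name)[1].lower() in IMAGE_EXTENSIONS
        )
    return [path]


def make_batches(images, batch_size):
    """Split images into batches of batch_size; an incomplete last batch is
    filled by wrapping around to the first images."""
    if not images or batch_size < 1:
        raise ValueError("need at least one image and batch_size >= 1")
    batches = []
    for start in range(0, len(images), batch_size):
        batches.append([images[(start + j) % len(images)] for j in range(batch_size)])
    return batches
```

With `-b 2` and only mug.jpg, `make_batches(["mug.jpg"], 2)` gives one batch containing mug.jpg twice.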

or run the ensemble image classification example application (unverified: this seems to require restarting the server container with the ensemble model repository)

> ensemble_image_client /workspace/images/mug.jpg


step: benchmark the model
> root@luke:/workspace# perf_client -m resnet50_netdef --concurrency-range 1:4 -f perf.csv
> root@luke:/workspace# cp perf.csv /temp (/temp is a host directory mounted into the client container)
(then make a copy of the Google spreadsheet)
(and import perf.csv into that copy)
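Before (or instead of) using the spreadsheet, you can summarize perf.csv with a few lines of Python. This sketch assumes the header row has columns named `Concurrency` and `Inferences/Second`; check your own file's header, since the other columns are ignored here. It also derives an implied mean latency from Little's law, which applies because perf_client keeps a fixed number of requests in flight: latency ≈ concurrency / throughput. For example, 4 requests at 160 infer/sec gives about 25 ms.

```python
import csv
import io


def summarize_perf(csv_text):
    """Parse perf_client CSV output.

    Returns (rows, best): rows is [(concurrency, inferences_per_sec,
    implied_latency_ms), ...] sorted by concurrency, and best is the row
    with the highest throughput (None if there are no data rows).
    """
    rows = []
    for record in csv.DictReader(io.StringIO(csv_text)):
        concurrency = int(record["Concurrency"])
        ips = float(record["Inferences/Second"])
        # Little's law: with `concurrency` requests always in flight,
        # mean latency ~= concurrency / throughput (converted to ms here).
        latency_ms = concurrency * 1000.0 / ips if ips > 0 else float("inf")
        rows.append((concurrency, ips, latency_ms))
    rows.sort()
    best = max(rows, key=lambda row: row[1]) if rows else None
    return rows, best
```

Usage: `with open("/temp/perf.csv") as f: rows, best = summarize_perf(f.read())`. If the implied latency is far from the measured latency columns, the run probably had not reached a steady state.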

step: use dynamic batching and multiple instances of a single model

example config.pbtxt with dynamic batching and two instances of the model (after editing it, restart the Triton server container so the new configuration is loaded):
name: "resnet50_netdef"
platform: "caffe2_netdef"
max_batch_size: 128
dynamic_batching { preferred_batch_size: [ 4 ] }
instance_group [ { count: 2 } ]
input [
  {
    name: "gpu_0/data"
    data_type: TYPE_FP32
    format: FORMAT_NCHW
    dims: [ 3, 224, 224 ]
  }
]
output [
  {
    name: "gpu_0/softmax"
    data_type: TYPE_FP32
    dims: [ 1000 ]
    label_filename: "resnet50_labels.txt"
  }
]
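To build some intuition for `dynamic_batching { preferred_batch_size: [ 4 ] }`, here is a deliberately simplified model of a dynamic batcher; it is not Triton's actual scheduler. Requests wait in a queue. A batch is dispatched as soon as the preferred size is reached, or once the oldest queued request has waited longer than a delay budget. `max_delay_ms` is a hypothetical parameter, loosely analogous to Triton's `max_queue_delay_microseconds`, which the config above does not set. Batches never exceed max_batch_size.

```python
def simulate_batches(arrivals_ms, preferred=4, max_batch=128, max_delay_ms=2.0):
    """Toy dynamic batcher. arrivals_ms: request arrival times, sorted ascending.
    Returns a list of (dispatch_time_ms, [request indices]) in dispatch order."""
    target = min(preferred, max_batch)
    batches = []
    pending = []  # indices of queued requests, oldest first
    for i, t in enumerate(arrivals_ms):
        # Before admitting request i, flush the queue if its oldest request
        # ran out of delay budget (it would have been sent at that deadline).
        if pending and t - arrivals_ms[pending[0]] > max_delay_ms:
            batches.append((arrivals_ms[pending[0]] + max_delay_ms, pending))
            pending = []
        pending.append(i)
        # Dispatch immediately once the preferred size is reached.
        if len(pending) >= target:
            batches.append((t, pending))
            pending = []
    if pending:  # whatever is left goes out when its delay budget expires
        batches.append((arrivals_ms[pending[0]] + max_delay_ms, pending))
    return batches
```

With arrivals at 0, 0.5, 1.0, 1.5, 10 and 10.5 ms, the first four requests go out together at 1.5 ms as one preferred-size batch. The last two wait until 12 ms. The toy model ignores `instance_group [ { count: 2 } ]`; with two instances, Triton can execute two such batches at the same time.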

> root@luke:/workspace# perf_client -m resnet50_netdef --concurrency-range 4


