Saturday, 29 May 2021

How to use the NVIDIA Triton Inference Server?

Ubuntu 16/18, NVIDIA driver 440.33.01, nvcr.io/nvidia/tritonserver:20.03-py3, NVIDIA Triton Inference Server 1.12.0
======

step: pull the Triton server Docker image
> docker pull nvcr.io/nvidia/tritonserver:20.03-py3

step: download the example models
> git clone https://github.com/triton-inference-server/server.git
(triton server root: /home/ninja/server)
> cd /home/ninja/server
> git checkout r20.03
> cd /home/ninja/server/docs/examples
> ./fetch_models.sh
(to inspect each model's configuration, see /home/ninja/server/docs/examples/model_repository/*)
> cd /home/ninja/server/docs/examples
> mkdir -p ensemble_model_repository/preprocess_resnet50_ensemble/1
(this creates the empty version directory needed later by the ensemble example)
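each model in the repository sits in its own directory with a config.pbtxt and numbered version subdirectories; a quick Python sketch to list what fetch_models.sh downloaded (the repository path is the one from the steps above):

# list_models.py - sanity-check the model repository layout
import os

repo = "/home/ninja/server/docs/examples/model_repository"

for model in sorted(os.listdir(repo)):
    model_dir = os.path.join(repo, model)
    if not os.path.isdir(model_dir):
        continue
    has_config = os.path.isfile(os.path.join(model_dir, "config.pbtxt"))
    versions = sorted(d for d in os.listdir(model_dir)
                      if d.isdigit() and os.path.isdir(os.path.join(model_dir, d)))
    print(f"{model}: config.pbtxt={'yes' if has_config else 'no'}, versions={versions}")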

step: start the Triton server
> docker run --gpus=1 --rm -p8000:8000 -p8001:8001 -p8002:8002 -v/home/ninja/server/docs/examples/model_repository:/models nvcr.io/nvidia/tritonserver:20.03-py3 trtserver --model-repository=/models

or, with larger shared memory and raised memory-lock/stack limits (note this variant omits --gpus):

> docker run --rm --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 -p8000:8000 -p8001:8001 -p8002:8002 -v/home/ninja/server/docs/examples/model_repository:/models nvcr.io/nvidia/tritonserver:20.03-py3 trtserver --model-repository=/models

(if docker fails with "Error response from daemon: could not select device driver "" with capabilities: [[gpu]]", install the NVIDIA container toolkit and restart Docker)
> sudo apt install nvidia-container-toolkit
> sudo systemctl restart docker

step: check the triton server status
> curl localhost:8000/api/status
(a healthy server reports ready_state: SERVER_READY)
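the same check from Python (a minimal sketch using the requests library; the endpoint and the ready_state string come from the curl command above):

# check_status.py - poll the Triton 1.x status endpoint until SERVER_READY appears
import time
import requests

url = "http://localhost:8000/api/status"

for _ in range(30):
    try:
        resp = requests.get(url, timeout=2)
        if resp.ok and "SERVER_READY" in resp.text:
            print("server is ready")
            break
    except requests.ConnectionError:
        pass  # server not up yet
    time.sleep(1)
else:
    print("server did not become ready")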

step: pull and run the client SDK container
> docker run -it --rm --net=host nvcr.io/nvidia/tritonserver:20.03-py3-clientsdk
(you are now inside the container, at the prompt root@luke:/workspace#)

======

step: run the model inference
> root@luke:/workspace# image_client -m resnet50_netdef -s INCEPTION /workspace/images/mug.jpg

or, using the Python client:

> cd /workspace/src/clients/python/api_v1/examples
> root@luke:/workspace/src/clients/python/api_v1/examples# python image_client.py -m resnet50_netdef -s INCEPTION /workspace/images/mug.jpg

or using gRPC instead of HTTP (the previous two examples use HTTP):

> root@luke:/workspace# image_client -i grpc -u localhost:8001 -m resnet50_netdef -s INCEPTION /workspace/images/mug.jpg

or use -c to see the top-N classification results:

> root@luke:/workspace# image_client -m resnet50_netdef -s INCEPTION -c 5 /workspace/images/mug.jpg

or use -b to send a batch of images for inferencing (a single input image is repeated to fill the batch):

> root@luke:/workspace# image_client -m resnet50_netdef -s INCEPTION -c 3 -b 2 /workspace/images/mug.jpg

or provide a directory instead of a single image to run inference on every image in the directory:

> root@luke:/workspace# image_client -i grpc -u localhost:8001 -m densenet_onnx -c 5 -s INCEPTION /workspace/images/

or run the ensemble image classification example application (this needs the ensemble model repository prepared earlier, and the Triton server restarted with --model-repository pointing at it):

> ensemble_image_client /workspace/images/mug.jpg
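for reference, the example resnet50_netdef model takes a single FP32 input in NCHW order with shape [3, 224, 224] (see its config.pbtxt, reproduced later in this post); a minimal Python preprocessing sketch (the INCEPTION-style scaling to roughly [-1, 1] is an assumption about what image_client -s INCEPTION does, the exact constant may differ):

# preprocess.py - shape an image to match the gpu_0/data input of resnet50_netdef
# (TYPE_FP32, FORMAT_NCHW, dims [3, 224, 224]); the scaling constant below is an
# assumption about image_client's INCEPTION mode
import numpy as np
from PIL import Image

def preprocess(path):
    img = Image.open(path).convert("RGB").resize((224, 224))
    data = np.asarray(img, dtype=np.float32)   # HWC, values in [0, 255]
    data = (data / 127.5) - 1.0                # INCEPTION-style scaling to ~[-1, 1]
    return np.transpose(data, (2, 0, 1))       # HWC -> CHW (FORMAT_NCHW)

if __name__ == "__main__":
    batch = np.stack([preprocess("/workspace/images/mug.jpg")] * 2)
    print(batch.shape, batch.dtype)            # (2, 3, 224, 224) float32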


======

step: benchmark the model
> root@luke:/workspace# perf_client -m resnet50_netdef --concurrency-range 1:4 -f perf.csv
> root@luke:/workspace# cp perf.csv /temp (a directory mounted from the host)
(then make a copy of the spreadsheet template: https://docs.google.com/spreadsheets/d/1IsdW78x_F-jLLG4lTV0L-rruk0VEBRL7Mnb-80RGLL4/edit#gid=1572240508)
(and import perf.csv into that copy)
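as an alternative to the spreadsheet, the CSV can be inspected locally (a minimal pandas sketch; the column names are whatever perf_client wrote, nothing is assumed about them):

# inspect_perf.py - quick local look at the perf_client CSV export
import pandas as pd

df = pd.read_csv("perf.csv")
print(df.columns.tolist())        # metrics exported by perf_client
print(df.to_string(index=False))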

step: how to use dynamic batching and multiple instances of a single model
https://docs.nvidia.com/deeplearning/triton-inference-server/archives/triton_inference_server_1120/triton-inference-server-guide/docs/optimization.html

example config.pbtxt enabling dynamic batching and two model instances (restart the Triton server container so the new configuration is loaded):
name: "resnet50_netdef"
platform: "caffe2_netdef"
max_batch_size: 128
dynamic_batching { preferred_batch_size: [ 4 ] }
instance_group [ { count: 2 }]
input [
  {
    name: "gpu_0/data"
    data_type: TYPE_FP32
    format: FORMAT_NCHW
    dims: [ 3, 224, 224 ]
  }
]
output [
  {
    name: "gpu_0/softmax"
    data_type: TYPE_FP32
    dims: [ 1000 ]
    label_filename: "resnet50_labels.txt"
  }
]

> root@luke:/workspace# perf_client -m resnet50_netdef --concurrency-range 4
(re-run perf_client and compare against the earlier results)

reference:
- https://github.com/triton-inference-server/server/blob/main/docs/quickstart.md
- https://docs.nvidia.com/deeplearning/triton-inference-server/archives/triton_inference_server_1120/triton-inference-server-guide/docs/run.html#checking-inference-server-status
