Ubuntu 16.04/18.04, NVIDIA driver 440.33.01, nvcr.io/nvidia/tritonserver:20.03-py3, NVIDIA Triton Inference Server 1.12.0
======
step: pull the Triton server Docker image
> docker pull nvcr.io/nvidia/tritonserver:20.03-py3
step: download the example models
> git clone https://github.com/triton-inference-server/server.git
(triton server root: /home/ninja/server)
> git checkout r20.03
> cd /home/ninja/server/docs/examples
> ./fetch_models.sh
(to check the model configuration, go to /home/ninja/server/docs/examples/model_repository/*)
> cd /home/ninja/server/docs/examples
> mkdir -p ensemble_model_repository/preprocess_resnet50_ensemble/1
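(each model in the repository needs a config.pbtxt plus at least one numbered version directory; the short Python sketch below is only an illustrative helper, not part of the Triton examples, and assumes the repository path used above)

import os

# assumed host path of the example model repository (see the steps above)
MODEL_REPO = "/home/ninja/server/docs/examples/model_repository"

for model in sorted(os.listdir(MODEL_REPO)):
    model_dir = os.path.join(MODEL_REPO, model)
    if not os.path.isdir(model_dir):
        continue
    has_config = os.path.isfile(os.path.join(model_dir, "config.pbtxt"))
    # version directories are plain integers, e.g. "1"
    versions = [d for d in os.listdir(model_dir)
                if d.isdigit() and os.path.isdir(os.path.join(model_dir, d))]
    print(f"{model}: config.pbtxt={'yes' if has_config else 'MISSING'}, versions={versions or 'MISSING'}")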
step: start the Triton server
> docker run --gpus=1 --rm -p8000:8000 -p8001:8001 -p8002:8002 -v/home/ninja/server/docs/examples/model_repository:/models nvcr.io/nvidia/tritonserver:20.03-py3 trtserver --model-repository=/models
or
> docker run --rm --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 -p8000:8000 -p8001:8001 -p8002:8002 -v/home/ninja/server/docs/examples/model_repository:/models nvcr.io/nvidia/tritonserver:20.03-py3 trtserver --model-repository=/models
(if docker run fails with the error below, install the NVIDIA Container Toolkit and restart Docker)
docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]].
> sudo apt install nvidia-container-toolkit
> sudo systemctl restart docker
step: check the triton server status
> curl localhost:8000/api/status
(ready_state: SERVER_READY)
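(the same check can be scripted; a minimal Python sketch using the requests package, assuming the server is reachable at the default HTTP port 8000 on localhost)

import requests

# v1 API status endpoint of the local Triton server
resp = requests.get("http://localhost:8000/api/status")
resp.raise_for_status()
# the body is a text-format status report; ready_state: SERVER_READY appears once the server is up
if "SERVER_READY" in resp.text:
    print("Triton server is ready")
else:
    print("Triton server is not ready yet:")
    print(resp.text)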
step: pull and run the client SDK container
> docker run -it --rm --net=host nvcr.io/nvidia/tritonserver:20.03-py3-clientsdk
(you are now inside the container, at the root@luke:/workspace# prompt)
======
step: run the model inference
> root@luke:/workspace# image_client -m resnet50_netdef -s INCEPTION /workspace/images/mug.jpg
or using the Python client
> cd /workspace/src/clients/python/api_v1/examples
> root@luke:/workspace/src/clients/python/api_v1/examples# python image_client.py -m resnet50_netdef -s INCEPTION /workspace/images/mug.jpg
or using gRPC instead of HTTP (the previous two examples use HTTP)
> root@luke:/workspace# image_client -i grpc -u localhost:8001 -m resnet50_netdef -s INCEPTION /workspace/images/mug.jpg
or use -c to show the top-N classification results
> root@luke:/workspace# image_client -m resnet50_netdef -s INCEPTION -c 5 /workspace/images/mug.jpg
or the -b flag allows you to send a batch of images for inferencing
> root@luke:/workspace# image_client -m resnet50_netdef -s INCEPTION -c 3 -b 2 /workspace/images/mug.jpg
or provide a directory instead of a single image to perform inferencing on all images in the directory
> root@luke:/workspace# image_client -i grpc -u localhost:8001 -m densenet_onnx -c 5 -s INCEPTION /workspace/images/
or the ensemble image classification example application (this needs the ensemble model repository prepared earlier and the Triton server restarted with it mounted as the model repository)
> ensemble_image_client /workspace/images/mug.jpg
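(for writing your own client, image_client.py shown above is the authoritative reference; the following is a stripped-down Python sketch of the same V1 client API, tensorrtserver.api from the clientsdk image, using a random array as a stand-in for a preprocessed image; if the call signatures differ in your client version, follow image_client.py)

import numpy as np
from tensorrtserver.api import InferContext, ProtocolType

# connect to the server over HTTP (run inside the clientsdk container started with --net=host)
ctx = InferContext("localhost:8000", ProtocolType.HTTP, "resnet50_netdef")

# placeholder standing in for a real CHW float32 image preprocessed to 3x224x224
image = np.random.rand(3, 224, 224).astype(np.float32)

# request the top-5 classes of the softmax output for a batch of one image
result = ctx.run(
    {"gpu_0/data": (image,)},
    {"gpu_0/softmax": (InferContext.ResultFormat.CLASS, 5)},
    batch_size=1)
print(result["gpu_0/softmax"])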
======
step: to benchmark the model
root@luke:/workspace# perf_client -m resnet50_netdef --concurrency-range 1:4 -f perf.csv
root@luke:/workspace# cp perf.csv /temp (where /temp is a directory mounted from the host)
(then make a copy of the spreadsheet template at https://docs.google.com/spreadsheets/d/1IsdW78x_F-jLLG4lTV0L-rruk0VEBRL7Mnb-80RGLL4/edit#gid=1572240508 and import perf.csv into your copy)
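(perf.csv is an ordinary CSV file, so it can also be inspected without the spreadsheet; a minimal Python sketch using the csv module that simply prints whatever columns perf_client wrote, since the exact column names depend on the perf_client version)

import csv

# path of the file copied out of the client container (assumed /temp/perf.csv)
with open("/temp/perf.csv", newline="") as f:
    for row in csv.DictReader(f):
        # one row per measured concurrency level
        print(", ".join(f"{k}={v}" for k, v in row.items()))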
step: how to use dynamic batching and multiple instances of a single model
https://docs.nvidia.com/deeplearning/triton-inference-server/archives/triton_inference_server_1120/triton-inference-server-guide/docs/optimization.html
example config.pbtxt enabling dynamic batching and two model instances; edit the file in the model repository and restart the Triton server so the new configuration takes effect
name: "resnet50_netdef"
platform: "caffe2_netdef"
max_batch_size: 128
dynamic_batching { preferred_batch_size: [ 4 ] }
instance_group [ { count: 2 }]
input [
{
name: "gpu_0/data"
data_type: TYPE_FP32
format: FORMAT_NCHW
dims: [ 3, 224, 224 ]
}
]
output [
{
name: "gpu_0/softmax"
data_type: TYPE_FP32
dims: [ 1000 ]
label_filename: "resnet50_labels.txt"
}
]
root@luke:/workspace# perf_client -m resnet50_netdef --concurrency-range 4
reference:
- https://github.com/triton-inference-server/server/blob/main/docs/quickstart.md
- https://docs.nvidia.com/deeplearning/triton-inference-server/archives/triton_inference_server_1120/triton-inference-server-guide/docs/run.html#checking-inference-server-status