Monday, 5 December 2022

How to use Nvidia Nsight Systems for Profiling inside Nvidia docker?

Step: download the Nvidia Nsight Systems .run installer from this page

https://developer.nvidia.com/gameworksdownload#?search=Nsight

https://developer.nvidia.com/gameworksdownload#?dn=nsight-systems-2021-3-1-54

I used this version for testing: https://developer.nvidia.com/rdp/assets/nsight-systems-2021-3-linux-installer


Step: copy the installer into the docker container, install it there, and commit the container



(inside docker)
sh NsightSystems-linux-public-2021.3.1.54-ee9c30a.run

(check nsys status)
nsys status -e
===
Sampling Environment Check
Linux Kernel Paranoid Level = -1: OK
Linux Distribution = Ubuntu
Linux Kernel Version = 5.4.0-81: OK
Linux perf_event_open syscall available: Fail
Sampling trigger event available: Fail
Intel(c) Last Branch Record support: Not Available
Sampling Environment: Fail
===

Commit the container first, then exit it; the Fail results are resolved from the host.



Step: resolve the Fail results from "nsys status -e" in the previous step



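(exit docker now, at host)

(persistent setting, takes effect after a reboot)
sudo sh -c 'echo kernel.perf_event_paranoid=2 > /etc/sysctl.d/local.conf'

(immediate setting, e.g. changing 3 to 2; lost on reboot)
sudo sh -c 'echo 2 > /proc/sys/kernel/perf_event_paranoid'

(verify the current value)
cat /proc/sys/kernel/perf_event_paranoid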

(the perf paranoid level on the target system must be ≤2)
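The same check can be scripted before launching the container. This is a minimal sketch, assuming a Linux host; the function name paranoid_level_ok and its parameters are made up for illustration and are not part of any Nvidia tool.

```python
from pathlib import Path

# File the Linux kernel uses to expose the setting (same path as in the step above).
PARANOID_PATH = Path("/proc/sys/kernel/perf_event_paranoid")


def paranoid_level_ok(path: Path = PARANOID_PATH, max_level: int = 2) -> bool:
    """Return True if the paranoid level stored in `path` is <= max_level."""
    level = int(path.read_text().strip())
    return level <= max_level


if __name__ == "__main__":
    if PARANOID_PATH.exists():
        print("perf_event_paranoid OK for sampling:", paranoid_level_ok())
    else:
        print("perf_event_paranoid not found (probably not a Linux host)")
```

Passing a different path makes the helper easy to try against a scratch file before touching the real setting.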


Start the container with an additional flag (--cap-add=SYS_ADMIN, which allows the perf_event_open syscall inside the container), as follows:

docker run --cap-add=SYS_ADMIN --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 \
  --rm -it --runtime nvidia --net=host --security-opt apparmor:unconfined \
  -e DISPLAY=$DISPLAY -v /home/ninja/temp:/workspace \
  -w /workspace nvcr.io/nvidia/deepstream:5.1-21.02-triton



(inside docker now)
nsys status -e
===
Sampling Environment Check
Linux Kernel Paranoid Level = -1: OK
Linux Distribution = Ubuntu
Linux Kernel Version = 5.4.0-81: OK
Linux perf_event_open syscall available: OK
Sampling trigger event available: OK
Intel(c) Last Branch Record support: Available
Sampling Environment: OK
===
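When the check runs as part of a script, the status text can be parsed instead of read by eye. The sketch below assumes the output keeps the "name: status" layout shown above; parse_nsys_status and sampling_environment_ok are hypothetical helper names, not part of nsys.

```python
def parse_nsys_status(text: str) -> dict:
    """Map each check name to its trailing status word ("OK", "Fail", ...)."""
    checks = {}
    for line in text.splitlines():
        line = line.strip()
        # Blank lines, "===" separators and lines without a status have no colon.
        if ":" not in line:
            continue
        name, _, status = line.rpartition(":")
        checks[name.strip()] = status.strip()
    return checks


def sampling_environment_ok(text: str) -> bool:
    """True only if the final "Sampling Environment" line reports OK."""
    return parse_nsys_status(text).get("Sampling Environment") == "OK"


if __name__ == "__main__":
    sample = """Sampling Environment Check
Linux Kernel Paranoid Level = -1: OK
Linux perf_event_open syscall available: Fail
Sampling Environment: Fail"""
    print(parse_nsys_status(sample))
    print("environment ok:", sampling_environment_ok(sample))
```

Inside the container, the text could come from something like subprocess.run(["nsys", "status", "-e"], capture_output=True, text=True).stdout, assuming nsys is on the PATH.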



Step: run the profiling inside docker

Default analysis run




(profile cpu only)

nsys profile sh run.sh

or

nsys profile -o report1 ./main 1 rtsp://192.168.80.100

(stop it after a while; it generates an Nsight Systems report named report1.qdrep)

Limited trace only run




(trace CUDA and NVTX only, for 20 seconds, with CPU sampling disabled)

nsys profile --trace=cuda,nvtx -d 20 --sample=none -o report2 sh run.sh


(set the environment variable TEST_ONLY=0 and delay the start of collection by 20 seconds)

nsys profile -e TEST_ONLY=0 -y 20 -o report3 sh run.sh
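The runs above differ only in a handful of options, so a small helper can assemble the command line when profiling is launched from a script. This is a sketch under assumptions: build_nsys_profile is a hypothetical name, and it covers only the flags already used in this post (--trace, -d, -y, --sample, -o).

```python
import shlex
from typing import List, Optional, Sequence


def build_nsys_profile(
    target: Sequence[str],
    output: Optional[str] = None,
    trace: Optional[Sequence[str]] = None,
    duration: Optional[int] = None,
    delay: Optional[int] = None,
    sample: Optional[str] = None,
) -> List[str]:
    """Build an argv list for `nsys profile` followed by the target command."""
    cmd = ["nsys", "profile"]
    if trace:
        cmd.append("--trace=" + ",".join(trace))
    if duration is not None:
        cmd += ["-d", str(duration)]  # collection duration in seconds
    if delay is not None:
        cmd += ["-y", str(delay)]  # delay before collection starts, in seconds
    if sample is not None:
        cmd.append("--sample=" + sample)
    if output:
        cmd += ["-o", output]
    return cmd + list(target)


if __name__ == "__main__":
    argv = build_nsys_profile(
        ["sh", "run.sh"],
        output="report2",
        trace=["cuda", "nvtx"],
        duration=20,
        sample="none",
    )
    print(shlex.join(argv))
```

The returned list can be passed directly to subprocess.run, which avoids shell quoting problems when the target command takes arguments such as an RTSP URL.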



Step: set up a Python program to be profiled with Nsight Systems

  1. Install nvtx

pip install nvtx
  2. Write Python code with NVTX annotations, nvtx-quickstart.py



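import time

import nvtx


# The decorator wraps every call to f() in an NVTX range named "f()".
@nvtx.annotate("f()", color="purple")
def f():
    for i in range(5):
        # The context manager adds a nested "loop" range for each iteration.
        with nvtx.annotate("loop", color="red"):
            time.sleep(i)


f()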
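If the same script also has to run on machines where nvtx is not installed, the import can be guarded. The sketch below is one possible approach, not something the nvtx package provides: the fallback annotate class is a hypothetical no-op stand-in that only mimics how nvtx.annotate is used here (as a decorator and as a context manager), and g() is an illustrative function.

```python
from contextlib import ContextDecorator

try:
    import nvtx  # real package, installed with `pip install nvtx`

    annotate = nvtx.annotate
except ImportError:

    class annotate(ContextDecorator):
        """Hypothetical no-op replacement used only when nvtx is missing."""

        def __init__(self, message=None, color=None, domain=None):
            self.message = message

        def __enter__(self):
            return self

        def __exit__(self, *exc):
            return False  # never swallow exceptions


@annotate("g()", color="purple")
def g(n=3):
    """Sum 0..n-1, with one annotated range per iteration."""
    total = 0
    for i in range(n):
        with annotate("loop", color="red"):
            total += i
    return total


if __name__ == "__main__":
    print(g())
```

With nvtx installed, the ranges show up in the report as usual; without it, the annotations simply do nothing.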


  3. Execute the profile command

CUDA_LAUNCH_BLOCKING=1 nsys profile python main.py 



nsys profile -t nvtx,osrt --force-overwrite=true --stats=true \
  --output=quickstart python nvtx-quickstart.py







(python37) ninja@luke:~/workspace/opencv_pyspace$ nsys profile \
  -t nvtx,osrt --force-overwrite=true --stats=true --output=quickstart python test_nvtx.py




Collecting data...
Processing events...
Saving temporary "/tmp/nsys-report-4ebb-30b6-cd44-22af.qdstrm" file to disk...

Creating final output files...
Processing [===============================================================100%]
Saved report file to "/tmp/nsys-report-4ebb-30b6-cd44-22af.qdrep"
Exporting 1341 events: [===================================================100%]

Exported successfully to
/tmp/nsys-report-4ebb-30b6-cd44-22af.sqlite



Operating System Runtime API Statistics:

Time(%)  Total Time (ns)  Num Calls     Average (ns)   Minimum (ns)   Maximum (ns)      StdDev (ns)  Name
-------  ---------------  ---------  ---------------  -------------  -------------  ---------------  ---------
  100.0   10,008,223,944          4  2,502,055,986.0  1,000,630,612  4,003,725,880  1,292,144,336.4  select
    0.0           86,495         28          3,089.1          1,007          8,992          2,473.7  read
    0.0           81,246         41          1,981.6          1,498          2,522            209.0  open64
    0.0           15,682          9          1,742.4          1,542          2,018            163.1  mmap64
    0.0            7,951          4          1,987.8          1,031          2,441            645.5  fopen64
    0.0            3,262          3          1,087.3          1,067          1,100             17.8  fclose
    0.0            1,106          1          1,106.0          1,106          1,106              0.0  sigaction
    0.0            1,089          1          1,089.0          1,089          1,089              0.0  fflush




NVTX Range Statistics:

Time(%)  Total Time (ns)  Instances      Average (ns)    Minimum (ns)    Maximum (ns)      StdDev (ns)  Style    Range
-------  ---------------  ---------  ----------------  --------------  --------------  ---------------  -------  -----
   50.0   10,008,687,995          1  10,008,687,995.0  10,008,687,995  10,008,687,995              0.0  PushPop  f()
   50.0   10,008,485,171          5   2,001,697,034.2           2,930   4,003,762,411  1,582,499,850.8  PushPop  loop


Report file moved to "/home/ninja/workspace/opencv_pyspace/quickstart.qdrep"

Report file moved to "/home/ninja/workspace/opencv_pyspace/quickstart.sqlite"
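As a sanity check on the NVTX numbers: the loop sleeps 0 + 1 + 2 + 3 + 4 = 10 seconds in total, so the "loop" range should add up to slightly more than 10 s. The sketch below parses one row of the NVTX Range Statistics text and computes that difference. The helper parse_nvtx_row is hypothetical and assumes the column layout printed by this nsys version (seven numeric columns, then Style, then Range).

```python
def parse_nvtx_row(line: str) -> dict:
    """Parse one data row of the "NVTX Range Statistics" table."""
    parts = line.split()
    # Seven numeric columns with thousands separators, then Style, then the range name.
    nums = [float(p.replace(",", "")) for p in parts[:7]]
    return {
        "time_pct": nums[0],
        "total_ns": int(nums[1]),
        "instances": int(nums[2]),
        "avg_ns": nums[3],
        "min_ns": int(nums[4]),
        "max_ns": int(nums[5]),
        "stddev_ns": nums[6],
        "style": parts[7],
        "range": " ".join(parts[8:]),
    }


# Row copied from the report output above.
row = parse_nvtx_row(
    "50.0 10,008,485,171 5 2,001,697,034.2 2,930 4,003,762,411 1,582,499,850.8 PushPop loop"
)
expected_s = sum(range(5))  # time.sleep(0) + ... + time.sleep(4)
overhead_s = row["total_ns"] / 1e9 - expected_s

if __name__ == "__main__":
    print(f"range={row['range']} instances={row['instances']}")
    print(f"expected sleep: {expected_s} s, overhead: {overhead_s * 1e3:.3f} ms")
```

The few milliseconds left over after subtracting the sleeps are the cost of everything else inside the annotated ranges.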


  4. Open and view the Nsight Systems report, /home/ninja/workspace/opencv_pyspace/quickstart.qdrep





Step: view the Nsight Systems report

Open the Nsight Systems GUI on the host and load report1.qdrep.


Step: run the Nvidia Nsight Systems GUI from a terminal




sudo sh -c 'echo 2 >/proc/sys/kernel/perf_event_paranoid'

sudo nsys-ui


Step: Configure the Nsight for the C++ program to be profiled





