Step: download the NVIDIA Nsight Systems .run installer from this page
https://developer.nvidia.com/gameworksdownload#?search=Nsight
https://developer.nvidia.com/gameworksdownload#?dn=nsight-systems-2021-3-1-54
I used this version for testing: https://developer.nvidia.com/rdp/assets/nsight-systems-2021-3-linux-installer
Step: copy the installer into the Docker container, install it there, and commit the container
(inside docker)
    sh NsightSystems-linux-public-2021.3.1.54-ee9c30a.run
(check nsys status)
    nsys status -e
    ===
    Sampling Environment Check
    Linux Kernel Paranoid Level = -1: OK
    Linux Distribution = Ubuntu
    Linux Kernel Version = 5.4.0-81: OK
    Linux perf_event_open syscall available: Fail
    Sampling trigger event available: Fail
    Intel(c) Last Branch Record support: Not Available
    Sampling Environment: Fail
    ===
Commit the container first, then exit it to resolve the Fail results.
Step: resolve the Fail results of "nsys status -e" from the previous step
(exit docker now, at host)
    sudo sh -c 'echo kernel.perf_event_paranoid=2 > /etc/sysctl.d/local.conf'
(reboot)
    sudo sh -c 'echo 2 >/proc/sys/kernel/perf_event_paranoid'
(change 3 to 2)
    cat /proc/sys/kernel/perf_event_paranoid
(the perf paranoid level on the target system must be ≤2)
Add the new flag --cap-add=SYS_ADMIN when running the container, as follows:
    docker run --cap-add=SYS_ADMIN --shm-size=1g --ulimit memlock=-1 \
        --ulimit stack=67108864 --rm -it --runtime nvidia --net=host \
        --security-opt apparmor:unconfined -e DISPLAY=$DISPLAY \
        -v /home/ninja/temp:/workspace -w /workspace \
        nvcr.io/nvidia/deepstream:5.1-21.02-triton
(inside docker now)
    nsys status -e
    ===
    Sampling Environment Check
    Linux Kernel Paranoid Level = -1: OK
    Linux Distribution = Ubuntu
    Linux Kernel Version = 5.4.0-81: OK
    Linux perf_event_open syscall available: OK
    Sampling trigger event available: OK
    Intel(c) Last Branch Record support: Available
    Sampling Environment: OK
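The host-side check above (the paranoid level must be ≤2) can also be scripted. The following is only a minimal sketch: the function names `paranoid_level_ok` and `check_host` are made up for illustration, and the threshold is simply the value quoted in the step above.

```python
from pathlib import Path

# Path and threshold are the ones quoted in the step above.
PARANOID_PATH = Path("/proc/sys/kernel/perf_event_paranoid")
MAX_LEVEL = 2


def paranoid_level_ok(raw: str, max_level: int = MAX_LEVEL) -> bool:
    """Return True if the given file content is at or below the allowed level."""
    return int(raw.strip()) <= max_level


def check_host() -> None:
    """Read the live value on a Linux host and print a short verdict."""
    if not PARANOID_PATH.exists():
        print(f"{PARANOID_PATH} not found; is this a Linux host?")
        return
    raw = PARANOID_PATH.read_text()
    verdict = "OK" if paranoid_level_ok(raw) else "too high, lower it to 2 or less"
    print(f"perf_event_paranoid = {raw.strip()}: {verdict}")


if __name__ == "__main__":
    check_host()
```

Running it on the host before starting the container gives a quick answer without having to read the file by hand.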
Default analysis run
(profile cpu only)
    nsys profile sh run.sh
or
    nsys profile -o report1 ./main 1 rtsp://192.168.80.100
(after a while, stop it; it will generate an Nsight report named report1.qdrep)
Limited trace only run
    nsys profile --trace=cuda,nvtx -d 20 --sample=none -o report2 sh run.sh

    nsys profile -e TEST_ONLY=0 -y 20 -o report3 sh run.sh
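When these runs are repeated with different options, it can help to assemble the command line in a script. The sketch below is an illustration only: `build_nsys_command` is a hypothetical helper, and it uses only the flags that appear in the commands above (`--trace`, `-d`, `--sample`, `-o`). It builds the argument list and prints it; it does not launch nsys.

```python
import shlex


def build_nsys_command(target, output, trace=None, duration=None, sample=None):
    """Build an `nsys profile` argument list from the options used above.

    target   -- the program to profile, as a list, e.g. ["sh", "run.sh"]
    output   -- report name passed to -o
    trace    -- list of APIs for --trace, e.g. ["cuda", "nvtx"]
    duration -- value passed to -d, in seconds
    sample   -- value passed to --sample, e.g. "none"
    """
    cmd = ["nsys", "profile"]
    if trace:
        cmd.append("--trace=" + ",".join(trace))
    if duration is not None:
        cmd += ["-d", str(duration)]
    if sample is not None:
        cmd.append(f"--sample={sample}")
    cmd += ["-o", output]
    cmd += list(target)
    return cmd


if __name__ == "__main__":
    cmd = build_nsys_command(
        ["sh", "run.sh"], "report2",
        trace=["cuda", "nvtx"], duration=20, sample="none",
    )
    # Print the command; pass `cmd` to subprocess.run() to actually execute it.
    print(shlex.join(cmd))
```

Keeping the command as a list avoids quoting problems if it is later passed to `subprocess.run`.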
Install nvtx
pip install nvtx
Write a Python script with NVTX annotations, nvtx-quickstart.py

    import time
    import nvtx

    @nvtx.annotate("f()", color="purple")
    def f():
        for i in range(5):
            with nvtx.annotate("loop", color="red"):
                time.sleep(i)

    f()
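If the same script also has to run on a machine where nvtx is not installed, one option is a no-op fallback. This is a sketch under that assumption, not part of the original quickstart: the stand-in `annotate` is hypothetical and only mimics the two uses shown above (decorator and context manager), and the `steps` and `delay` parameters were added so the function is quick to try.

```python
import contextlib
import time

try:
    import nvtx  # the real package, installed with `pip install nvtx`
    annotate = nvtx.annotate
except ImportError:
    # Hypothetical stand-in: accepts the same arguments used above and does nothing.
    @contextlib.contextmanager
    def annotate(message=None, color=None):
        yield


@annotate("f()", color="purple")
def f(steps=5, delay=0.01):
    """Run `steps` annotated iterations and return how many completed."""
    done = 0
    for _ in range(steps):
        with annotate("loop", color="red"):
            time.sleep(delay)
            done += 1
    return done


if __name__ == "__main__":
    print(f"completed {f()} iterations")
```

Without nvtx the ranges are simply skipped, so the script behaves the same but nothing shows up in the NVTX rows of the report.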
    nsys profile -t nvtx,osrt --force-overwrite=true --stats=true \
        --output=quickstart python nvtx-quickstart.py
    nsys profile -t nvtx,osrt --force-overwrite=true --stats=true --output=quickstart python test_nvtx.py
    Collecting data...
    Processing events...
    Saving temporary "/tmp/nsys-report-4ebb-30b6-cd44-22af.qdstrm" file to disk...
    Creating final output files...
    Processing [===============================================================100%]
    Saved report file to "/tmp/nsys-report-4ebb-30b6-cd44-22af.qdrep"
    Exporting 1341 events: [===================================================100%]
    Exported successfully to /tmp/nsys-report-4ebb-30b6-cd44-22af.sqlite

    Operating System Runtime API Statistics:

    Time(%)  Total Time (ns)  Num Calls     Average (ns)   Minimum (ns)   Maximum (ns)      StdDev (ns)  Name
    -------  ---------------  ---------  ---------------  -------------  -------------  ---------------  ---------
      100.0   10,008,223,944          4  2,502,055,986.0  1,000,630,612  4,003,725,880  1,292,144,336.4  select
        0.0           86,495         28          3,089.1          1,007          8,992          2,473.7  read
        0.0           81,246         41          1,981.6          1,498          2,522            209.0  open64
        0.0           15,682          9          1,742.4          1,542          2,018            163.1  mmap64
        0.0            7,951          4          1,987.8          1,031          2,441            645.5  fopen64
        0.0            3,262          3          1,087.3          1,067          1,100             17.8  fclose
        0.0            1,106          1          1,106.0          1,106          1,106              0.0  sigaction
        0.0            1,089          1          1,089.0          1,089          1,089              0.0  fflush

    NVTX Range Statistics:

    Time(%)  Total Time (ns)  Instances      Average (ns)    Minimum (ns)    Maximum (ns)      StdDev (ns)  Style    Range
    -------  ---------------  ---------  ----------------  --------------  --------------  ---------------  -------  -----
       50.0   10,008,687,995          1  10,008,687,995.0  10,008,687,995  10,008,687,995              0.0  PushPop  f()
       50.0   10,008,485,171          5   2,001,697,034.2           2,930   4,003,762,411  1,582,499,850.8  PushPop  loop

    Report file moved to "/home/ninja/workspace/opencv_pyspace/quickstart.qdrep"
    Report file moved to "/home/ninja/workspace/opencv_pyspace/quickstart.sqlite"
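The numbers in this output can be checked against what the script does. The script sleeps for 0, 1, 2, 3 and 4 seconds, so the "loop" ranges should add up to about 10 seconds. The short sketch below redoes that arithmetic with figures copied from the output above; the variable names are only for illustration.

```python
# Figures copied from the statistics printed above.
LOOP_TOTAL_NS = 10_008_485_171    # NVTX "loop" total time
LOOP_INSTANCES = 5
SELECT_TOTAL_NS = 10_008_223_944  # OS runtime "select" total time
SELECT_CALLS = 4

# time.sleep(0) + time.sleep(1) + ... + time.sleep(4)
expected_sleep_s = sum(range(5))

# Time spent in the loop ranges beyond the requested sleeps.
overhead_ms = (LOOP_TOTAL_NS - expected_sleep_s * 1_000_000_000) / 1e6

# These should reproduce the "Average (ns)" columns.
loop_avg_ns = LOOP_TOTAL_NS / LOOP_INSTANCES
select_avg_ns = SELECT_TOTAL_NS / SELECT_CALLS

print(f"expected sleep: {expected_sleep_s} s")
print(f"loop overhead:  {overhead_ms:.3f} ms")
print(f"loop average:   {loop_avg_ns:,.1f} ns")
print(f"select average: {select_avg_ns:,.1f} ns")
```

The averages match the table, and the loop ranges run only about 8.5 ms longer than the requested 10 seconds. The select count of 4 rather than 5, with a minimum of about 1 second, would be consistent with time.sleep(0) returning without a blocking select call, though the output alone does not confirm that.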
Open and view the Nsight visual report, /home/ninja/workspace/opencv_pyspace/quickstart.qdrep
Then open Nsight Systems on the host and load report1.qdrep as follows:
Step: run the NVIDIA Nsight Systems UI from a terminal
    sudo sh -c 'echo 2 >/proc/sys/kernel/perf_event_paranoid'
    sudo nsys-ui