Sunday 28 February 2021

How to compare two arrays is the same in Python?

 np.testing.assert_allclose(to_numpy(img1y), ort_outs[0], rtol=1e-03, atol=1e-05)

Saturday 13 February 2021

How to run distributed data parallel (DDP) training using Pytorch?

step 1: generate the keygen in main machine

ssh-keygen -t rsa


step 2: copy paste the id_rsa and id_rsa.pub from main machine ~/.ssh to the child machine ~/.ssh (or remote machines)


step 3: setup the ssh connection from main machine to child machine, and child machine to machine

eg1

(in main machine): 

cat ~/.ssh/id_rsa.pub | ssh USER@CHILD1_IP "mkdir -p ~/.ssh && cat >> ~/.ssh/authorized_keys"
cat ~/.ssh/id_rsa.pub | ssh USER@CHILD2_IP "mkdir -p ~/.ssh && cat >> ~/.ssh/authorized_keys"



(in child machine): 

cat ~/.ssh/id_rsa.pub | ssh USER@MAIN_IP "mkdir -p ~/.ssh && cat >> ~/.ssh/authorized_keys"


step 4: test the ssh connection, open a terminal and type

ssh 192.168.x.x (it should connect automatically to that ip address)

otherwise, use "ssh -vvv ip_address" to check the issue


step 5: install the same cuda, cudnn, nccl and conda environment in all machines


step 6: run the code in main machine, then run the code in child machine. If everything is ok, you should see a change in nvidia-smi in both machines

reference: https://cv.gluon.ai/build/examples_torch_action_recognition/ddp_pytorch.html

How to run the conda init from a bash script?

 

eval "$(conda shell.bash hook)"
conda activate gluoncv
python test_ddp_pytorch.py --config i3d_resnet50_v1_kinetics400.yaml


How to measure flops, #parameters, fps and latency?

 """4. Computing FLOPS, latency and fps of a model
=======================================================

It is important to have an idea of how to measure a video model's speed, so that you can choose the model that suits best for your use case.
In this tutorial, we provide two simple scripts to help you compute (1) FLOPS, (2) number of parameters, (3) fps and (4) latency.
These four numbers will help you evaluate the speed of this model.
To be specific, FLOPS means floating point operations per second, and fps means frame per second.
In terms of comparison, (1) FLOPS, the lower the better,
(2) number of parameters, the lower the better,
(3) fps, the higher the better,
(4) latency, the lower the better.


In terms of input, we use the setting in each model's training config.
For example, I3D models will use 32 frames with stride 2 in crop size 224, but R2+1D models will use 16 frames with stride 2 in crop size 112.
This will make sure that the speed performance here correlates well with the reported accuracy number.
We list these four numbers and the models' accuracy on Kinetics400 dataset in the table below.


+----------------------------------------+----------------+------------+---------------------+-----------------+-----------------+
|  Model                                 | FLOPS          | # params   | fps                 | Latency         | Top-1 Accuracy  |
+========================================+================+============+=====================+=================+=================+
| resnet18_v1b_kinetics400               | 1.819          | 11.382     | 264.01              | 0.0038          | 66.73           |
+----------------------------------------+----------------+------------+---------------------+-----------------+-----------------+
| resnet34_v1b_kinetics400               | 3.671          | 21.49      | 151.96              | 0.0066          | 69.85           |
+----------------------------------------+----------------+------------+---------------------+-----------------+-----------------+
| resnet50_v1b_kinetics400               | 4.110          | 24.328     | 114.05              | 0.0088          | 70.88           |
+----------------------------------------+----------------+------------+---------------------+-----------------+-----------------+
| resnet101_v1b_kinetics400              | 7.833          | 43.320     | 59.56               | 0.0167          | 72.25           |
+----------------------------------------+----------------+------------+---------------------+-----------------+-----------------+
| resnet152_v1b_kinetics400              | 11.558         | 58.963     | 36.93               | 0.0271          | 72.45           |
+----------------------------------------+----------------+------------+---------------------+-----------------+-----------------+
| i3d_resnet50_v1_kinetics400            | 33.275         | 28.863     | 1719.50             | 0.0372          | 74.87           |
+----------------------------------------+----------------+------------+---------------------+-----------------+-----------------+
| i3d_resnet101_v1_kinetics400           | 51.864         | 52.574     | 1137.74             | 0.0563          | 75.10           |
+----------------------------------------+----------------+------------+---------------------+-----------------+-----------------+
| i3d_nl5_resnet50_v1_kinetics400        | 47.737         | 38.069     | 1403.16             | 0.0456          | 75.17           |
+----------------------------------------+----------------+------------+---------------------+-----------------+-----------------+
| i3d_nl10_resnet50_v1_kinetics400       | 62.199         | 42.275     | 1200.69             | 0.0533          | 75.93           |
+----------------------------------------+----------------+------------+---------------------+-----------------+-----------------+
| i3d_nl5_resnet101_v1_kinetics400       | 66.326         | 61.780     | 999.94              | 0.0640          | 75.81           |
+----------------------------------------+----------------+------------+---------------------+-----------------+-----------------+
| i3d_nl10_resnet101_v1_kinetics400      | 80.788         | 70.985     | 890.33              | 0.0719          | 75.93           |
+----------------------------------------+----------------+------------+---------------------+-----------------+-----------------+
| i3d_slow_resnet50_f8s8_kinetics400     | 41.919         | 32.454     | 1702.60             | 0.0376          | 74.41           |
+----------------------------------------+----------------+------------+---------------------+-----------------+-----------------+
| i3d_slow_resnet50_f16s4_kinetics400    | 83.838         | 32.454     | 1406.00             | 0.0455          | 76.36           |
+----------------------------------------+----------------+------------+---------------------+-----------------+-----------------+
| i3d_slow_resnet50_f32s2_kinetics400    | 167.675        | 32.454     | 860.74              | 0.0744          | 77.89           |
+----------------------------------------+----------------+------------+---------------------+-----------------+-----------------+
| i3d_slow_resnet101_f8s8_kinetics400    | 85.675         | 60.359     | 1114.22             | 0.0574          | 76.15           |
+----------------------------------------+----------------+------------+---------------------+-----------------+-----------------+
| i3d_slow_resnet101_f16s4_kinetics400   | 171.348        | 60.359     | 876.20              | 0.0730          | 77.11           |
+----------------------------------------+----------------+------------+---------------------+-----------------+-----------------+
| i3d_slow_resnet101_f32s2_kinetics400   | 342.696        | 60.359     | 541.16              | 0.1183          | 78.57           |
+----------------------------------------+----------------+------------+---------------------+-----------------+-----------------+
| r2plus1d_v1_resnet18_kinetics400       | 40.645         | 31.505     | 804.31              | 0.0398          | 71.72           |
+----------------------------------------+----------------+------------+---------------------+-----------------+-----------------+
| r2plus1d_v1_resnet34_kinetics400       | 75.400         | 61.832     | 503.17              | 0.0636          | 72.63           |
+----------------------------------------+----------------+------------+---------------------+-----------------+-----------------+
| r2plus1d_v1_resnet50_kinetics400       | 65.543         | 53.950     | 667.06              | 0.0480          | 74.92           |
+----------------------------------------+----------------+------------+---------------------+-----------------+-----------------+
| r2plus1d_v2_resnet152_kinetics400      | 252.900        | 118.227    | 546.19              | 0.1172          | 81.34           |
+----------------------------------------+----------------+------------+---------------------+-----------------+-----------------+
| ircsn_v2_resnet152_f32s2_kinetics400   | 74.758         | 29.704     | 435.77              | 0.1469          | 83.18           |
+----------------------------------------+----------------+------------+---------------------+-----------------+-----------------+
| slowfast_4x16_resnet50_kinetics400     | 27.820         | 34.480     | 1396.45             | 0.0458          | 75.25           |
+----------------------------------------+----------------+------------+---------------------+-----------------+-----------------+
| slowfast_8x8_resnet50_kinetics400      | 50.583         | 34.566     | 1297.24             | 0.0493          | 76.66           |
+----------------------------------------+----------------+------------+---------------------+-----------------+-----------------+
| slowfast_8x8_resnet101_kinetics400     | 96.794         | 62.827     | 889.62              | 0.0719          | 76.95           |
+----------------------------------------+----------------+------------+---------------------+-----------------+-----------------+
| tpn_resnet50_f8s8_kinetics400          | 50.457         | 71.800     | 1350.39             | 0.0474          | 77.04           |
+----------------------------------------+----------------+------------+---------------------+-----------------+-----------------+
| tpn_resnet50_f16s4_kinetics400         | 99.929         | 71.800     | 1128.39             | 0.0567          | 77.33           |
+----------------------------------------+----------------+------------+---------------------+-----------------+-----------------+
| tpn_resnet50_f32s2_kinetics400         | 198.874        | 71.800     | 716.89              | 0.0893          | 78.90           |
+----------------------------------------+----------------+------------+---------------------+-----------------+-----------------+
| tpn_resnet101_f8s8_kinetics400         | 94.366         | 99.705     | 942.61              | 0.0679          | 78.10           |
+----------------------------------------+----------------+------------+---------------------+-----------------+-----------------+
| tpn_resnet101_f16s4_kinetics400        | 187.594        | 99.705     | 754.00              | 0.0849          | 79.39           |
+----------------------------------------+----------------+------------+---------------------+-----------------+-----------------+
| tpn_resnet101_f32s2_kinetics400        | 374.048        | 99.705     | 479.77              | 0.1334          | 79.70           |
+----------------------------------------+----------------+------------+---------------------+-----------------+-----------------+


.. note::

    Feel free to skip the tutorial because the speed computation scripts are self-complete and ready to launch.

    :download:`Download Full Python Script: get_flops.py<../../../scripts/action-recognition/get_flops.py>`

    :download:`Download Full Python Script: get_fps.py<../../../scripts/action-recognition/get_fps.py>`

    You can reproduce the numbers in the above table by

    ``python get_flops.py --config-file CONFIG`` and ``python get_fps.py --config-file CONFIG``

    If you encouter missing dependecy issue of ``thop``, please install the package first.

    ``pip install thop``

"""

 

=====

 get_flops.py

=====

"""
Script to compute FLOPs of a model
"""
import os
import argparse

import torch
from gluoncv.torch.model_zoo import get_model
from gluoncv.torch.engine.config import get_cfg_defaults

from thop import profile, clever_format


if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='Compute FLOPs of a model.')
    parser.add_argument('--config-file', type=str, help='path to config file.')
    parser.add_argument('--num-frames', type=int, default=32, help='temporal clip length.')
    parser.add_argument('--input-size', type=int, default=224,
                        help='size of the input image size. default is 224')

    args = parser.parse_args()
    cfg = get_cfg_defaults()
    cfg.merge_from_file(args.config_file)

    model = get_model(cfg)
    input_tensor = torch.autograd.Variable(torch.rand(1, 3, args.num_frames, args.input_size, args.input_size))

    macs, params = profile(model, inputs=(input_tensor,))
    macs, params = clever_format([macs, params], "%.3f")
    print("FLOPs: ", macs, "; #params: ", params)

 

 

 

=====

 get_fps.py

=====

"""
Script to compute latency and fps of a model
"""
import os
import argparse
import time

import torch
from gluoncv.torch.model_zoo import get_model
from gluoncv.torch.engine.config import get_cfg_defaults


if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='Compute FLOPs of a model.')
    parser.add_argument('--config-file', type=str, help='path to config file.')
    parser.add_argument('--num-frames', type=int, default=32, help='temporal clip length.')
    parser.add_argument('--input-size', type=int, default=224,
                        help='size of the input image size. default is 224')
    parser.add_argument('--num-runs', type=int, default=105,
                        help='number of runs to compute average forward timing. default is 105')
    parser.add_argument('--num-warmup-runs', type=int, default=5,
                        help='number of warmup runs to avoid initial slow speed. default is 5')

    args = parser.parse_args()
    cfg = get_cfg_defaults()
    cfg.merge_from_file(args.config_file)

    model = get_model(cfg)
    model.eval()
    model.cuda()
    input_tensor = torch.autograd.Variable(torch.rand(1, 3, args.num_frames, args.input_size, args.input_size)).cuda()
    print('Model is loaded, start forwarding.')

    with torch.no_grad():
        for i in range(args.num_runs):
            if i == args.num_warmup_runs:
                start_time = time.time()
            pred = model(input_tensor)

    end_time = time.time()
    total_forward = end_time - start_time
    print('Total forward time is %4.2f seconds' % total_forward)

    actual_num_runs = args.num_runs - args.num_warmup_runs
    latency = total_forward / actual_num_runs
    fps = (cfg.CONFIG.DATA.CLIP_LEN * cfg.CONFIG.DATA.FRAME_RATE) * actual_num_runs / total_forward

    print("FPS: ", fps, "; Latency: ", latency)


=====

reference: https://cv.gluon.ai/build/examples_torch_action_recognition/speed.html

=====


Wednesday 3 February 2021

How to build FFMPEG in Ubuntu18?

 

sudo apt-get update -qq && sudo apt-get -y install \ autoconf \ automake \ build-essential \ cmake \ git-core \ libass-dev \ libfreetype6-dev \ libgnutls28-dev \ libsdl2-dev \ libtool \ libva-dev \ libvdpau-dev \ libvorbis-dev \ libxcb1-dev \ libxcb-shm0-dev \ libxcb-xfixes0-dev \ meson \ ninja-build \ pkg-config \ texinfo \ wget \ yasm \ zlib1g-dev
sudo apt-get install nasm
sudo apt-get install libx264-dev

sudo apt-get install libx265-dev libnuma-dev
sudo apt-get install libvpx-dev
sudo apt-get install libfdk-aac-dev
sudo apt-get install libmp3lame-dev
sudo apt-get install libopus-dev


cd ~/ffmpeg_sources && \
git -C aom pull 2> /dev/null || git clone --depth 1 https://aomedia.googlesource.com/aom && \
mkdir -p aom_build && \
cd aom_build && \
PATH="$HOME/bin:$PATH" cmake -G "Unix Makefiles" -DCMAKE_INSTALL_PREFIX="$HOME/ffmpeg_build" -DENABLE_NASM=on -DBUILD_SHARED_LIBS=on ../aom && \
PATH="$HOME/bin:$PATH" make -j32 && \
make install

cd ~/ffmpeg_sources && \
git -C SVT-AV1 pull 2> /dev/null || git clone https://github.com/AOMediaCodec/SVT-AV1.git && \
mkdir -p SVT-AV1/build && \
cd SVT-AV1/build && \
PATH="$HOME/bin:$PATH" cmake -G "Unix Makefiles" -DCMAKE_INSTALL_PREFIX="$HOME/ffmpeg_build" -DCMAKE_BUILD_TYPE=Release -DBUILD_DEC=OFF -DBUILD_SHARED_LIBS=ON .. && \
PATH="$HOME/bin:$PATH" make && \
make install

cd ~/ffmpeg_sources && \
wget -O ffmpeg-snapshot.tar.bz2 https://ffmpeg.org/releases/ffmpeg-snapshot.tar.bz2 && \
tar xjvf ffmpeg-snapshot.tar.bz2 && \
cd ffmpeg && \
PATH="$HOME/bin:$PATH" PKG_CONFIG_PATH="$HOME/ffmpeg_build/lib/pkgconfig" ./configure \
  --prefix="$HOME/ffmpeg_build" \
  --pkg-config-flags="--static" \
  --extra-cflags="-I$HOME/ffmpeg_build/include" \
  --extra-ldflags="-L$HOME/ffmpeg_build/lib" \
  --extra-libs="-lpthread -lm" \
  --bindir="$HOME/bin" \
  --enable-gpl \
  --enable-gnutls \
  --enable-libaom \
  --enable-libass \
  --enable-libfdk-aac \
  --enable-libfreetype \
  --enable-libmp3lame \
  --enable-libopus \
  --enable-libsvtav1 \
  --enable-libdav1d \
  --enable-libvorbis \
  --enable-libvpx \
  --enable-libx264 \
  --enable-libx265 \
  --disable-static \
  --enable-shared \
  --enable-nonfree && \
PATH="$HOME/bin:$PATH" make -j32 && \
make install && \
hash -r

 

Reference: https://trac.ffmpeg.org/wiki/CompilationGuide/Ubuntu