Goku Coding Experience: February 2021

"""4. Computing FLOPS, latency and fps of a model
=======================================================

It is important to have an idea of how to measure a video model's speed, so that you can choose the model that suits best for your use case.
In this tutorial, we provide two simple scripts to help you compute (1) FLOPS, (2) number of parameters, (3) fps and (4) latency.
These four numbers will help you evaluate the speed of this model.
To be specific, FLOPS means floating point operations per second, and fps means frame per second.
In terms of comparison, (1) FLOPS, the lower the better,
(2) number of parameters, the lower the better,
(3) fps, the higher the better,
(4) latency, the lower the better.

In terms of input, we use the setting in each model's training config.
For example, I3D models will use 32 frames with stride 2 in crop size 224, but R2+1D models will use 16 frames with stride 2 in crop size 112.
This will make sure that the speed performance here correlates well with the reported accuracy number.
We list these four numbers and the models' accuracy on Kinetics400 dataset in the table below.

+----------------------------------------+----------------+------------+---------------------+-----------------+-----------------+
| Model                                 | FLOPS          | # params   | fps                 | Latency         | Top-1 Accuracy |
+========================================+================+============+=====================+=================+=================+
| resnet18_v1b_kinetics400               | 1.819          | 11.382     | 264.01              | 0.0038          | 66.73           |
+----------------------------------------+----------------+------------+---------------------+-----------------+-----------------+
| resnet34_v1b_kinetics400               | 3.671          | 21.49      | 151.96              | 0.0066          | 69.85           |
+----------------------------------------+----------------+------------+---------------------+-----------------+-----------------+
| resnet50_v1b_kinetics400               | 4.110          | 24.328     | 114.05              | 0.0088          | 70.88           |
+----------------------------------------+----------------+------------+---------------------+-----------------+-----------------+
| resnet101_v1b_kinetics400              | 7.833          | 43.320     | 59.56               | 0.0167          | 72.25           |
+----------------------------------------+----------------+------------+---------------------+-----------------+-----------------+
| resnet152_v1b_kinetics400              | 11.558         | 58.963     | 36.93               | 0.0271          | 72.45           |
+----------------------------------------+----------------+------------+---------------------+-----------------+-----------------+
| i3d_resnet50_v1_kinetics400            | 33.275         | 28.863     | 1719.50             | 0.0372          | 74.87           |
+----------------------------------------+----------------+------------+---------------------+-----------------+-----------------+
| i3d_resnet101_v1_kinetics400           | 51.864         | 52.574     | 1137.74             | 0.0563          | 75.10           |
+----------------------------------------+----------------+------------+---------------------+-----------------+-----------------+
| i3d_nl5_resnet50_v1_kinetics400        | 47.737         | 38.069     | 1403.16             | 0.0456          | 75.17           |
+----------------------------------------+----------------+------------+---------------------+-----------------+-----------------+
| i3d_nl10_resnet50_v1_kinetics400       | 62.199         | 42.275     | 1200.69             | 0.0533          | 75.93           |
+----------------------------------------+----------------+------------+---------------------+-----------------+-----------------+
| i3d_nl5_resnet101_v1_kinetics400       | 66.326         | 61.780     | 999.94              | 0.0640          | 75.81           |
+----------------------------------------+----------------+------------+---------------------+-----------------+-----------------+
| i3d_nl10_resnet101_v1_kinetics400      | 80.788         | 70.985     | 890.33              | 0.0719          | 75.93           |
+----------------------------------------+----------------+------------+---------------------+-----------------+-----------------+
| i3d_slow_resnet50_f8s8_kinetics400     | 41.919         | 32.454     | 1702.60             | 0.0376          | 74.41           |
+----------------------------------------+----------------+------------+---------------------+-----------------+-----------------+
| i3d_slow_resnet50_f16s4_kinetics400    | 83.838         | 32.454     | 1406.00             | 0.0455          | 76.36           |
+----------------------------------------+----------------+------------+---------------------+-----------------+-----------------+
| i3d_slow_resnet50_f32s2_kinetics400    | 167.675        | 32.454     | 860.74              | 0.0744          | 77.89           |
+----------------------------------------+----------------+------------+---------------------+-----------------+-----------------+
| i3d_slow_resnet101_f8s8_kinetics400    | 85.675         | 60.359     | 1114.22             | 0.0574          | 76.15           |
+----------------------------------------+----------------+------------+---------------------+-----------------+-----------------+
| i3d_slow_resnet101_f16s4_kinetics400   | 171.348        | 60.359     | 876.20              | 0.0730          | 77.11           |
+----------------------------------------+----------------+------------+---------------------+-----------------+-----------------+
| i3d_slow_resnet101_f32s2_kinetics400   | 342.696        | 60.359     | 541.16              | 0.1183          | 78.57           |
+----------------------------------------+----------------+------------+---------------------+-----------------+-----------------+
| r2plus1d_v1_resnet18_kinetics400       | 40.645         | 31.505     | 804.31              | 0.0398          | 71.72           |
+----------------------------------------+----------------+------------+---------------------+-----------------+-----------------+
| r2plus1d_v1_resnet34_kinetics400       | 75.400         | 61.832     | 503.17              | 0.0636          | 72.63           |
+----------------------------------------+----------------+------------+---------------------+-----------------+-----------------+
| r2plus1d_v1_resnet50_kinetics400       | 65.543         | 53.950     | 667.06              | 0.0480          | 74.92           |
+----------------------------------------+----------------+------------+---------------------+-----------------+-----------------+
| r2plus1d_v2_resnet152_kinetics400      | 252.900        | 118.227    | 546.19              | 0.1172          | 81.34           |
+----------------------------------------+----------------+------------+---------------------+-----------------+-----------------+
| ircsn_v2_resnet152_f32s2_kinetics400   | 74.758         | 29.704     | 435.77              | 0.1469          | 83.18           |
+----------------------------------------+----------------+------------+---------------------+-----------------+-----------------+
| slowfast_4x16_resnet50_kinetics400     | 27.820         | 34.480     | 1396.45             | 0.0458          | 75.25           |
+----------------------------------------+----------------+------------+---------------------+-----------------+-----------------+
| slowfast_8x8_resnet50_kinetics400      | 50.583         | 34.566     | 1297.24             | 0.0493          | 76.66           |
+----------------------------------------+----------------+------------+---------------------+-----------------+-----------------+
| slowfast_8x8_resnet101_kinetics400     | 96.794         | 62.827     | 889.62              | 0.0719          | 76.95           |
+----------------------------------------+----------------+------------+---------------------+-----------------+-----------------+
| tpn_resnet50_f8s8_kinetics400          | 50.457         | 71.800     | 1350.39             | 0.0474          | 77.04           |
+----------------------------------------+----------------+------------+---------------------+-----------------+-----------------+
| tpn_resnet50_f16s4_kinetics400         | 99.929         | 71.800     | 1128.39             | 0.0567          | 77.33           |
+----------------------------------------+----------------+------------+---------------------+-----------------+-----------------+
| tpn_resnet50_f32s2_kinetics400         | 198.874        | 71.800     | 716.89              | 0.0893          | 78.90           |
+----------------------------------------+----------------+------------+---------------------+-----------------+-----------------+
| tpn_resnet101_f8s8_kinetics400         | 94.366         | 99.705     | 942.61              | 0.0679          | 78.10           |
+----------------------------------------+----------------+------------+---------------------+-----------------+-----------------+
| tpn_resnet101_f16s4_kinetics400        | 187.594        | 99.705     | 754.00              | 0.0849          | 79.39           |
+----------------------------------------+----------------+------------+---------------------+-----------------+-----------------+
| tpn_resnet101_f32s2_kinetics400        | 374.048        | 99.705     | 479.77              | 0.1334          | 79.70           |
+----------------------------------------+----------------+------------+---------------------+-----------------+-----------------+

.. note::

    Feel free to skip the tutorial because the speed computation scripts are self-complete and ready to launch.

    :download:`Download Full Python Script: get_flops.py<../../../scripts/action-recognition/get_flops.py>`

    :download:`Download Full Python Script: get_fps.py<../../../scripts/action-recognition/get_fps.py>`

    You can reproduce the numbers in the above table by

    ``python get_flops.py --config-file CONFIG`` and ``python get_fps.py --config-file CONFIG``

    If you encouter missing dependecy issue of ``thop``, please install the package first.

    ``pip install thop``

"""

=====

get_flops.py

=====

"""
Script to compute FLOPs of a model
"""
import os
import argparse

import torch
from gluoncv.torch.model_zoo import get_model
from gluoncv.torch.engine.config import get_cfg_defaults

from thop import profile, clever_format

if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='Compute FLOPs of a model.')
    parser.add_argument('--config-file', type=str, help='path to config file.')
    parser.add_argument('--num-frames', type=int, default=32, help='temporal clip length.')
    parser.add_argument('--input-size', type=int, default=224,
                        help='size of the input image size. default is 224')

    args = parser.parse_args()
    cfg = get_cfg_defaults()
    cfg.merge_from_file(args.config_file)

    model = get_model(cfg)
    input_tensor = torch.autograd.Variable(torch.rand(1, 3, args.num_frames, args.input_size, args.input_size))

    macs, params = profile(model, inputs=(input_tensor,))
    macs, params = clever_format([macs, params], "%.3f")
    print("FLOPs: ", macs, "; #params: ", params)

=====

get_fps.py

=====

"""
Script to compute latency and fps of a model
"""
import os
import argparse
import time

import torch
from gluoncv.torch.model_zoo import get_model
from gluoncv.torch.engine.config import get_cfg_defaults

if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='Compute FLOPs of a model.')
    parser.add_argument('--config-file', type=str, help='path to config file.')
    parser.add_argument('--num-frames', type=int, default=32, help='temporal clip length.')
    parser.add_argument('--input-size', type=int, default=224,
                        help='size of the input image size. default is 224')
    parser.add_argument('--num-runs', type=int, default=105,
                        help='number of runs to compute average forward timing. default is 105')
    parser.add_argument('--num-warmup-runs', type=int, default=5,
                        help='number of warmup runs to avoid initial slow speed. default is 5')

    args = parser.parse_args()
    cfg = get_cfg_defaults()
    cfg.merge_from_file(args.config_file)

    model = get_model(cfg)
    model.eval()
    model.cuda()
    input_tensor = torch.autograd.Variable(torch.rand(1, 3, args.num_frames, args.input_size, args.input_size)).cuda()
    print('Model is loaded, start forwarding.')

    with torch.no_grad():
        for i in range(args.num_runs):
            if i == args.num_warmup_runs:
                start_time = time.time()
            pred = model(input_tensor)

    end_time = time.time()
    total_forward = end_time - start_time
    print('Total forward time is %4.2f seconds' % total_forward)

    actual_num_runs = args.num_runs - args.num_warmup_runs
    latency = total_forward / actual_num_runs
    fps = (cfg.CONFIG.DATA.CLIP_LEN * cfg.CONFIG.DATA.FRAME_RATE) * actual_num_runs / total_forward

    print("FPS: ", fps, "; Latency: ", latency)

=====

reference: https://cv.gluon.ai/build/examples_torch_action_recognition/speed.html

=====

sudo apt-get update -qq && sudo apt-get -y install \ autoconf \ automake \ build-essential \ cmake \ git-core \ libass-dev \ libfreetype6-dev \ libgnutls28-dev \ libsdl2-dev \ libtool \ libva-dev \ libvdpau-dev \ libvorbis-dev \ libxcb1-dev \ libxcb-shm0-dev \ libxcb-xfixes0-dev \ meson \ ninja-build \ pkg-config \ texinfo \ wget \ yasm \ zlib1g-dev
sudo apt-get install nasm
sudo apt-get install libx264-dev

sudo apt-get install libx265-dev libnuma-dev
sudo apt-get install libvpx-dev
sudo apt-get install libfdk-aac-dev
sudo apt-get install libmp3lame-dev
sudo apt-get install libopus-dev

cd ~/ffmpeg_sources && \
git -C aom pull 2> /dev/null || git clone --depth 1 https://aomedia.googlesource.com/aom && \
mkdir -p aom_build && \
cd aom_build && \
PATH="$HOME/bin:$PATH" cmake -G "Unix Makefiles" -DCMAKE_INSTALL_PREFIX="$HOME/ffmpeg_build" -DENABLE_NASM=on -DBUILD_SHARED_LIBS=on ../aom && \
PATH="$HOME/bin:$PATH" make -j32 && \
make install

cd ~/ffmpeg_sources && \
git -C SVT-AV1 pull 2> /dev/null || git clone https://github.com/AOMediaCodec/SVT-AV1.git && \
mkdir -p SVT-AV1/build && \
cd SVT-AV1/build && \
PATH="$HOME/bin:$PATH" cmake -G "Unix Makefiles" -DCMAKE_INSTALL_PREFIX="$HOME/ffmpeg_build" -DCMAKE_BUILD_TYPE=Release -DBUILD_DEC=OFF -DBUILD_SHARED_LIBS=ON .. && \
PATH="$HOME/bin:$PATH" make && \
make install

cd ~/ffmpeg_sources && \
wget -O ffmpeg-snapshot.tar.bz2 https://ffmpeg.org/releases/ffmpeg-snapshot.tar.bz2 && \
tar xjvf ffmpeg-snapshot.tar.bz2 && \
cd ffmpeg && \
PATH="$HOME/bin:$PATH" PKG_CONFIG_PATH="$HOME/ffmpeg_build/lib/pkgconfig" ./configure \
--prefix="$HOME/ffmpeg_build" \
--pkg-config-flags="--static" \
--extra-cflags="-I$HOME/ffmpeg_build/include" \
--extra-ldflags="-L$HOME/ffmpeg_build/lib" \
--extra-libs="-lpthread -lm" \
--bindir="$HOME/bin" \
--enable-gpl \
--enable-gnutls \
--enable-libaom \
--enable-libass \
--enable-libfdk-aac \
--enable-libfreetype \
--enable-libmp3lame \
--enable-libopus \
--enable-libsvtav1 \
--enable-libdav1d \
--enable-libvorbis \
--enable-libvpx \
--enable-libx264 \
--enable-libx265 \
--disable-static \
--enable-shared \
--enable-nonfree && \
PATH="$HOME/bin:$PATH" make -j32 && \
make install && \
hash -r

Reference: https://trac.ffmpeg.org/wiki/CompilationGuide/Ubuntu

Goku Coding Experience

Sunday, 28 February 2021

How to compare two arrays is the same in Python?

Saturday, 13 February 2021

How to run distributed data parallel (DDP) training using Pytorch?

How to run the conda init from a bash script?

How to measure flops, #parameters, fps and latency?

Wednesday, 3 February 2021

How to build FFMPEG in Ubuntu18?