ref: https://github.com/markwdalton/lambdalabs/blob/main/documentation/cheatsheets.txt
Lambda Cheat Sheets
This is a conceptual starter - it provides the basic ideas rather than an exhaustive reference.
Lambda Ubuntu Linux Command Line Cheat Sheet (http://lambdalabs.com/)
Working with Files:
Basics:
* pwd – Show the ‘present working directory’
* ls - see files in your current directory
* cd <name> - to change to a new directory
* find . -name 'example*' - Find files that start with the name 'example'
ls - List files
* ls -alrt - List all files in the directory in long format, sorted by modification time (newest last)
* ls -a - list hidden files
* ls -CF - List in columns and classify files
* ls -lR - Long format recursive
Show sizes of files:
* du -s <filename> - in KBs
* du -sh <filename> - human readable
* du -s * | sort -n - Easy way to find the largest files/directories in a directory
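For example, to show only the five largest entries in the current directory (a small variation using standard sort options):
$ du -s * | sort -n | tail -5
$ du -sh * | sort -h | tail -5    # same idea with human-readable sizes (GNU sort -h)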
Moving Files:
* mv <filename> <new filename>
* mv <name> <location/name>
Move a file to new location/name:
* mv foo /tmp/user/foo.txt
Move a directory to new name:
* mv data save/data.bak
Copying files:
* cp <file> <new_file>
* cp <file> <dir>/<new_file>
Copy a directory to new name/location:
* cp -a <dir> <new_dir>
Remote copy:
* sftp
* rsync
* scp file remote:./file
* scp -rq directory remote:.
* sshfs user@remote-host:directory ./mount
Example:
$ mkdir myhome
$ sshfs 192.168.1.122:/home ./myhome
$ df -h ./myhome
Filesystem Size Used Avail Use% Mounted on
192.168.1.122:/home 480G 73G 384G 16% /home/user/myhome
$ umount ./myhome
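A minimal rsync example for the remote copy tools above (hypothetical paths and host name; -a preserves attributes, -v is verbose, -P shows progress and allows resuming):
$ rsync -avP ./results/ user@remote:/home/user/results/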
* Tunnel a port on a remote host's local interface to your machine:
- Use case: securely access, over ssh, a jupyter-notebook on a remote host that is not exposed to the internet.
$ ssh -N -L 8888:localhost:8888 <gpuserver>
* Substitute the port if needed, as jupyter notebook may assign one other than 8888
* Tunnel a port for a remote host to access from your local machine through a jump host:
$ ssh mdalton@50.211.197.34 -L 8080:10.1.10.69:80
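Once the tunnel above is up, the remote service is reachable on the local port (using the example's port numbers):
$ curl http://localhost:8080/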
Checking space on a file system:
Disk space (-h is 'human readable': sizes shown as K, M, G based on powers of 1024; df -H uses powers of 1000):
* df -h . - Check your current location
* df -h - Check all filesystems mounted. You should be concerned over 90%.
Check Inodes (number of files):
* df -i - Number of inodes, used, available. Be concerned over 90%.
Show CPU Utilization:
* top
* htop
* ps
ps -elf
ps aux
ps -flu <user>
Show Memory Utilization on Linux:
* free
* free -h
* top
Disk commands:
* lsblk - list drives seen
* df - show sizes of mounted
* mount - show mounts/options
* fdisk -l - show drive partitions
* smartctl - check drives for information or errors.
Example:
smartctl -x /dev/nvme0n1
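A quick overall health check is also available (assuming smartmontools is installed; the device name will vary):
$ sudo smartctl -H /dev/nvme0n1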
Show GPU Utilization:
* nvtop
* nvidia-smi
* nvidia-smi -q - this provides additional information.
* nvidia-smi pmon
NVlink information:
* nvidia-smi topo -m - To see the NVLink connection topology
* nvidia-smi nvlink -s - To see rates per link
GPU Debugging:
* 'nvidia-smi' fails with 'Failed to initialize NVML: Driver/library version mismatch'
* This can normally be resolved with a reboot.
* It occurs when the nvidia-smi tool and libraries are newer than the loaded nvidia kernel module (for example, after a driver update without a reboot).
* I do not see any GPUs:
* This can happen when an old CUDA version that does not support current GPUs is in use, e.g. CUDA 10 (nvidia-cuda-toolkit) with Ampere GPUs (30## series or A-series GPUs).
* If you are using Anaconda:
- It may not have loaded or installed the correct CUDA version.
- LD_LIBRARY_PATH may not be set to the cuDNN version Anaconda installed.
* See logged errors (a journalctl alternative is shown after the Xid list below):
* grep "kernel: NVRM: Xid" /var/log/kern.log
* The main Xid errors:
All GPUs Xid: 79
A100 Xid: 64, 94
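On systems that use the systemd journal, the same 'NVRM: Xid' kernel messages can be searched there (a sketch; -k limits output to kernel messages):
$ sudo journalctl -k | grep -i "NVRM: Xid"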
Xid Error References:
* GPU Debug guide: https://docs.nvidia.com/deploy/gpu-debug-guidelines/index.html
* GPU Error Definitions: https://docs.nvidia.com/deploy/xid-errors/index.html
* A100 Xids: https://docs.nvidia.com/deploy/a100-gpu-mem-error-mgmt/index.html
NVIDIA Fabric Manager/NVSwitch:
* Fabric manager guide - https://docs.nvidia.com/datacenter/tesla/pdf/fabric-manager-user-guide.pdf
PCI devices:
* lspci - lists devices on the PCI bus
* lspci -vvv - provides more verbose output
USB:
* lsusb - list seen USB devices
* sudo dmesg - will also show when they are discovered
Linux/Ubuntu/NVIDIA tools for monitoring utilization
* For GPUs it is important to associate each GPU's PCI address with its UUID (the index is relative and can change)
* nvidia-smi --query-gpu=index,pci.bus_id,uuid --format=csv
* top - Show the top Linux processes by CPU, memory (RSS), and virtual memory
* htop
View the processes on GPUs
* nvidia-smi pmon
* nvidia-smi dmon -s pc
Show GPU view power, temp, memory on GPUs over time
* nvidia-smi dmon
Show GPU stats and environment over time in CSV format
* nvidia-smi --query-gpu=index,pci.bus_id,uuid,fan.speed,utilization.gpu,utilization.memory,temperature.gpu,power.draw --format=csv -l
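To log these stats to a file in the background (a sketch; the 5-second interval and file name are arbitrary):
$ nvidia-smi --query-gpu=index,pci.bus_id,uuid,utilization.gpu,utilization.memory,temperature.gpu,power.draw --format=csv -l 5 >> gpu-stats.csv &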
Find various options for --query-gpu:
* nvidia-smi --help-query-gpu
On commercial (data-center) GPUs like the A100 there are some additional options, such as:
* GPU Memory Temperature, Memory errors, Memory remapping
* You can see these all through:
nvidia-smi -q
* You can also monitor the memory temperature:
nvidia-smi --query-gpu=index,pci.bus_id,uuid,pstate,fan.speed,utilization.gpu,utilization.memory,temperature.gpu,temperature.memory,power.draw --format=csv -l
* Watch for remapped memory (requires a reboot/reset of the GPU):
nvidia-smi --query-remapped-rows=gpu_bus_id,gpu_uuid,remapped_rows.correctable,remapped_rows.uncorrectable,remapped_rows.pending,remapped_rows.failure --format=csv
* Up to 8 rows in a bank can be remapped, but a reboot is required between each remap.
* After all 8 rows in a bank are remapped, the GPU or chassis (SXM) needs to be reworked.
* If remapped_rows.failure == yes: disable the GPU; the machine needs an RMA for repair.
* If remapped_rows.pending == yes: the GPU needs to be reset (commonly after a high number of aggregate errors).
* Watch for Volatile (current boot session; more accurate) and Aggregate (lifetime of the GPU; in theory everything, but it can miss some) memory errors:
To see the various memory errors to track:
nvidia-smi --help-query-gpu | grep "ecc.err"
For example:
All volatile memory errors (this boot session or since a GPU reset):
nvidia-smi --query-gpu=index,pci.bus_id,uuid,ecc.errors.corrected.volatile.dram,ecc.errors.corrected.volatile.sram --format=csv
All volatile uncorrected memory errors:
nvidia-smi --query-gpu=index,pci.bus_id,uuid,ecc.errors.uncorrected.volatile.dram,ecc.errors.uncorrected.volatile.sram --format=csv
All Aggregate corrected memory errors:
nvidia-smi --query-gpu=index,pci.bus_id,uuid,ecc.errors.corrected.aggregate.dram,ecc.errors.corrected.aggregate.sram --format=csv
All Aggregate uncorrected memory errors:
nvidia-smi --query-gpu=index,pci.bus_id,uuid,ecc.errors.uncorrected.aggregate.dram,ecc.errors.uncorrected.aggregate.sram --format=csv
Linux/Ubuntu commands for Lambda
System monitoring
* top
* htop
* nvtop
* nvidia-smi pmon
* ps -elf (ps aux) to see running processes
* free – see the amount of memory and swap; used and available
Navigating and finding files
* pwd – Present working directory
* ls - see files in your current directory
* cd <name> - to change to a new directory
* find . -name 'example*' - Find files that start with the name 'example'
Disk and file systems
* df -h – Show how much space is in all file systems
* df -ih – show how many inodes (number of files) on each file system
* du -s * | sort -n - Show the largest files/directories in the current directory
* du -sh example.tar.gz – show how large the file 'example.tar.gz' is
* duf - a little more friendly format (sudo apt install duf)
* Graphical view:
$ sudo apt install xdiskusage
$ sudo xdiskusage
Networking:
* ip address show - Long list of information about interfaces
* ip -br address show - a brief version of the command above
* ip addr show dev <dev> - show address for one interface
* example: ip addr show dev eth0
* ip link - show links
* ip -br link – brief view of the links
* ip route – show routes on your system
* ip tunnel
* ip n - replaces arp; shows MAC and IP addresses of neighbors on the network
* ping -c 3 10.0.0.1 - Ping the IP address 10.0.0.1 three times
* traceroute 10.0.0.1 - Check the route and performance to IP address 10.0.0.1
* /etc/netplan – Location of the network interface configurations.
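A typical workflow after editing a configuration file under /etc/netplan (netplan try rolls the change back automatically if you lose connectivity and do not confirm):
$ ls /etc/netplan/
$ sudo netplan try
$ sudo netplan apply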
Managing users:
* groups <username> - Check whether the 'user' exists and which groups they are in
* sudo adduser <username> - Add a new user
* sudo deluser <username> - Delete an existing user
* sudo adduser <username> <group> - Add a 'user' to a 'group' (both must already exist)
* sudo deluser <username> <group> - Delete/remove a ‘user’ from a ‘group’
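For example, creating a new user and then adding them to an existing group (hypothetical user name; the 'docker' group exists once Docker is installed):
$ sudo adduser alice
$ sudo adduser alice docker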
Firewall:
* sudo iptables -L - List iptables rules
Linux is switching to 'nftables':
* sudo nft -a list ruleset
* sudo ufw status - Show the status of the ufw
* Example adding ssh to UFW firewall:
* sudo ufw allow ssh
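UFW rules can also be restricted to a source network (hypothetical subnet; ssh is TCP port 22):
$ sudo ufw allow from 10.0.0.0/24 to any port 22 proto tcp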
Linux and Lambda Stack upgrades and packaging:
* sudo apt-get update - Update the list of packages from repository (sync up)
* apt list --upgradeable - list upgradable packages (after the update)
* sudo apt-get upgrade - Upgrade packages
* sudo apt-get dist-upgrade - more aggressive upgrade - can remove packages
* sudo apt full-upgrade - more aggressive upgrade - can remove packages
* dpkg -L <installed package> - List the contents of a given package
* dpkg -S <full path to file> - Show the package a file came from
* dpkg --list - Show the list of installed packages
* apt list --installed
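For example, to find which package owns a file and then list everything in that package (the package name returned depends on the installed driver version):
$ dpkg -S /usr/bin/nvidia-smi
$ dpkg -L <package name from the previous command>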
Linux/Ubuntu security managing user access:
* iptables -L - List firewall rules
* /etc/sudoers - Contains a list of sudo rules
* visudo - to edit sudoers to change rules
* sudo adduser <username> sudo - Add a user to the sudo group, which gives them full root access via sudo; use caution.
Example: (Add the user 'john' to the 'sudo' group)
$ sudo adduser john sudo
Linux/Ubuntu NVIDIA GPU
* nvtop - watch the GPUs utilization and memory utilization
* nvidia-smi - see the driver version (supported CUDA and usage)
* note the persistence mode
* nvidia-smi -q - gives more detailed information for each GPU
NVlink information:
* nvidia-smi topo -m - To see the NVLink connection topology
* nvidia-smi nvlink -s - To see the rates per link
See logged errors:
* grep "kernel: NVRM: Xid" /var/log/kern.log
Boot modes for linux:
Find the current setting for boot level:
$ systemctl get-default
Set to boot to Multi-user (non-graphical):
$ sudo systemctl set-default multi-user.target
Set to boot to Graphical mode:
$ sudo systemctl set-default graphical.target
Change now (temporarily) to multi-user:
$ sudo systemctl isolate multi-user.target
Change now (temporarily) to Graphical:
$ sudo systemctl isolate graphical.target
Containers and Virtual Environments:
See examples:
https://github.com/markwdalton/lambdalabs/tree/main/documentation/software/examples/virtual-environments
* Docker/Singularity - Make use of NVIDIA's Container Catalog: https://catalog.ngc.nvidia.com/
* Python venv - part of the Python standard library, so this is recommended - supports isolated environments or using the system site packages.
* virtualenv - independently developed (originally for Python 2), still around, but moving to python venv is recommended.
* Anaconda - the license terms have changed in recent years - companies should review the license.
Docker
See the Lambda Docker PyTorch Tutorial:
https://lambdalabs.com/blog/nvidia-ngc-tutorial-run-pytorch-docker-container-using-nvidia-container-toolkit-on-ubuntu
Install docker (with Lambda Stack installed):
* sudo apt-get install -y docker.io nvidia-container-toolkit
* sudo systemctl daemon-reload
* sudo systemctl restart docker
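To confirm the Docker daemon itself is working before pulling large NGC images (standard hello-world test image):
$ sudo docker run --rm hello-world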
Finding many docker images for Deep Learning
* https://catalog.ngc.nvidia.com/
Pull a docker image
* sudo docker pull <image name>
* sudo docker pull nvcr.io/nvidia/tensorflow:22.05-tf1-py3
* sudo docker pull nvcr.io/nvidia/pytorch:22.05-py3
Run Docker (it will pull the image if not found)
* sudo docker run --gpus all -it --rm nvcr.io/nvidia/pytorch:22.05-py3
List running docker containers
* docker ps
List docker images
* docker images
Mount a directory in a docker image on start up
* sudo docker run --gpus all -it --rm -v `pwd`/data:/data/ nvcr.io/nvidia/pytorch:22.05-py3
You can add a command to run at the end of the line: ls, a python script, etc.
This mounts the 'data' directory from the current directory into the container as /data.
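For example, running a script from the mounted directory (hypothetical script name):
$ sudo docker run --gpus all -it --rm -v `pwd`/data:/data/ nvcr.io/nvidia/pytorch:22.05-py3 python /data/train.py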
Copy a file from the host to the container
* docker cp input.txt container_id:/input.txt
Copy a file from the container to the local file system
* docker cp container_id:/output.txt output.txt
Copy a group of files in the ‘data’ directory to the container
* docker cp data/. container_id:/target
Copy a group of files in the container ‘output’ directory to local host
* docker cp container_id:/output/. target
* docker create or docker run options:
  * -a, --attach # attach stdout/err
  * -i, --interactive # attach stdin (interactive)
  * -t, --tty # pseudo-tty
  * --name NAME # name your container
  * -p, --publish 5000:5000 # port map
  * --expose 5432 # expose a port to linked containers
  * -P, --publish-all # publish all ports
  * --link container:alias # linking
  * -v, --volume `pwd`:/app # mount (absolute paths needed)
  * -e, --env NAME=hello # env vars
* For 'docker run' only:
  * --rm true|false # automatically remove the container when it exits; the default is false
Example to run on ALL GPUs, interactive, with a tty, and remove the running container on exit.
$ docker run --gpus all -it --rm nvcr.io/nvidia/pytorch:22.05-py3 nvidia-smi
Example of listing the mapped ports:
Look for running containers:
$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
e7fa01d97208 dalle-playground_dalle-interface "docker-entrypoint.s…" 3 months ago Up 8 hours 0.0.0.0:3000->3000/tcp, :::3000->3000/tcp dalle-interface
Look at the port mapping for that running container:
$ docker port e7fa01d97208
3000/tcp -> 0.0.0.0:3000
3000/tcp -> :::3000
Kubernetes
List all pods in the current namespace:
kubectl get pods
List all pods in all namespaces:
kubectl get pods --all-namespaces
List all services in the current namespace:
kubectl get services
List all deployments in the current namespace:
kubectl get deployments
List all nodes in the cluster:
kubectl get nodes
Describe a pod:
kubectl describe pod <pod-name>
Describe a service:
kubectl describe service <service-name>
Describe a deployment:
kubectl describe deployment <deployment-name>
Create a new deployment:
kubectl create deployment <deployment-name> --image=<image-name>
Update a deployment:
kubectl set image deployment <deployment-name> <container-name>=<new-image-name>
Scale a deployment:
kubectl scale deployment <deployment-name> --replicas=<replica-count>
Delete a deployment:
kubectl delete deployment <deployment-name>
Delete a pod:
kubectl delete pod <pod-name>
Delete a service:
kubectl delete service <service-name>
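Two related commands that often come up alongside the list above (standard kubectl; names are placeholders):
Get the logs from a pod:
kubectl logs <pod-name>
Open a shell inside a running pod:
kubectl exec -it <pod-name> -- /bin/bash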
For servers with IPMI:
Install ipmitool:
$ sudo apt-get install ipmitool
List the users
$ sudo ipmitool user list
Change the password for User ID 2 (from previous ‘user list’)
$ sudo ipmitool user set password 2
Then, enter the new password twice.
Cold reset the BMC - normally only needed when the BMC is not getting updates:
$ sudo ipmitool mc reset cold
Print the BMC network information:
$ sudo ipmitool lan print
Print the BMC Event log
$ sudo ipmitool sel elist
Print Sensor information:
$ sudo ipmitool sdr
$ sudo ipmitool sensor
Print Information about the system:
$ sudo ipmitool fru
Power Status:
$ sudo ipmitool power status
Power control server:
$ sudo ipmitool power [status|on|off|cycle|reset|diag|soft]
Power on server:
$ sudo ipmitool power on
Power off server:
$ sudo ipmitool power off
Power cycle server:
$ sudo ipmitool power cycle
Power reset the server:
$ sudo ipmitool power reset
Check or Set the BMC time:
$ sudo ipmitool sel time get
$ sudo ipmitool sel time set "$(date '+%m/%d/%Y %H:%M:%S')"
$ ipmitool sel time get
Or
$ sudo ipmitool sel time set now
$ sudo hwclock --systohc
IPMI Example setting up a static IP address:
If you were given:
IPMI/BMC IP address: 10.100.1.132
Netmask: 255.255.255.0
Gateway: 10.100.1.1
Then the configuration would be:
* Confirm current settings:
$ sudo ipmitool lan print 1
* Set the IPMI Interface to Static (default is dhcp)
$ sudo ipmitool lan set 1 ipsrc static
* Set the IP Address:
$ sudo ipmitool lan set 1 ipaddr 10.100.1.132
* Set the netmask for this network:
$ sudo ipmitool lan set 1 netmask 255.255.255.0
* Set the Default Gateway
$ sudo ipmitool lan set 1 defgw ipaddr 10.100.1.1
* Set/confirm that LAN (network) access is enabled:
$ sudo ipmitool lan set 1 access on
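Then confirm the new settings took effect:
$ sudo ipmitool lan print 1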
A common request is getting the sensor and event log output:
1. On the node from linux:
$ sudo apt install ipmitool
$ sudo ipmitool sdr >& ipmi-sdr.txt
$ sudo ipmitool sel elist >& ipmi-sel.txt
Or from a remote linux machine:
$ ipmitool -I lanplus -H IP_ADDRESS -U ADMIN -P "PASSWORD" sel elist >& ipmi-sel.txt
$ ipmitool -I lanplus -H IP_ADDRESS -U ADMIN -P "PASSWORD" sdr >& ipmi-sdr.txt
** Where 'PASSWORD' is your IPMI password and IP_ADDRESS is your machine's BMC/IPMI IP address.
Alternatively, the BMC Web GUI can save the event log as a CSV:
BMC/IPMI -> Logs and Reports -> Event Log -> Save to excel (CSV).
Networking -> Infiniband
* lsmod | egrep "mlx|ib"
* ibstat
* ibstatus
* ibv_devinfo
* ibswitches
* ibhosts
* lspci | grep Mellanox
* lspci | egrep -i "mellanox|mlnx|mlx[0-9]_core|mlnx[0-9]_ib"
* dmesg | egrep -i "mellanox|mlnx|mlx[0-9]_core|mlnx[0-9]_ib"
* Check for errors like insufficient power
* lsmod | grep rdma
* mst start
* mst status -v
* A subnet manager (opensm) needs to be running either on the switch or on at least one of the nodes
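For a basic point-to-point bandwidth test between two nodes (assumes the perftest package is installed; host name is a placeholder):
On the server node:
$ ib_write_bw
On the client node:
$ ib_write_bw <server-hostname>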