Monday, 21 October 2019

A typical training process of neural networks

source: https://pytorch.org/tutorials/beginner/blitz/neural_networks_tutorial.html#sphx-glr-beginner-blitz-neural-networks-tutorial-py

A typical training procedure for a neural network is as follows:
  • Define the neural network that has some learnable parameters (or weights)
  • Iterate over a dataset of inputs
  • Process input through the network
  • Compute the loss (how far is the output from being correct)
  • Propagate gradients back into the network’s parameters
  • Update the weights of the network, typically using a simple update rule: weight = weight - learning_rate * gradient
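In code, the loop looks roughly like the sketch below. The model, loss, and data here are toy placeholders, not from the tutorial; the update step uses the simple rule from the last bullet.

import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(10, 5), nn.ReLU(), nn.Linear(5, 1))  # toy model
criterion = nn.MSELoss()
learning_rate = 0.01

# one pass over a stand-in "dataset" of a single random batch
for inputs, target in [(torch.randn(4, 10), torch.randn(4, 1))]:
    output = net(inputs)              # process input through the network
    loss = criterion(output, target)  # compute the loss
    net.zero_grad()                   # clear old gradients
    loss.backward()                   # propagate gradients back into the parameters
    with torch.no_grad():             # weight = weight - learning_rate * gradient
        for p in net.parameters():
            p -= learning_rate * p.grad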

torch - move tensors to and from the GPU

# let us run this cell only if CUDA is available
# We will use ``torch.device`` objects to move tensors in and out of GPU

import torch

x = torch.randn(4, 4)  # define x so the snippet is self-contained

if torch.cuda.is_available():
    device = torch.device("cuda")          # a CUDA device object
    y = torch.ones_like(x, device=device)  # directly create a tensor on GPU
    x = x.to(device)                       # or just use strings ``.to("cuda")``
    z = x + y
    print(z)
    print(z.to("cpu", torch.double))       # ``.to`` can also change dtype together!

torch - convert numpy to a tensor

Converting a NumPy array to a Torch tensor:
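Roughly, the code that produces the output below:

import numpy as np
import torch

a = np.ones(5)
b = torch.from_numpy(a)  # the tensor shares memory with the numpy array
np.add(a, 1, out=a)      # modify the array in place
print(a)
print(b)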

See how changing the np array changed the Torch Tensor automatically
[2. 2. 2. 2. 2.]
tensor([2., 2., 2., 2., 2.], dtype=torch.float64)

torch - convert a tensor to numpy

Converting a Torch tensor to a NumPy array:
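Roughly, the code that produces the output below:

import torch

a = torch.ones(5)
b = a.numpy()  # the numpy array shares memory with the tensor
print(a)
print(b)
a.add_(1)      # an in-place add on the tensor changes the array too
print(b)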
tensor([1., 1., 1., 1., 1.])
[1. 1. 1. 1. 1.]
See how the numpy array changed in value.

torch resize using view

Resizing: if you want to resize/reshape a tensor, you can use view:
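A quick demonstration that produces the sizes shown below:

import torch

x = torch.randn(4, 4)
y = x.view(16)     # flatten to a 1-D tensor of 16 elements
z = x.view(-1, 8)  # the size -1 is inferred from the other dimensions
print(x.size(), y.size(), z.size())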

torch.Size([4, 4]) torch.Size([16]) torch.Size([2, 8])

torch tensor in-place operation
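For example, an in-place add (the printed values below come from random inputs):

import torch

x = torch.randn(5, 3)
y = torch.randn(5, 3)
y.add_(x)  # adds x to y in place
print(y)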

tensor([[-0.0111,  1.2464,  1.5858],
        [ 0.8375, -0.4832,  0.8757],
        [ 0.4529,  1.5446,  1.0424],
        [-0.4078, -1.1831, -0.8830],
        [-0.5558, -1.2425,  0.1504]])

Note

Any operation that mutates a tensor in-place is post-fixed with an ``_``. For example: ``x.copy_(y)``, ``x.t_()``, will change ``x``.

Model checkpointed using torch.save() unable to be loaded using torch.load() #12042

deserialized_objects[key]._set_from_file(f, offset, f_should_read_directly)

RuntimeError: storage has wrong size: expected -5099839699493302364 got 589824

This usually happens when multiple processes try to write to the same checkpoint file at once.
It should be prevented by saving only from a single process, with the condition if rank == 0:, as sketched below.
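A minimal sketch of that guard in a torch.distributed setup; the function and path names here are placeholders:

import torch
import torch.distributed as dist

def save_checkpoint(model, path="checkpoint.pth"):
    # only rank 0 writes, so two processes never write to the same file at once
    if dist.get_rank() == 0:
        torch.save(model.state_dict(), path)
    dist.barrier()  # make the other ranks wait until the file is fully written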


https://discuss.pytorch.org/t/unable-to-load-waveglow-checkpoint-after-training-with-multiple-gpus/47959/2

https://github.com/pytorch/examples/blob/master/imagenet/main.py#L252

Thursday, 10 October 2019

conda import export yml env current environment

1. To export a conda environment:
conda env export | grep -v "^prefix: " > environment.yml

2. To import a conda environment:
conda env create -f environment.yml

3. To export pip requirements:
pip list --format=freeze > requirements.txt

4. To import pip requirements:
pip install -r requirements.txt

python numpy vectorization

[source: https://www.kdnuggets.com/2019/06/speeding-up-python-code-numpy.html]

Python is huge.
Over the past several years the popularity of Python has grown rapidly. A big part of that has been the rise of Data Science, Machine Learning, and AI, all of which have high-level Python libraries to work with!
When using Python for those types of work, it’s often necessary to work with very large datasets. Those large datasets get read directly into memory, and are stored and processed as Python arrays, lists, or dictionaries.
Working with such huge arrays can be time consuming; really that’s just the nature of the problem. You have thousands, millions, or even billions of data points. Every microsecond added to the processing of a single one of those points can drastically slow you down as a result of the large scale of the data you’re working with.

The slow way


The slow way of processing large datasets is by using raw Python. We can demonstrate this with a very simple example.
The code below repeatedly multiplies a number by 1.0000001, five million times!
import time

start_time = time.time()

num_multiplies = 5000000
data = range(num_multiplies)
number = 1

for i in data:
    number *= 1.0000001

end_time = time.time()

print(number)
print("Run time = {}".format(end_time - start_time))

I have a pretty decent CPU at home, an Intel i7-8700k plus 32GB of 3000MHz RAM. Yet still, running those 5 million multiplications took 0.21367 seconds. If I instead change num_multiplies to 1 billion, the process takes 43.24129 seconds!
Let’s try another one with an array.
We’ll build a Numpy array of size 1000x1000 with a value of 1 at each position, and again try to multiply each element by the float 1.0000001. The code is shown below.
On the same machine, multiplying those array values by 1.0000001 in a regular floating point loop took 1.28507 seconds.
import time
import numpy as np

start_time = time.time()

data = np.ones(shape=(1000, 1000), dtype=np.float64)  # np.float is deprecated in newer NumPy; use an explicit dtype

# multiply every element five times, one element at a time
for i in range(1000):
    for j in range(1000):
        data[i][j] *= 1.0000001
        data[i][j] *= 1.0000001
        data[i][j] *= 1.0000001
        data[i][j] *= 1.0000001
        data[i][j] *= 1.0000001

end_time = time.time()

print("Run time = {}".format(end_time - start_time))


What is Vectorization?


Numpy is designed to be efficient with matrix operations. More specifically, most processing in Numpy is vectorized.
Vectorization involves expressing mathematical operations, such as the multiplication we’re using here, as occurring on entire arrays rather than their individual elements (as in our for-loop).
With vectorization, the underlying code is parallelized such that the operation can be run on multiple array elements at once, rather than looping through them one at a time. As long as the operation you are applying does not rely on any other array elements, i.e. a “state”, vectorization will give you some good speed-ups.
Looping over Python arrays, lists, or dictionaries can be slow. Vectorized operations in Numpy, on the other hand, are mapped to highly optimized C code, making them much faster than their standard Python counterparts.

The fast way


Here’s the fast way to do things — by using Numpy the way it was designed to be used.
There’s a couple of points we can follow when looking to speed things up:
  • If there’s a for-loop over an array, there’s a good chance we can replace it with some built-in Numpy function
  • If we see any type of math, there’s a good chance we can replace it with some built-in Numpy function
Both of these points are really focused on replacing non-vectorized Python code with optimized, vectorized, low-level C code.
Check out the fast version of our first example from before, this time with 1 billion multiplications.
We’ve done something very simple: we saw that we had a for-loop in which we were repeating the same mathematical operation many times. That should immediately prompt us to look for a built-in Numpy function that can replace it.
We found one: the power function, which simply raises an input value to a given power. This dramatically sped up the code, which now runs in 7.6293e-6 seconds. Compared with the 43-second loop, that is millions of times faster.
import time
import numpy as np

start_time = time.time()

num_multiplies = 1000000000
number = 1

number *= np.power(1.0000001, num_multiplies)  # one vectorized call replaces the billion-iteration loop

end_time = time.time()

print(number)
print("Run time = {}".format(end_time - start_time))


It’s a very similar idea with multiplying values into Numpy arrays. We see that we’re using a double for-loop, and should immediately recognize that there should be a faster way.
Conveniently, Numpy will automatically vectorize our code if we multiply by our 1.0000001 scalar directly. So, we can write our multiplication in the same way as if we were multiplying a single Python number.
The code below demonstrates this and runs in 0.003618 seconds. That’s a 355X speedup!
import time
import numpy as np

start_time = time.time()

data = np.ones(shape=(1000, 1000), dtype=np.float64)

for i in range(5):   # five multiplies again, but each one acts on the whole array at once
    data *= 1.0000001

end_time = time.time()

print("Run time = {}".format(end_time - start_time))

pytorch RuntimeError: CUDA error: invalid device ordinal

This error means the code requested a GPU index that is not visible to the process. Make the needed GPUs visible before launching:

export CUDA_VISIBLE_DEVICES=0,1

Monday, 7 October 2019

Version compatibility between Keras and Tensorflow

Keras 2.2.4 -> Tensorflow 1.13.1

Keras 2.2.5 -> Tensorflow 1.14.0

Thursday, 3 October 2019

how to convert a list to an array (and remove the commas when printing)

before:
new_bbox [array([559.        , 125.        , 607.        , 276.        ,
         0.64699197])] [[559.03448486 125.92767334 607.16009521 276.85614014   0.64699197]]

after:
new_bbox [[559. 125. 607. 276.   1.]] [[559. 125. 607. 276.   1.]]

note that the commas have disappeared from the printed output
use np.array to convert the list; an ndarray prints without commas

        new_bbox.append([xw1, yw1, xw2, yw2, s])  # collect each box inside the loop
    return np.array(new_bbox)                     # convert the list of boxes to an ndarray
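A self-contained illustration, reusing the box values from the output above:

import numpy as np

new_bbox = [np.array([559.03448486, 125.92767334, 607.16009521, 276.85614014, 0.64699197])]
print(new_bbox)            # a list prints with commas
print(np.array(new_bbox))  # an ndarray prints without commas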

Tuesday, 1 October 2019

how to debug in python

1a. insert a breakpoint() before any line you want to check

1b. then, in pdb mode:
use dir() to inspect an object’s attributes and values
use c to continue execution
use type() to check a data type
use q to quit
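For example, a minimal sketch:

x = [1, 2, 3]
breakpoint()  # execution pauses here and drops into pdb
print(sum(x))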

or

2a. use the following lines to track the bug

try:
    something()    # any call that might raise an exception
except Exception:
    breakpoint()   # drop into pdb at the point of failure

the program will enter pdb mode and you can use the same commands as in 1b.