[source: https://www.kdnuggets.com/2019/06/speeding-up-python-code-numpy.html]
Python is huge.
Over the past several years the popularity of Python has grown
rapidly. A big part of that has been the rise of Data Science, Machine
Learning, and AI, all of which have high-level Python libraries to work
with!
When using Python for those types of work, it’s often necessary to
work with very large datasets. Those large datasets get read directly
into memory, and are stored and processed as Python arrays, lists, or
dictionaries.
Working with such huge arrays can be time consuming; really that’s
just the nature of the problem. You have thousands, millions, or even
billions of data points. Every microsecond added to the processing of a
single one of those points can drastically slow you down as a result of
the large scale of the data you’re working with.
The slow way
The slow way of processing large datasets is by using raw Python. We can demonstrate this with a very simple example.
The code below multiplies the value of 1.0000001 by itself, 5 million times!
import time
start_time = time.time()
num_multiplies = 5000000
data = range(num_multiplies)
number = 1
for i in data:
number *= 1.0000001
end_time = time.time()
print(number)
print("Run time = {}".format(end_time - start_time))
I have a pretty decent CPU at home, Intel i7–8700k plus 32GB of
3000MHz RAM. Yet still, multiplying those 5 million data points took
0.21367 seconds. If instead I change the value of
num_multiplies
to 1 billion times, the process took 43.24129 seconds!
Let’s try another one with an array.
We’ll build a Numpy array of size 1000x1000 with a value of 1 at each
and again try to multiple each element by a float 1.0000001. The code
is shown below.
On the same machine, multiplying those array values by 1.0000001 in a regular floating point loop took 1.28507 seconds.
import time
import numpy as np
start_time = time.time()
data = np.ones(shape=(1000, 1000), dtype=np.float)
for i in range(1000):
for j in range(1000):
data[i][j] *= 1.0000001
data[i][j] *= 1.0000001
data[i][j] *= 1.0000001
data[i][j] *= 1.0000001
data[i][j] *= 1.0000001
end_time = time.time()
print("Run time = {}".format(end_time - start_time))
What is Vectorization?
Numpy is designed to be efficient with matrix operations. More specifically, most processing in Numpy is
vectorized.
Vectorization involves expressing mathematical operations, such as
the multiplication we’re using here, as occurring on entire arrays
rather than their individual elements (as in our for-loop).
With vectorization, the underlying code is parallelized such that the
operation can be run on multiply array elements at once, rather than
looping through them one at a time. As long as the operation you are
applying does not rely on any other array elements, i.e a “state”, then
vectorization will give you some good speed ups.
Looping over Python arrays, lists, or dictionaries, can be slow.
Thus, vectorized operations in Numpy are mapped to highly optimized C
code, making them much faster than their standard Python counterparts.
The fast way
Here’s the fast way to do things — by using Numpy the way it was
designed to be used.
There’s a couple of points we can follow when looking to speed things up:
- If there’s a for-loop over an array, there’s a good chance we can replace it with some built-in Numpy function
- If we see any type of math, there’s a good chance we can replace it with some built-in Numpy function
Both of these points are really focused on replace non-vectorized Python code with optimised, vectorized, low-level C code.
Check out the fast version of our first example from before, this time with 1 billion multiplications.
We’ve done something very simple: we saw that we had a for-loop in
which we were repeating the same mathematical operation many times. That
should trigger immediately that we should go look for a Numpy function
that can replace it.
We found one — the
power
function which simply applies a
certain power to an input value. The dramatically sped of the code to
run in 7.6293e-6 seconds — that’s a
import time
import numpy as np
start_time = time.time()
num_multiplies = 1000000000
data = range(num_multiplies)
number = 1
number *= np.power(1.0000001, num_multiplies)
end_time = time.time()
print(number)
print("Run time = {}".format(end_time - start_time))
It’s a very similar idea with multiplying values into Numpy arrays.
We see that we’re using a double for-loop and should immediately
recognised that there should be a faster way.
Conveniently, Numpy will automatically vectorise our code if we
multiple our 1.0000001 scalar directly. So, we can write our
multiplication in the same way as if we were multiplying by a Python
list.
The code below demonstrates this and runs in 0.003618 seconds — that’s a 355X speedup!
import time
import numpy as np
start_time = time.time()
data = np.ones(shape=(1000, 1000), dtype=np.float)
for i in range(5):
data *= 1.0000001
end_time = time.time()
print("Run time = {}".format(end_time - start_time))