5. Changing the number of threads to improve NumPy’s efficiency

When NumPy is linked against MKL, we can change the number of threads it uses for its numerical routines.

5.1. MKL

MKL (Intel Math Kernel Library) is a maths library optimised for Intel processors, which provides highly optimised mathematical routines, especially for multi-core processors. When NumPy is built with MKL support, NumPy can take advantage of the parallelisation routines provided by MKL to accelerate numerical computation.

Some NumPy distributions, for example the one shipped with Anaconda, are built against MKL. In that case we can check the thread count with the following code.

5.2. Check whether MKL is already available on the system

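import numpy as np
# The mkl module is provided by the mkl-service package;
# this import raises ImportError if it is not installed.
from mkl import set_num_threads, get_max_threads

current_threads = get_max_threads()    # Maximum number of threads MKL may use
print("Default threads：", current_threads)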

Default threads： 4

As you can see, the default number of threads shown here is 4, not 1, which matches the hardware of my computer. This means no further setup is needed: MKL already runs its calculations on multiple threads for us.
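The import used above fails with an ImportError when the mkl module is missing. A more defensive variant could look like the sketch below. It assumes the mkl module comes from the mkl-service package, and the helper name mkl_service_available is made up for this illustration.

```python
import importlib.util


def mkl_service_available() -> bool:
    """Return True if the `mkl` module can be located (hypothetical helper)."""
    return importlib.util.find_spec("mkl") is not None


if mkl_service_available():
    import mkl

    print("MKL max threads:", mkl.get_max_threads())
else:
    # Assumption: without the mkl module we cannot query MKL directly;
    # NumPy may be linked against another BLAS library such as OpenBLAS.
    print("mkl module not found")

# np.show_config() prints the BLAS/LAPACK libraries NumPy was built against.
if importlib.util.find_spec("numpy") is not None:
    import numpy as np

    np.show_config()
```

If the first branch prints a thread count, the thread-control functions used in the rest of this section should be available.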

If we do want to change the number of threads, we can use set_num_threads().
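Because set_num_threads() changes a global setting, it can be convenient to restore the previous value afterwards. The following is only a sketch of one way to do that: temporary_threads is a hypothetical helper, not part of MKL, and its get_threads / set_threads parameters exist only so the helper can be tried without MKL installed.

```python
from contextlib import contextmanager


@contextmanager
def temporary_threads(n, get_threads=None, set_threads=None):
    """Set the thread count to n inside a with-block, then restore it."""
    if get_threads is None or set_threads is None:
        # Default to MKL; the mkl module comes from the mkl-service package.
        import mkl

        get_threads, set_threads = mkl.get_max_threads, mkl.set_num_threads

    previous = get_threads()  # remember the current setting
    set_threads(n)
    try:
        yield
    finally:
        set_threads(previous)  # restore it even if the block raises
```

With MKL installed, `with temporary_threads(1): ...` would run the enclosed code single-threaded and then return to the earlier thread count.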

5.3. Example of changing the number of threads

Here we use a simple matrix multiplication to see the effect of changing the number of threads, starting with a single thread.

import timeit
import numpy as np
from mkl import set_num_threads, get_max_threads

def calculation_1():
    current_threads = get_max_threads()    # Get the current number of MKL threads

    set_num_threads(1)    # Set the number of MKL threads to 1

    updated_threads = get_max_threads()   # Get the updated MKL thread count

    print("current_threads：", current_threads)
    print("updated_threads：", updated_threads)

    # Create a large random matrix. It must be square so that
    # np.dot(matrix, matrix) has matching inner dimensions.
    size = (10000, 10000)
    matrix = np.random.rand(*size)

    result = np.dot(matrix, matrix)    # Perform matrix multiplication

    max_threads = get_max_threads()    # Get the max thread count after the computation
    print("max threads：", max_threads)

    return result


# Record execution time (this includes creating the matrix)
compute_time_threads_1 = timeit.timeit(calculation_1, number=1)

print("threads_1 execution time:", compute_time_threads_1)

current_threads： 4
updated_threads： 1
max threads： 1
threads_1 execution time: 34.72792249999975

Now let’s set the number of threads to 4:

import timeit
import numpy as np
from mkl import set_num_threads, get_max_threads

def calculation_4():
    current_threads = get_max_threads()    # Get the current number of MKL threads

    set_num_threads(4)    # Set the number of MKL threads to 4

    updated_threads = get_max_threads()   # Get the updated MKL thread count

    print("current_threads：", current_threads)
    print("updated_threads：", updated_threads)

    # Create a large random matrix. It must be square so that
    # np.dot(matrix, matrix) has matching inner dimensions.
    size = (10000, 10000)
    matrix = np.random.rand(*size)

    result = np.dot(matrix, matrix)    # Perform matrix multiplication

    max_threads = get_max_threads()    # Get the max thread count after the computation
    print("max threads：", max_threads)

    return result


# Record execution time (this includes creating the matrix)
compute_time_threads_4 = timeit.timeit(calculation_4, number=1)

print("threads_4 execution time:", compute_time_threads_4)

current_threads： 1
updated_threads： 4
max threads： 4
threads_4 execution time: 23.206560399999944
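The timings above come from a single run each and include the time needed to create the random matrix. A slightly more careful measurement might create the matrix once, time only the multiplication, and keep the best of several repeats. The sketch below illustrates that idea; the helper name time_matmul and the small default size are assumptions chosen so the example finishes quickly, and the thread count would still be set beforehand with set_num_threads().

```python
import timeit

import numpy as np


def time_matmul(n: int = 500, repeats: int = 3) -> float:
    """Return the best wall-clock time in seconds of an n x n matrix product."""
    rng = np.random.default_rng(0)
    matrix = rng.random((n, n))  # created once, outside the timed region

    # Each repeat runs the multiplication exactly once (number=1).
    timings = timeit.repeat(lambda: matrix @ matrix, number=1, repeat=repeats)
    return min(timings)


print("best time for n=500:", time_matmul())
```

Taking the minimum of several repeats reduces the influence of other processes that happen to be running during one of the measurements.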

5.4. Conclusion

Finally, let’s compare the execution times of the two runs with the following code.

import pandas as pd
from IPython.display import HTML

data = {
    'Methods': ['threads：1', 'threads：4'],
    'Execution time (s)': [compute_time_threads_1, compute_time_threads_4],
    'Speed up': [1, compute_time_threads_1/compute_time_threads_4]
}
df = pd.DataFrame(data)

# Creating style functions
def add_border(val):
    return 'border: 1px solid black'

# Applying style functions to data boxes
# (Styler.applymap is deprecated since pandas 2.1 in favour of Styler.map)
styled_df = df.style.applymap(add_border)

# Defining CSS styles
table_style = [
    {'selector': 'table', 'props': [('border-collapse', 'collapse')]},
    {'selector': 'th, td', 'props': [('border', '1px solid black')]}
]

# Adding styles to stylised data boxes
styled_df.set_table_styles(table_style)

# Displaying stylised data boxes in Jupyter Notebook
HTML(styled_df.to_html())

	Methods	Execution time (s)	Speed up
0	threads：1	34.727922	1.000000
1	threads：4	23.206560	1.496470

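You can see that the example runs about 1.5 times faster with 4 threads than with 1. Since the timed function also creates the random matrix, the speed-up of the multiplication alone is somewhat higher than this figure. Nevertheless, we don’t need to change the thread count when optimising NumPy; we just need to make sure that MKL is already using the maximum number of threads by default.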
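As a small worked example, the speed-up can be put next to the number of threads to see how well the run scaled. The sketch below reuses the two measured times from this section; "parallel efficiency" (speed-up divided by thread count) is a term introduced here for illustration and is not used in the original comparison.

```python
time_1_thread = 34.72792249999975    # seconds, measured with 1 thread
time_4_threads = 23.206560399999944  # seconds, measured with 4 threads
n_threads = 4

speedup = time_1_thread / time_4_threads
efficiency = speedup / n_threads  # 1.0 would mean perfect scaling

print(f"speed-up:   {speedup:.3f}")
print(f"efficiency: {efficiency:.1%}")
```

An efficiency well below 100% is expected here, because the timed function contains work such as creating the matrix that does not benefit from the extra threads.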