Loops are fundamental and work well for simple tasks. However, their cost grows with larger ranges or more complex operations. This article compares various ways of performing iteration in Python to determine the fastest approach.
First, we’ll establish our measurement criteria. While asymptotic notation is an option, a more hands-on approach is to run each function n times, measure the execution times, and calculate the average.
Measuring Execution Time: How Is It Done?
Execution time is usually reported as wall time or CPU time, and CPU time is further split into user time and system time.
Wall time represents the total elapsed time from the start to the end of execution, analogous to tracking time using a wall clock.
User-CPU time refers to the duration the CPU spends executing the user’s code exclusively, excluding any kernel operations.
System-CPU time is the time the CPU spends in the kernel on behalf of the program, for example while handling system calls and I/O requests. The total CPU time is the sum of user and system time.
If the wall time is less than the CPU time, this typically indicates parallel processing, where tasks are executed simultaneously, leading to an accumulation of CPU time that surpasses the actual elapsed time. On the other hand, if the wall time exceeds the CPU time, it often points to delays caused by disk operations, the impact of other running programs, or similar inefficiencies.
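To make the difference between wall time and CPU time concrete, here is a minimal sketch that measures both around the same piece of work. It assumes a helper named busy (made up for this example) and uses time.sleep to simulate waiting; while the process sleeps, wall time keeps running but almost no CPU time is consumed.

```python
import time

def busy(n):
    """CPU-bound work: sum the first n integers in a plain loop."""
    total = 0
    for i in range(n):
        total += i
    return total

wall_start = time.perf_counter()   # monotonic wall-clock timer
cpu_start = time.process_time()    # user + system CPU time of this process

busy(200_000)      # consumes CPU time (and wall time)
time.sleep(0.2)    # consumes wall time only; the process is idle

wall = time.perf_counter() - wall_start
cpu = time.process_time() - cpu_start
print(f"wall: {wall:.3f} s, cpu: {cpu:.3f} s")
```

On a typical machine the wall time comes out at a little over 0.2 seconds, while the CPU time reflects only the loop.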
Various methods exist to measure execution time in Python.
# a simple function to measure
def foo(x):
    total = 0
    for i in range(x):
        total += i
    return total
- datetime: Timestamps for the start and end can be recorded using the datetime module to measure wall time.
from datetime import datetime
start = datetime.now()
foo(100000000)
end = datetime.now()
et = (end - start).total_seconds()
print(f"Execution time : in seconds: {et}, in time format: {end - start}")
"""
Execution time : in seconds: 5.709269, in time format: 0:00:05.709269
"""
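- time: This standard-library module can measure both wall time (time.time) and CPU time (time.process_time). For benchmarking, time.perf_counter is generally preferred over time.time because it is monotonic and uses the highest-resolution clock available.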
import time
start = time.time()
foo(100000000)
end = time.time()
print(f"Execution time : in seconds: {end - start}, "
      f"in time format: {time.strftime('%H:%M:%S', time.gmtime(end - start))}")
"""
Execution time : in seconds: 5.703110933303833, in time format: 00:00:05
"""
start = time.process_time()
foo(100000000)
end = time.process_time()
print(f"Execution process time : in seconds: {end - start}, "
      f"in time format: {time.strftime('%H:%M:%S', time.gmtime(end - start))}")
"""
Execution process time : in seconds: 5.514226999999999, in time format: 00:00:05
"""
- timeit: This module offers a straightforward method to measure wall time. It disables the garbage collector, repeats the task n times, and returns the total time taken, allowing for the calculation of the average execution time across n executions.
import timeit
n=10
result = timeit.timeit(stmt='foo(100000000)', globals=globals(), number=n)
print(f"Execution process time : in seconds: {(result/n)}")
"""
Execution process time : in seconds: 5.475671691600001
"""
I plan to use timeit for measuring execution time, setting the number of repeats to n=10.
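A single average can hide noise from other processes. As a complement, the sketch below uses timeit.repeat to take several independent measurements and report both the best and the mean time per call. The numbers of calls and repeats are arbitrary choices for this example, and the statement is passed as a callable instead of a string, which timeit also accepts.

```python
import timeit

def foo(x):
    total = 0
    for i in range(x):
        total += i
    return total

n = 10        # calls per measurement
repeats = 5   # independent measurements

# repeat() returns one total per measurement; each total covers n calls.
totals = timeit.repeat(lambda: foo(100_000), number=n, repeat=repeats)
per_call = [t / n for t in totals]

best = min(per_call)
mean = sum(per_call) / repeats
print(f"best: {best:.6f} s, mean: {mean:.6f} s")
```

The minimum is often the more stable figure, since slower runs are usually caused by interference from the rest of the system rather than by the code itself.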
FOR LOOP
A “for-loop” iterates over an iterable.
liste = ["Apple", "Orange", "Strawberry"]
for fruit in liste:
    print(fruit)
"""
Apple
Orange
Strawberry
"""
In Python, as in most programming languages, the same task can often be written in several ways besides a plain for-loop, each with a different running time.
import timeit
import numpy as np
def foo(x):
    total = 0
    for i in range(x):
        total += i
    return total
def boo(x):
    total = sum(range(x))
    return total
def nuu(x):
    total = np.sum(np.arange(x))
    return total
n=10
result = timeit.timeit(stmt='foo(100000000)', globals=globals(), number=n)
print(f"Foo: {(result/n)}")
result = timeit.timeit(stmt='boo(100000000)', globals=globals(), number=n)
print(f"Boo: {(result/n)}")
result = timeit.timeit(stmt='nuu(100000000)', globals=globals(), number=n)
print(f"Nuu: {(result/n)}")
"""
Foo: 5.413630695899999
Boo: 1.9532857375
Nuu: 0.23119822499999998
"""
Python’s built-in sum function is more efficient than a simple loop. Moreover, numpy outperforms both (you will hear this a lot).
How Does the For-loop Work?
Simply put, a for-loop iterates over an iterable object.
An iterable object yields its members one at a time. When an iterable object is passed to the iter function, it returns an iterator.
print(iter([1,2,3]))
print(iter("ASD"))
"""
<list_iterator object at 0x7fbdd8134af0>
<str_iterator object at 0x7fbdd8134af0>
"""
An iterator is advanced with the next function, which returns the next value as long as one is available. Once the iterator is exhausted, next raises StopIteration.
it = iter([1,2,3,4])
print(it)
print(next(it))
print(next(it))
print(next(it))
print(next(it))
print(next(it))
"""
<list_iterator object at 0x7f8f380ccfd0>
1
2
3
4
Traceback (most recent call last):
File "/Users/okanyenigun/Desktop/codes/draft.py", line 42, in <module>
print(next(it))
StopIteration
"""
The Python interpreter executes bytecode, processing instructions one at a time, including invoking the next function repeatedly within a for-loop and performing tasks like lock acquisition and variable resolution during each iteration. As an interpreted language, Python generally exhibits slower performance compared to compiled languages, although this can vary based on the task and optimizations applied.
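The mechanics described above can be sketched by writing the loop out by hand. This is only an approximation of what the interpreter does for a for-loop, and the function name manual_for is made up for this example.

```python
def manual_for(iterable, body):
    """Roughly what `for item in iterable: body(item)` does."""
    iterator = iter(iterable)      # ask the iterable for an iterator
    while True:
        try:
            item = next(iterator)  # fetch the next value
        except StopIteration:      # raised when the iterator is exhausted
            break
        body(item)

seen = []
manual_for([1, 2, 3], seen.append)
print(seen)  # [1, 2, 3]
```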
In Python, various operations can achieve the same outcome as a for-loop. Let’s examine these alternatives.
Enumeration
Enumeration enables iteration over an iterable with access to each item’s index. On each iteration, enumerate produces a tuple consisting of the index and the current item’s value.
Alternatively, the index for each iteration can be obtained by using the range function in a for-loop.
Using enumerate is slightly faster than using a range iterator.
def foo(liste):
    "simple for loop"
    total = 0
    for i in range(len(liste)):
        total += liste[i]
    return total
def boo(liste):
    "enumeration"
    total = 0
    for i, j in enumerate(liste):
        total += j
    return total
n= 10
liste = list(range(10000000))
result = timeit.timeit(stmt='foo(liste)', globals=globals(), number=n)
print(f"foo: {(result/n)}")
result = timeit.timeit(stmt='boo(liste)', globals=globals(), number=n)
print(f"boo: {(result/n)}")
"""
foo: 0.7428947792
boo: 0.6568550417
"""
List Comprehension
A list comprehension condenses a loop that builds a list into a single expression. It offers shorter syntax and usually faster execution, but it materializes the entire output list in memory, which calls for caution with large datasets.
[expression for item in iterable]
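The memory caveat can be illustrated by comparing a list comprehension with a generator expression, which uses the same syntax in parentheses but produces values one at a time. This is a small sketch with made-up variable names; note that sys.getsizeof reports only the size of the container itself, not of the integers it refers to.

```python
import sys

n = 100_000

squares_list = [x * x for x in range(n)]   # builds all n results up front
squares_gen = (x * x for x in range(n))    # produces results one at a time

list_size = sys.getsizeof(squares_list)
gen_size = sys.getsizeof(squares_gen)
print(list_size, gen_size)

# Both give the same total when consumed.
list_total = sum(squares_list)
gen_total = sum(squares_gen)
print(list_total == gen_total)
```

A generator expression is a reasonable choice when the results are consumed once, for instance by sum, and the full list is never needed.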
liste = [1,2,3,4,5]
newlist = [x**2 for x in liste]
print(newlist)
"""
[1, 4, 9, 16, 25]
"""
string = "cristiano ronaldo"
liste = [s.upper() for s in string]
print(liste)
"""
['C', 'R', 'I', 'S', 'T', 'I', 'A', 'N', 'O', ' ', 'R', 'O', 'N', 'A', 'L', 'D', 'O']
"""
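List comprehensions support conditions: a trailing if filters items, while a conditional expression (x if ... else ...) placed before for chooses the value to emit.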
liste = [1,2,3,4,5]
a = [x for x in liste if x > 3]
b = [x if x < 3 else x**3 for x in liste]
print("a: ",a)
print("b: ",b)
"""
a: [4, 5]
b: [1, 2, 27, 64, 125]
"""
We can create a matrix or a nested list:
matrix = [[j+1 for j in range(3)] for i in range(5)]
print(matrix)
nested = [[j+1 for j in range(i+1)] for i in range(5)]
print(nested)
"""
[[1, 2, 3], [1, 2, 3], [1, 2, 3], [1, 2, 3], [1, 2, 3]]
[[1], [1, 2], [1, 2, 3], [1, 2, 3, 4], [1, 2, 3, 4, 5]]
"""
# chained if clauses (both conditions must hold)
it = range(100)
liste = [x for x in it if x % 3 == 0 if x < 50]
print(liste)
"""
[0, 3, 6, 9, 12, 15, 18, 21, 24, 27, 30, 33, 36, 39, 42, 45, 48]
"""
def foo(numbers):
    "simple loop"
    liste = []
    for number in numbers:
        if number < 5000000:
            liste.append(number*1.08)
        else:
            liste.append(number*2)
    return liste
def boo(numbers):
    "list comprehension"
    return [x*1.08 if x < 5000000 else x*2 for x in numbers]
n= 10
liste = list(range(100000000))
result = timeit.timeit(stmt='foo(liste)', globals=globals(), number=n)
print(f"foo: {(result/n)}")
result = timeit.timeit(stmt='boo(liste)', globals=globals(), number=n)
print(f"boo: {(result/n)}")
"""
foo: 9.10908775
boo: 7.0431246583
"""
Lambda
Lambda functions are anonymous functions (functions without a name) that can accept any number of arguments but are limited to a single expression.
result = lambda x: x ** 3
print(result(5))
"""
125
"""
Map, Filter & Reduce
The map function applies a specified function to each item in an iterable, such as a list. It returns a lazy map object, so the result is wrapped in list to materialize the values.
map(function, iterable)
#we can use lambda function
liste = [10,20,30,40,50]
result = map(lambda x: x**2, liste)
result_list = list(result)
print(result)
print(result_list)
"""
<map object at 0x7fea380badf0>
[100, 400, 900, 1600, 2500]
"""
#we can use list of functions
def square(x):
    return x*x
def cube(x):
    return x ** 3
list_of_func = [square, cube]
for i in range(3):
    result = list(map(lambda x: x(i), list_of_func))
    print(result)
"""
[0, 0]
[1, 1]
[4, 8]
"""
def foo(liste):
    new_list = []
    for l in liste:
        new_list.append(l ** 2)
    return new_list
def foo_range(liste):
    new_list = []
    for i in range(len(liste)):
        new_list.append(liste[i] ** 2)
    return new_list
def boo(liste):
    return list(map(lambda x: x ** 2, liste))
n= 10
liste = list(range(10000000))
result = timeit.timeit(stmt='foo(liste)', globals=globals(), number=n)
print(f"foo: {(result/n)}")
result = timeit.timeit(stmt='foo_range(liste)', globals=globals(), number=n)
print(f"foo with range: {(result/n)}")
result = timeit.timeit(stmt='boo(liste)', globals=globals(), number=n)
print(f"boo: {(result/n)}")
"""
foo: 5.2132118541
foo with range: 5.531390737499999
boo: 2.8367943749999993
"""
# map passes one item from each iterable to the function, so with a single list of tuples each tuple arrives as one argument
liste= [(1, 2), (3, 2), (2, 4)]
list(map(lambda x, y: x *y, liste))
"""
TypeError: <lambda>() missing 1 required positional argument: 'y'
"""
The filter function keeps the elements of an iterable for which a specified function returns a truthy value.
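One point worth spelling out is that filter only selects: the function's return value is used as a yes/no test, and the original element is what gets kept. The sketch below, with a made-up function named transform, shows how this differs from selecting and transforming in one pass.

```python
numbers = list(range(10))

def transform(val):
    """Return the value to keep, or None to skip it."""
    if val % 2 == 0:
        return val
    if val % 3 == 0:
        return val ** 2
    return None

# filter uses the return value only as a truth test and yields the ORIGINAL
# element, so 0 (falsy) is dropped and 3 and 9 are not squared.
selected = list(filter(transform, numbers))

# To select and transform in one pass, map first and then drop skipped items.
transformed = [t for t in map(transform, numbers) if t is not None]

print(selected)      # [2, 3, 4, 6, 8, 9]
print(transformed)   # [0, 2, 9, 4, 6, 8, 81]
```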
liste = range(10)
filtered_list = list(filter(lambda x : x <5, liste))
print(filtered_list)
"""
[0, 1, 2, 3, 4]
"""
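def foo(liste):
    new_list = []
    for l in liste:
        if l % 2 == 0:
            new_list.append(l)
        elif l % 3 == 0:
            new_list.append(l**2)
    return new_list
def sub_boo(val):
    if val % 2 == 0:
        return val
    elif val % 3 == 0:
        return val ** 2
# Note: filter only tests the truthiness of sub_boo's return value and keeps the
# original element, so boo drops 0 and does not square odd multiples of 3.
# The two functions do similar work but return different lists.
def boo(liste):
    return list(filter(sub_boo, liste))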
n= 10
liste = list(range(10000000))
result = timeit.timeit(stmt='foo(liste)', globals=globals(), number=n)
print(f"foo: {(result/n)}")
result = timeit.timeit(stmt='boo(liste)', globals=globals(), number=n)
print(f"boo: {(result/n)}")
"""
foo: 1.5751615709000002
boo: 1.3204897791
"""
The reduce function applies a two-argument function cumulatively to the items of an iterable, reducing it to a single value.
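As a small sketch of how reduce can be used beyond a lambda, the example below passes functions from the operator module and an optional starting value. The variable names are made up for this example; using operator.add typically avoids the cost of calling a Python-level lambda for every element.

```python
from functools import reduce
import operator

values = list(range(11))

# operator.add is implemented in C, so no Python-level function is called per element.
total = reduce(operator.add, values)

# The optional third argument is the starting value; it also makes
# reduce safe on an empty iterable.
empty_total = reduce(operator.add, [], 0)

# A different operation: the product 1 * 2 * 3 * 4 * 5.
factorial_5 = reduce(operator.mul, range(1, 6), 1)

print(total, empty_total, factorial_5)  # 55 0 120
```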
from functools import reduce
liste = range(11)
reduced_value = reduce((lambda x, y: x + y), liste)
print(reduced_value)
"""
55
"""
from functools import reduce
def foo(liste):
    total = 0
    for l in liste:
        total += l
    return total
def boo(liste):
    return reduce((lambda x, y : x+y), liste)
n= 10
liste = list(range(100000000))
result = timeit.timeit(stmt='foo(liste)', globals=globals(), number=n)
print(f"foo: {(result/n)}")
result = timeit.timeit(stmt='boo(liste)', globals=globals(), number=n)
print(f"boo: {(result/n)}")
"""
foo: 4.1751244416999995
boo: 7.651879245900001
"""
Itertools
The itertools module offers a variety of looping building blocks, all of which return iterators.
The combinations function generates all subsequences of a specified length from an iterable.
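Because itertools functions return lazy iterators, a benchmark has to consume the iterator (for example with list) for the actual work to be counted. The following sketch, with made-up function names and a deliberately small input, compares two explicit loops with combinations on equal terms and checks that both produce the same pairs.

```python
import itertools
import timeit

def nested_loops(items):
    """Build every pair (i, j) with i < j using two explicit loops."""
    return [(i, j) for i in items for j in items if i < j]

def with_combinations(items):
    """Same pairs, but the iterator is consumed so the work is actually done."""
    return list(itertools.combinations(items, 2))

items = list(range(300))
same_result = nested_loops(items) == with_combinations(items)

n = 5
t_loops = timeit.timeit(lambda: nested_loops(items), number=n) / n
t_comb = timeit.timeit(lambda: with_combinations(items), number=n) / n
print(same_result)
print(f"nested loops: {t_loops:.5f} s, combinations: {t_comb:.5f} s")
```

The two versions agree here only because the input is sorted and has no duplicates; combinations works on positions, not values.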
import itertools
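def foo(liste):
    new_list = []
    for i in liste:
        for j in liste:
            if i < j:
                new_list.append((i,j))
    return new_list
def boo(liste):
    # Note: combinations returns a lazy iterator. Nothing is generated until it is
    # consumed, so timing boo alone measures only the creation of the iterator.
    return itertools.combinations(liste, 2)
vals = boo([1,2,3,4])
for v in vals:
    print(v)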
n = 10
liste = list(range(10000))
result = timeit.timeit(stmt='foo(liste)', globals=globals(), number=n)
print(f"foo: {(result/n)}")
result = timeit.timeit(stmt='boo(liste)', globals=globals(), number=n)
print(f"boo: {(result/n)}")
"""
(1, 2)
(1, 3)
(1, 4)
(2, 3)
(2, 4)
(3, 4)
foo: 6.4279789417
boo: 0.000488679100000411
"""
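The permutations function generates successive permutations of a given iterable, optionally of a specified length. Elements are treated as distinct by position, so repeated letters produce repeated tuples.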
string = "CRISTIANO"
print(list(itertools.permutations(string, 2)))
"""
[('C', 'R'), ('C', 'I'), ('C', 'S'), ('C', 'T'), ('C', 'I'), ('C', 'A'), ('C', 'N'), ('C', 'O'), ('R', 'C'), ('R', 'I'), ('R', 'S'), ('R', 'T'), ('R', 'I'), ('R', 'A'), ('R', 'N'), ('R', 'O'), ('I', 'C'), ('I', 'R'), ('I', 'S'), ('I', 'T'), ('I', 'I'), ('I', 'A'), ('I', 'N'), ('I', 'O'), ('S', 'C'), ('S', 'R'), ('S', 'I'), ('S', 'T'), ('S', 'I'), ('S', 'A'), ('S', 'N'), ('S', 'O'), ('T', 'C'), ('T', 'R'), ('T', 'I'), ('T', 'S'), ('T', 'I'), ('T', 'A'), ('T', 'N'), ('T', 'O'), ('I', 'C'), ('I', 'R'), ('I', 'I'), ('I', 'S'), ('I', 'T'), ('I', 'A'), ('I', 'N'), ('I', 'O'), ('A', 'C'), ('A', 'R'), ('A', 'I'), ('A', 'S'), ('A', 'T'), ('A', 'I'), ('A', 'N'), ('A', 'O'), ('N', 'C'), ('N', 'R'), ('N', 'I'), ('N', 'S'), ('N', 'T'), ('N', 'I'), ('N', 'A'), ('N', 'O'), ('O', 'C'), ('O', 'R'), ('O', 'I'), ('O', 'S'), ('O', 'T'), ('O', 'I'), ('O', 'A'), ('O', 'N')]
"""
The product function calculates the Cartesian product of the given iterables.
import itertools
def foo(list1, list2):
    new_list = []
    for i in list1:
        for j in list2:
            new_list.append((i,j))
    return new_list
def boo(list1, list2):
    # Note: product is also lazy; the pairs are produced only when iterated.
    return itertools.product(list1,list2)
vals = boo([1,2],["a","b","c"])
for v in vals:
    print(v)
list1 = range(1000000)
list2 = ["a","b","c","d","e","f"]
n = 10
result = timeit.timeit(stmt='foo(list1, list2)', globals=globals(), number=n)
print(f"foo: {(result/n)}")
result = timeit.timeit(stmt='boo(list1, list2)', globals=globals(), number=n)
print(f"boo: {(result/n)}")
"""
(1, 'a')
(1, 'b')
(1, 'c')
(2, 'a')
(2, 'b')
(2, 'c')
foo: 0.5739818416
boo: 0.025386108300000033
"""
starmap is a variant of the map function that unpacks each item of the iterable into positional arguments for the function.
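As a rough illustration of how map and starmap differ when the data consists of tuples, the sketch below computes the same products in three ways. The variable names are made up for this example.

```python
from itertools import starmap

pairs = [(1, 2), (3, 2), (2, 4)]

# Option 1: unpack each tuple inside the function.
products_a = list(map(lambda pair: pair[0] * pair[1], pairs))

# Option 2: give map one iterable per parameter.
xs, ys = zip(*pairs)
products_b = list(map(lambda x, y: x * y, xs, ys))

# Option 3: let starmap unpack each tuple into positional arguments.
products_c = list(starmap(lambda x, y: x * y, pairs))

print(products_a, products_b, products_c)
```

All three lists come out as [2, 6, 8]; starmap is simply the most direct fit when the arguments already arrive grouped in tuples.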
import itertools
numbers = list(itertools.combinations(range(1,10), 2))
print("numbers: ", numbers)
def foo(x, y):
    return x * y
result = list(itertools.starmap(foo, numbers))
print("result: " ,result)
"""
numbers: [(1, 2), (1, 3), (1, 4), (1, 5), (1, 6), (1, 7), (1, 8), (1, 9), (2, 3), (2, 4), (2, 5), (2, 6), (2, 7), (2, 8), (2, 9), (3, 4), (3, 5), (3, 6), (3, 7), (3, 8), (3, 9), (4, 5), (4, 6), (4, 7), (4, 8), (4, 9), (5, 6), (5, 7), (5, 8), (5, 9), (6, 7), (6, 8), (6, 9), (7, 8), (7, 9), (8, 9)]
result: [2, 3, 4, 5, 6, 7, 8, 9, 6, 8, 10, 12, 14, 16, 18, 12, 15, 18, 21, 24, 27, 20, 24, 28, 32, 36, 30, 35, 40, 45, 42, 48, 54, 56, 63, 72]
"""
def operation(x, y):
    return x * y
def foo(numbers):
    for num in numbers:
        operation(num[0], num[1])
def boo(numbers):
    list(itertools.starmap(operation, numbers))
n = 10
numbers = list(itertools.combinations(range(1,1000), 2))
result = timeit.timeit(stmt='foo(numbers)', globals=globals(), number=n)
print(f"foo: {(result/n)}")
result = timeit.timeit(stmt='boo(numbers)', globals=globals(), number=n)
print(f"boo: {(result/n)}")
"""
foo: 0.0595270958
boo: 0.037608983400000004
"""
compress filters an iterable based on the truth values from another iterable.
cars = ["Mercedes", "Audi", "Bmw"]
i_have = [1,0,0]
print(list(itertools.compress(cars, i_have)))
"""
['Mercedes']
"""
The groupby function groups consecutive elements of an iterable that share the same key. Because only adjacent elements are grouped, the input usually needs to be sorted by the same key first.
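The effect of ordering on groupby can be seen in a short sketch. The word list and the first_letter key function are made up for this example.

```python
import itertools

words = ["apple", "avocado", "banana", "apricot", "blueberry"]

def first_letter(word):
    return word[0]

# Unsorted input: "apricot" is not adjacent to the other a-words,
# so the key "a" shows up in two separate groups.
unsorted_keys = [key for key, _ in itertools.groupby(words, key=first_letter)]

# Sorting by the same key first yields one group per key.
by_letter = sorted(words, key=first_letter)
grouped = {
    key: list(group)
    for key, group in itertools.groupby(by_letter, key=first_letter)
}

print(unsorted_keys)   # ['a', 'b', 'a', 'b']
print(grouped)
```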
liste = range(1,100)
group = itertools.groupby(liste, key = lambda x : x < 50)
for key, value in group:
    print(key, list(value))
"""
True [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49]
False [50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99]
"""
The accumulate function returns running results over a sequence. By default it computes running sums, but a different two-argument function can be supplied.
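A brief sketch of accumulate with and without a custom function follows; the sample values are arbitrary.

```python
import itertools
import operator

values = [3, 1, 4, 1, 5, 9, 2, 6]

running_sum = list(itertools.accumulate(values))                    # default: addition
running_product = list(itertools.accumulate(values, operator.mul))  # running product
running_max = list(itertools.accumulate(values, max))               # running maximum

print(running_sum)       # [3, 4, 8, 9, 14, 23, 25, 31]
print(running_product)   # [3, 3, 12, 12, 60, 540, 1080, 6480]
print(running_max)       # [3, 3, 4, 4, 5, 9, 9, 9]
```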
liste = range(1,10)
print(list(itertools.accumulate(liste)))
"""
[1, 3, 6, 10, 15, 21, 28, 36, 45]
"""
Numba
Numba is a just-in-time (JIT) compiler library that translates Python code into optimized machine code at runtime. It doesn’t require changing your interpreter; you simply apply a decorator to your functions.
See the official Numba documentation for details.
pip install numba
import random
import numba
import timeit
def foo(n):
    count = 0
    for i in range(n):
        x = random.random()
        y = random.random()
        if (x ** 2 + y ** 2) < 1.0:
            count += 1
        else:
            count -= 1
    return count
@numba.jit()
def boo(n):
    count = 0
    for i in range(n):
        x = random.random()
        y = random.random()
        if (x ** 2 + y ** 2) < 1.0:
            count += 1
        else:
            count -= 1
    return count
n = 10
result = timeit.timeit(stmt='foo(10000000)', globals=globals(), number=n)
print(f"foo: {(result/n)}")
result = timeit.timeit(stmt='boo(10000000)', globals=globals(), number=n)
print(f"boo: {(result/n)}")
"""
foo: 3.3778286208000003
boo: 0.0987006749999999
"""
JIT options:
- nopython: if True, Numba compiles the function entirely to machine code without falling back to the Python interpreter, and raises an error if it cannot.
- parallel: if True, Numba tries to automatically parallelize supported operations in the function across multiple threads (not separate processes).
- nogil: if True, Numba releases the GIL while the compiled function runs, so other threads are free to run other parts of your code. This only works in nopython mode.
- cache: if True, the compiled code is saved into the __pycache__ folder. Next time, it is loaded from there instead of being compiled from scratch (only if the file hasn’t been changed).
- fastmath: this option allows faster mathematical operations, but the drawback is that it uses less safe floating-point transformations. So, use it only if your data is not likely to produce inf or NaN values.
- boundscheck: use it when debugging. It checks that array accesses do not go out of bounds.
Joblib Parallelisation
Joblib is a collection of Python tools designed for lightweight pipelining and parallel computing. Its parallelization features can be integrated into for-loops.
import timeit
import numpy as np
from joblib import Parallel, delayed
def detect_prime_number(x):
    if x <= 3:
        return x > 1
    if (np.mod(x, 2) == 0) or (np.mod(x, 3) == 0):
        return False
    sqrt_n = int(np.floor(np.sqrt(x)))
    p = 5
    while p <= sqrt_n:
        if (np.mod(x, p) == 0) or (np.mod(x, p + 2) == 0):
            return False
        p += 6
    return True
def foo(numbers):
    liste = []
    for number in numbers:
        liste.append(detect_prime_number(number))
    return liste
def boo(numbers):
    return Parallel(n_jobs=8)(delayed(detect_prime_number)(n) for n in numbers)
numbers = range(1000000)
n = 10
result = timeit.timeit(stmt='foo(numbers)', globals=globals(), number=n)
print(f"foo: {(result/n)}")
result = timeit.timeit(stmt='boo(numbers)', globals=globals(), number=n)
print(f"boo: {(result/n)}")
"""
foo: 28.189341283300003
boo: 7.506871979099998
"""
Multiprocessing
The multiprocessing package in Python's standard library can also be used to spread the iterations across several processes.
import timeit
import numpy as np
from multiprocessing import Pool
def detect_prime_number(x):
    if x <= 3:
        return x > 1
    if (np.mod(x, 2) == 0) or (np.mod(x, 3) == 0):
        return False
    sqrt_n = int(np.floor(np.sqrt(x)))
    p = 5
    while p <= sqrt_n:
        if (np.mod(x, p) == 0) or (np.mod(x, p + 2) == 0):
            return False
        p += 6
    return True
def foo(numbers):
    liste = []
    for number in numbers:
        liste.append(detect_prime_number(number))
    return liste
def boo(numbers):
    with Pool() as pool:
        liste = pool.map(detect_prime_number, numbers)
    return liste
if __name__ == '__main__':
    numbers = range(1000000)
    n = 10
    result = timeit.timeit(stmt='foo(numbers)', globals=globals(), number=n)
    print(f"foo: {(result/n)}")
    result = timeit.timeit(stmt='boo(numbers)', globals=globals(), number=n)
    print(f"boo: {(result/n)}")
"""
foo: 28.122645595799998
boo: 6.724766995800001
"""
Numpy
Python is slow for repeated execution of low-level tasks due to the overhead from type-checking and reference counting. For operations like a + b, Python checks the types of a and b to determine the correct operation to execute. Additionally, it manages reference counting for objects. These overheads accumulate significantly during cycles of repetitive tasks.
Numpy is fast because it utilizes densely packed arrays and benefits from operations implemented in C, thus avoiding Python’s performance pitfalls like pointer indirection and dynamic type checking. It shifts execution to compiled code, enabling rapid processing. For instance, type-checking occurs just once for the entire array, rather than in each loop iteration.
Numpy performs operations on an array through vectorization, treating the array as a whole rather than iterating over its elements.
In notebooks, the %timeit magic command, which is built on the timeit module, can be used to measure execution time. Below are some examples that use this command.
The arange function in Numpy creates an array of evenly spaced values, similar to the built-in range function. The reshape method changes the array's shape, and the flatten method returns a one-dimensional copy.
arr = np.arange(start=0, stop=12, step=2)
#[ 0 2 4 6 8 10]
arr = arr.reshape(2,3)
#[[ 0 2 4]
#[ 6 8 10]]
arr = arr.flatten()
#[ 0 2 4 6 8 10]
Universal Functions (ufuncs)
Universal functions operate element-wise on arrays, and all arithmetic operators are overloaded for Numpy arrays. The complete list of available ufuncs can be found in the Numpy documentation (see Sources).
import timeit
import numpy as np
#pythonic way
liste = [1,2,3,4,5,6,7,8,9,10]
result = [x + 10 for x in liste]
#numpy way
arr = np.array(liste)
result = arr + 10
In Numpy, the for-loop operations occur within the compiled core.
liste = list(range(100000))
%timeit [x+10 for x in liste]
#4.05 ms ± 37.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
arr = np.array(liste)
%timeit arr+10
#19.9 µs ± 14 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
The matmul function calculates the matrix product of two arrays.
def foo(m1, m2):
    result = np.zeros((len(m1), len(m2[0])))
    for i in range(len(m1)):
        for k in range(len(m2)):
            for j in range(len(m2[0])):
                result[i][j] += m1[i][k] * m2[k][j]
    return result
def nuu(m1, m2):
    return np.matmul(m1, m2)
m1 = np.random.randint(low=1, high=100, size=(100, 100))
m2 = np.random.randint(low=1, high=100, size=(100, 100))
%timeit foo(m1, m2)
%timeit nuu(m1, m2)
"""
790 ms ± 11.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
463 µs ± 4.22 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
"""
The accumulate method of a ufunc applies the operation cumulatively across an array's elements. For multi-dimensional arrays, it works along the first axis by default.
arr1 = np.array([1,2,3,4,5])
print(np.add.accumulate(arr1))
#[ 1 3 6 10 15]
print(np.multiply.accumulate(arr1))
#[ 1 2 6 24 120]
arr1 = np.arange(start=0, stop=12, step=2).reshape(3,2)
print("arr1: \n",arr1)
print("add: \n", np.add.accumulate(arr1))
print("multiply: \n",np.multiply.accumulate(arr1))
"""
arr1:
[[ 0 2]
[ 4 6]
[ 8 10]]
add:
[[ 0 2]
[ 4 8]
[12 18]]
multiply:
[[ 0 2]
[ 0 12]
[ 0 120]]
"""
Aggregations
Aggregation functions summarize the values in an array, including operations like min, max, and mean.
from random import random
liste = [random() for i in range(100000)]
#min
%timeit min(liste)
#1.29 ms ± 1.32 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
arr = np.array(liste)
%timeit arr.min()
#18.9 µs ± 148 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
#sum
arr = np.random.randint(low=0, high=100, size=1000000)
def foo(arr):
    total = 0
    for i in arr:
        total += i
    return total
%timeit foo(arr) #94.6 ms ± 2.82 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit np.sum(arr) #399 µs ± 2.95 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Broadcasting
Broadcasting refers to a set of rules that allow ufuncs to operate on arrays of different sizes and dimensions.
The term broadcasting describes how NumPy treats arrays with different shapes during arithmetic operations. Subject to certain constraints, the smaller array is “broadcast” across the larger array so that they have compatible shapes. Broadcasting provides a means of vectorizing array operations so that looping occurs in C instead of Python. [source]
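The compatibility rule behind broadcasting can be sketched in plain Python: shapes are compared from the last dimension backwards, and two dimensions are compatible when they are equal or one of them is 1. The helper broadcast_shape below is a hypothetical function written for illustration, not part of Numpy (Numpy provides np.broadcast_shapes for the real thing).

```python
from itertools import zip_longest

def broadcast_shape(shape_a, shape_b):
    """Return the broadcast result shape, or raise ValueError if incompatible.

    Hypothetical helper for illustration: dimensions are compared from the
    right, and a pair is compatible when equal or when one of them is 1.
    """
    result = []
    # Missing leading dimensions are treated as size 1.
    for a, b in zip_longest(reversed(shape_a), reversed(shape_b), fillvalue=1):
        if a == b or b == 1:
            result.append(a)
        elif a == 1:
            result.append(b)
        else:
            raise ValueError(f"shapes {shape_a} and {shape_b} cannot be broadcast")
    return tuple(reversed(result))

print(broadcast_shape((3, 2), (3, 1)))   # (3, 2)
print(broadcast_shape((2, 3), (3,)))     # (2, 3)
print(broadcast_shape((4, 1), (1, 5)))   # (4, 5)
```

A pair such as (3, 2) and (3,) is rejected, because the trailing dimensions 2 and 3 differ and neither is 1.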
The isin function checks, element by element, whether the values of one array are present in another array.
import timeit
import numpy as np
#example
arr1 = np.array([1,2,3])
arr2 = np.array([2,4,5])
print(np.isin(arr1, arr2))
print(arr1[np.isin(arr1, arr2)])
#comparison
#pythonic way
def foo(arr1, arr2):
    comparison = []
    for i in arr1:
        for j in arr2:
            if i == j:
                comparison.append(i)
    return comparison
#numpy way
def nuu(arr1, arr2):
    comparison = arr1[np.isin(arr1, arr2)]
    return comparison
n = 10
arr1 = np.random.randint(low=10, high=100000, size=1000)
arr2 = np.random.randint(low=10, high=100000, size=1000)
result = timeit.timeit(stmt='foo(arr1, arr2)', globals=globals(), number=n)
print(f"foo: {(result/n)}")
result = timeit.timeit(stmt='nuu(arr1, arr2)', globals=globals(), number=n)
print(f"noo: {(result/n)}")
"""
#example
[False True False]
[2]
#comparison
foo: 0.06589524579999999
noo: 0.0002577209000000025
"""
Masking
In Numpy, Boolean masking and fancy indexing (indexing with an array or list of integer indices) can be used as alternatives to slicing.
#masking
arr = np.array([1,4,5,6,7,14])
mask = (arr < 4) | (arr > 8)
arr_new = arr[mask]
print(arr_new)
#[ 1 14]
#fancy indexing
arr = np.array([1,4,5,6,7,14])
indices = [0,3,2]
print(arr[indices])
#[1 6 5]
arr = np.arange(6).reshape(2,3)
print(arr)
#[[0 1 2]
#[3 4 5]]
print(arr[[1,0], :2])
#[[3 4]
# [0 1]]
Some other methods:
The nditer object offers a flexible approach to iteration. 'C' order iterates in the same row-major order that the flatten method uses by default, while 'F' order follows Fortran-style (column-major) iteration.
arr = np.arange(start=0, stop=12, step=2).reshape(2,3)
for x in np.nditer(arr, order='C'):
    print(x)
"""
0
2
4
6
8
10
"""
for x in np.nditer(arr, order='F'):
    print(x)
"""
0
6
2
8
4
10
"""
To print an entire column during each iteration, supply the appropriate flags.
for x in np.nditer(arr, order='F', flags=['external_loop']):
    print(x)
"""
[0 6]
[2 8]
[ 4 10]
"""
It’s possible to modify elements while iterating through them.
for x in np.nditer(arr, op_flags=['readwrite']):
    x[...]=x*x
print(arr)
"""
[[ 0 4 16]
[ 36 64 100]]
"""
You can iterate through two Numpy arrays simultaneously, but for this to work, the arrays must be broadcastable. This means that, comparing their shapes from the last dimension backwards, each pair of dimensions must either be equal or one of them must be 1.
arr1 = np.arange(start=0, stop=12, step=2).reshape(3,2)
arr2 = np.arange(start=12, stop=21, step=3).reshape(3,1)
for x,y in np.nditer([arr1, arr2]):
    print(x, y)
"""
0 12
2 12
4 15
6 15
8 18
10 18
"""
meshgrid generates coordinate grids from one-dimensional arrays of x and y coordinates. This can be used to replace two nested loops, such as in a sum operation.
A meshgrid example. Source: Numpy.org
def foo(x):
    total = 0
    for ith in x:
        for jth in x:
            total += (ith+jth)
    return total
def boo(x):
    return np.sum(np.meshgrid(x,x))
print(foo(range(10)))
print(boo(range(10)))
n = 10
numbers = range(1000)
result = timeit.timeit(stmt='foo(numbers)', globals=globals(), number=n)
print(f"foo: {(result/n)}")
result = timeit.timeit(stmt='boo(numbers)', globals=globals(), number=n)
print(f"boo: {(result/n)}")
"""
900
900
foo: 0.0700874042
boo: 0.0047679999999999945
"""
Conclusion
There are multiple ways to perform iteration in Python, and choosing one method over another is not inherently wrong. However, when speed is a crucial factor, alternatives to a straightforward for-loop, such as list comprehension, parallelization, or vectorization with Numpy, often yield faster results.
Sources
https://www.yourkit.com/docs/java/help/times.jsp
https://wiki.python.org/moin/ForLoop
https://www.simplilearn.com/tutorials/python-tutorial/python-for-loop
https://docs.python.org/3/library/enum.html
https://www.programiz.com/python-programming/list-comprehension
https://book.pythontips.com/en/latest/map_filter.html
https://www.w3schools.com/python/python_lambda.asp
https://docs.python.org/3/library/itertools.html
https://joblib.readthedocs.io/en/latest/parallel.html
https://docs.python.org/3/library/multiprocessing.html
https://numpy.org/doc/stable/reference/ufuncs.html#available-ufuncs
https://realpython.com/numpy-array-programming/
https://www.youtube.com/watch?v=EEUXKG97YRw
https://numpy.org/doc/stable/reference/generated/numpy.nditer.html