
Buckle Up Your For-Loops, Or Speeding Up Iterations in Python

Faster Alternatives to Python’s Standard For Loop Implementations

Loops are fundamental and effective for simple tasks. Yet, their efficiency decreases with larger ranges or more complex operations. This article compares various methods of performing iterations in Python to determine the fastest approach.

First, we’ll establish our measurement criteria. While asymptotic notation is an option, a more entertaining approach is to run each function n times, measure the execution times, and calculate the average.

Measuring Execution Time: How Is It Done?

There are two primary types of time measurements: wall time and CPU time, with CPU time further divided into user-CPU time and system-CPU time.

Wall time represents the total elapsed time from the start to the end of execution, analogous to tracking time using a wall clock.

User-CPU time refers to the duration the CPU spends executing the user’s code exclusively, excluding any kernel operations.

System-CPU time is the duration the CPU spends in the kernel on behalf of the program, for example while handling system calls and I/O requests.

If the wall time is less than the CPU time, this typically indicates parallel processing, where tasks are executed simultaneously, leading to an accumulation of CPU time that surpasses the actual elapsed time. On the other hand, if the wall time exceeds the CPU time, it often points to delays caused by disk operations, the impact of other running programs, or similar inefficiencies.
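
As a rough illustration of the gap between the two clocks, the sketch below times a call that mostly waits. It assumes time.perf_counter for wall time and time.process_time for CPU time; the helper name measure is hypothetical and not part of the benchmarks in this article.

```python
import time

def measure(func, *args):
    """Return (wall_seconds, cpu_seconds) for a single call of func(*args)."""
    wall_start = time.perf_counter()   # wall clock
    cpu_start = time.process_time()    # user + system CPU time of this process
    func(*args)
    cpu_elapsed = time.process_time() - cpu_start
    wall_elapsed = time.perf_counter() - wall_start
    return wall_elapsed, cpu_elapsed

# Sleeping consumes wall time but almost no CPU time.
wall, cpu = measure(time.sleep, 0.2)
print(f"wall: {wall:.3f}s, cpu: {cpu:.3f}s")
```

For a CPU-bound function running in a single process, the two numbers would be expected to sit much closer together.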

Various methods exist to measure execution time in Python.

#a simple function to measure

def foo(x):
    total = 0
    for i in range(x):
        total += i
    return total
  • datetime: Timestamps for the start and end can be recorded using the datetime module to measure wall time.
from datetime import datetime

start = datetime.now()
foo(100000000)
end = datetime.now()
elapsed = end - start
print(f"Execution time : in seconds: {elapsed.total_seconds()}, "
      f"in time format: {elapsed}")

"""
Execution time : in seconds: 5.709269, in time format: 0:00:05.709269
"""
  • time: A module from Python’s standard library. It can measure both the wall time (time.time) and the CPU time (time.process_time).
import time

start = time.time()
foo(100000000)
end = time.time()
print(f"Execution time : in seconds: {(end - start)}, "
      f"in time format: {time.strftime('%H:%M:%S', time.gmtime(end - start))}")

"""
Execution time : in seconds: 5.703110933303833, in time format: 00:00:05
"""
start = time.process_time()
foo(100000000)
end = time.process_time()
print(f"Execution process time : in seconds: {(end - start)}, "
      f"in time format: {time.strftime('%H:%M:%S', time.gmtime(end - start))}")

"""
Execution process time : in seconds: 5.514226999999999, in time format: 00:00:05
"""
  • timeit: This module offers a straightforward method to measure wall time. It disables the garbage collector, repeats the task n times, and returns the total time taken, allowing for the calculation of the average execution time across n executions.
import timeit
n=10
result = timeit.timeit(stmt='foo(100000000)', globals=globals(), number=n)
print(f"Execution process time : in seconds: {(result/n)}")

"""
Execution process time : in seconds: 5.475671691600001
"""

I plan to use timeit for measuring execution time, setting the number of repeats to n=10.
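
A single average can hide noise from other programs running on the machine. As a hedged sketch, timeit.repeat (not used elsewhere in this article) takes several independent measurements so that the best and the mean can be compared. The input size is deliberately small here so the example finishes quickly.

```python
import timeit

def foo(x):
    total = 0
    for i in range(x):
        total += i
    return total

number = 10   # calls per measurement
repeat = 5    # independent measurements

# Each entry in runs is the total time for `number` calls.
runs = timeit.repeat(lambda: foo(100_000), number=number, repeat=repeat)
per_call = [t / number for t in runs]

print(f"best: {min(per_call):.6f}s, mean: {sum(per_call) / len(per_call):.6f}s")
```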


FOR LOOP

A “for-loop” iterates over an iterable.

liste = ["Apple", "Orange", "Strawberry"]

for fruit in liste:
    print(fruit)

"""
Apple
Orange
Strawberry
"""

In Python, as in most programming languages, the same task can often be performed in several ways, each with a different running time.

import timeit
import numpy as np

def foo(x):
    total = 0
    for i in range(x):
        total += i
    return total

def boo(x):
    total = sum(range(x))
    return total

def nuu(x):
    total = np.sum(np.arange(x))
    return total

n=10

result = timeit.timeit(stmt='foo(100000000)', globals=globals(), number=n)
print(f"Foo: {(result/n)}")

result = timeit.timeit(stmt='boo(100000000)', globals=globals(), number=n)
print(f"Boo: {(result/n)}")

result = timeit.timeit(stmt='nuu(100000000)', globals=globals(), number=n)
print(f"Nuu: {(result/n)}")

"""
Foo: 5.413630695899999
Boo: 1.9532857375
Nuu: 0.23119822499999998
"""

Utilizing Python’s built-in sum function is more efficient than a simple loop. Moreover, numpy outperforms both (you will hear this a lot).

How Does the For-loop Work?

Simply put, a for-loop steps through the members of an iterable object one at a time.

An iterable object yields its members sequentially. When an iterable object is passed to the iter function, it returns an iterator.

print(iter([1,2,3]))
print(iter("ASD"))

"""
<list_iterator object at 0x7fbdd8134af0>
<str_iterator object at 0x7fbdd8134af0>
"""

In an iterator, you advance to the next value using the next function, provided there is a subsequent element available.

it = iter([1,2,3,4])
print(it)
print(next(it))
print(next(it))
print(next(it))
print(next(it))
print(next(it))

"""
<list_iterator object at 0x7f8f380ccfd0>
1
2
3
4
Traceback (most recent call last):
  File "/Users/okanyenigun/Desktop/codes/draft.py", line 42, in <module>
    print(next(it))
StopIteration

"""

The Python interpreter executes byte code, processing instructions one at a time, including invoking the next function repeatedly within a for-loop and performing tasks like lock acquisition and variable resolution during each iteration. As an interpreted language, Python generally exhibits slower performance compared to compiled languages, although this can vary based on the task and optimizations applied.
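
The iter and next calls described above can be put together into a rough equivalent of a for-loop. This is only a sketch of the protocol, not how the interpreter is implemented; the name manual_for is hypothetical.

```python
def manual_for(iterable, body):
    """Call body(item) for every item, mimicking what a for-loop does."""
    iterator = iter(iterable)          # ask the iterable for an iterator
    while True:
        try:
            item = next(iterator)      # advance to the next value
        except StopIteration:          # raised when the iterator is exhausted
            break
        body(item)

collected = []
manual_for(["Apple", "Orange", "Strawberry"], collected.append)
print(collected)
```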

In Python, various operations can achieve the same outcome as a for-loop. Let’s examine these alternatives.

Enumeration

Enumeration enables iteration over an iterable with access to each item’s index. It produces a tuple during each iteration, consisting of the index and the current item’s value.
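
A minimal sketch of the tuples that enumerate produces, including its optional start argument:

```python
fruits = ["Apple", "Orange", "Strawberry"]

# Each iteration yields an (index, item) tuple.
pairs = list(enumerate(fruits))
print(pairs)

# The optional start argument changes the first index.
numbered = list(enumerate(fruits, start=1))
print(numbered)
```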

Alternatively, the index for each iteration can be obtained by looping over range(len(...)) and indexing into the list.

In the measurement below, enumerate is slightly faster than indexing through range.

def foo(liste):
    "simple for loop"
    total = 0
    for i in range(len(liste)):
        total += liste[i]
    return total

def boo(liste):
    "enumeration"
    total = 0
    for i, j in enumerate(liste):
        total += j
    return total

n= 10
liste = list(range(10000000))
result = timeit.timeit(stmt='foo(liste)', globals=globals(), number=n)
print(f"foo: {(result/n)}")
result = timeit.timeit(stmt='boo(liste)', globals=globals(), number=n)
print(f"boo: {(result/n)}")

"""
foo: 0.7428947792
boo: 0.6568550417
"""

List Comprehension

List comprehension transforms iterative statements into concise expressions, used to generate new lists. It offers a shorter syntax and faster execution but loads the entire output list into memory, which requires caution with large datasets.
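
The memory caveat can be seen in a small sketch. It compares a list comprehension with a generator expression, a related construct this article does not otherwise cover, which produces its items on demand instead of storing them all.

```python
import sys

numbers = range(1_000_000)

squares_list = [x * x for x in numbers]   # builds all one million results at once
squares_gen = (x * x for x in numbers)    # produces results one at a time, on demand

# getsizeof reports the container itself, not the integers it refers to.
list_size = sys.getsizeof(squares_list)
gen_size = sys.getsizeof(squares_gen)
print(f"list: {list_size} bytes, generator: {gen_size} bytes")

# A generator can still feed functions that consume an iterable once.
total = sum(squares_gen)
print(total)
```

The trade-off is that a generator can be consumed only once and does not support indexing, so a list remains the better fit when the results are needed repeatedly.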

[expression for item in list]

liste = [1,2,3,4,5]
newlist = [x**2 for x in liste]
print(newlist)

"""
[1, 4, 9, 16, 25]
"""
string = "cristiano ronaldo"
liste = [s.upper() for s in string]
print(liste)

"""
['C', 'R', 'I', 'S', 'T', 'I', 'A', 'N', 'O', ' ', 'R', 'O', 'N', 'A', 'L', 'D', 'O']
"""

We can use if-else conditions in list comprehensions.

liste = [1,2,3,4,5]
a = [x for x in liste if x > 3]
b = [x if x < 3  else x**3  for x in liste]
print("a: ",a)
print("b: ",b)
"""
a: [4, 5]
b: [1, 2, 27, 64, 125]
"""

We can create a matrix or a nested list:

matrix = [[j+1  for j in  range(3)] for i in  range(5)]
print(matrix)

nested = [[j+1  for j in  range(i+1)] for i in  range(5)]
print(nested)

"""
[[1, 2, 3], [1, 2, 3], [1, 2, 3], [1, 2, 3], [1, 2, 3]]
[[1], [1, 2], [1, 2, 3], [1, 2, 3, 4], [1, 2, 3, 4, 5]]
"""

#nested if

it = range(100)
liste = [x for x in it if x % 3 == 0  if x < 50]
print(liste)

"""
[0, 3, 6, 9, 12, 15, 18, 21, 24, 27, 30, 33, 36, 39, 42, 45, 48]
"""
def foo(numbers):
    "simple loop"
    liste = []
    for number in numbers:
        if number < 5000000:
            liste.append(number*1.08)
        else:
            liste.append(number*2)
    return liste

def boo(numbers):
    "list comprehension"
    return [x*1.08 if x < 5000000 else x*2 for x in numbers]

n= 10
liste = list(range(100000000))
result = timeit.timeit(stmt='foo(liste)', globals=globals(), number=n)
print(f"foo: {(result/n)}")

result = timeit.timeit(stmt='boo(liste)', globals=globals(), number=n)
print(f"boo: {(result/n)}")

"""
foo: 9.10908775
boo: 7.0431246583
"""

Lambda

Lambda functions are anonymous functions (functions without a name) that can accept any number of arguments but are limited to a single expression.

result = lambda x: x ** 3
print(result(5))

"""
125
"""

Map, Filter & Reduce

The map function enables applying a specified function to each item in an iterable, such as a list.

map(function, list)

#we can use lambda function
liste = [10,20,30,40,50]
result = map(lambda x: x**2, liste)
result_list = list(result)
print(result)
print(result_list)

"""
<map object at 0x7fea380badf0>
[100, 400, 900, 1600, 2500]
"""

#we can use list of functions
def square(x):
    return x*x

def cube(x):
    return x ** 3

list_of_func = [square, cube]
for i in range(3):
    result = list(map(lambda x: x(i), list_of_func))
    print(result)

"""
[0, 0]
[1, 1]
[4, 8]
"""


def foo(liste):
    new_list = []
    for l in liste:
        new_list.append(l ** 2)
    return new_list

def foo_range(liste):
    new_list = []
    for i in range(len(liste)):
        new_list.append(liste[i] ** 2)
    return new_list

def boo(liste):
    return list(map(lambda x: x ** 2, liste))

n= 10
liste = list(range(10000000))
result = timeit.timeit(stmt='foo(liste)', globals=globals(), number=n)
print(f"foo: {(result/n)}")
result = timeit.timeit(stmt='foo_range(liste)', globals=globals(), number=n)
print(f"foo with range: {(result/n)}")
result = timeit.timeit(stmt='boo(liste)', globals=globals(), number=n)
print(f"boo: {(result/n)}")

"""
foo: 5.2132118541
foo with range: 5.531390737499999
boo: 2.8367943749999993
"""

#map calls the function with one argument per iterable, so each tuple arrives as a single argument

liste= [(1, 2), (3, 2), (2, 4)]

list(map(lambda x, y: x *y, liste))

"""
TypeError: <lambda>() missing 1 required positional argument: 'y'
"""

The filter function filters elements of an iterable based on a specified function that evaluates to True or False.
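
Note that filter only decides which items to keep; it never changes them. The return value of the function is used purely as a truth test. When items should also be transformed, filter can be combined with map, as in this sketch (the helper names keep and transform are hypothetical):

```python
def keep(val):
    """Predicate: True for multiples of 2 or 3."""
    return val % 2 == 0 or val % 3 == 0

def transform(val):
    """Leave even numbers unchanged and square the odd ones."""
    return val if val % 2 == 0 else val ** 2

values = range(10)

# Selection only: the surviving items are unchanged.
selected = list(filter(keep, values))
print(selected)

# Selection followed by transformation.
selected_and_changed = list(map(transform, filter(keep, values)))
print(selected_and_changed)
```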

liste = range(10)
filtered_list = list(filter(lambda x : x <5, liste))
print(filtered_list)

"""
[0, 1, 2, 3, 4]
"""

def foo(liste):
    new_list = []
    for l in liste:
        if l % 2 == 0:
            new_list.append(l)
        elif l % 3 == 0:
            new_list.append(l**2)
    return new_list

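def sub_boo(val):
    # filter() uses the return value only as a truth test
    if val % 2 == 0:
        return val
    elif val % 3 == 0:
        return val ** 2

def boo(liste):
    # keeps the original items whose sub_boo result is truthy; unlike foo,
    # it does not square anything and it drops 0
    return list(filter(sub_boo, liste))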

n= 10
liste = list(range(10000000))
result = timeit.timeit(stmt='foo(liste)', globals=globals(), number=n)
print(f"foo: {(result/n)}")

result = timeit.timeit(stmt='boo(liste)', globals=globals(), number=n)
print(f"boo: {(result/n)}")

"""
foo: 1.5751615709000002
boo: 1.3204897791
"""

The reduce function allows applying a cumulative operation to a list, resulting in a single value.
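
Conceptually, reduce carries a running result through the sequence. A minimal sketch of that behavior, checked against functools.reduce (the name manual_reduce is hypothetical, and the real function also supports being called without an initial value):

```python
from functools import reduce
import operator

def manual_reduce(function, iterable, initial):
    """Fold the iterable into one value: f(f(f(initial, a), b), c) ..."""
    result = initial
    for item in iterable:
        result = function(result, item)
    return result

total = manual_reduce(operator.add, range(11), 0)
product = manual_reduce(operator.mul, range(1, 6), 1)
print(total, product)
print(reduce(operator.add, range(11), 0), reduce(operator.mul, range(1, 6), 1))
```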

from functools import reduce

liste = range(11)
reduced_value = reduce((lambda x, y: x + y), liste)
print(reduced_value)

"""
55
"""

from functools import reduce

def foo(liste):
    total = 0
    for l in liste:
        total += l
    return total

def boo(liste):
    return reduce((lambda x, y : x+y), liste)

n= 10
liste = list(range(100000000))

result = timeit.timeit(stmt='foo(liste)', globals=globals(), number=n)
print(f"foo: {(result/n)}")

result = timeit.timeit(stmt='boo(liste)', globals=globals(), number=n)
print(f"boo: {(result/n)}")

"""
foo: 4.1751244416999995
boo: 7.651879245900001
"""

Itertools

The itertools module offers a variety of looping techniques, all of which return iterators.

The combinations method generates subsequences from an iterable, consisting of all possible combinations of a specified length.
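
Because combinations returns a lazy iterator, nothing is generated until the iterator is consumed. To compare it fairly with a list built by nested loops, the result has to be materialized, for example with list(). A sketch under that assumption (function names are hypothetical):

```python
import itertools

def pairs_with_loops(values):
    pairs = []
    for i in values:
        for j in values:
            if i < j:
                pairs.append((i, j))
    return pairs

def pairs_with_combinations(values):
    # list() forces the iterator to actually produce every pair
    return list(itertools.combinations(values, 2))

values = list(range(5))
print(pairs_with_loops(values) == pairs_with_combinations(values))
```

The two results agree here because values is sorted and contains no duplicates; combinations works on positions, while the loop compares the values themselves.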

import itertools

def foo(liste):
    new_list = []
    for i in liste:
        for j in liste:
            if i < j:
                new_list.append((i,j))
    return new_list

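def boo(liste):
    # combinations() is lazy: this returns an iterator without generating the pairs,
    # so timing boo measures only the creation of the iterator
    return itertools.combinations(liste, 2)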

vals = boo([1,2,3,4])
for v in vals:
    print(v)


n = 10
liste = list(range(10000))
result = timeit.timeit(stmt='foo(liste)', globals=globals(), number=n)
print(f"foo: {(result/n)}")

result = timeit.timeit(stmt='boo(liste)', globals=globals(), number=n)
print(f"boo: {(result/n)}")

"""
(1, 2)
(1, 3)
(1, 4)
(2, 3)
(2, 4)
(3, 4)

foo: 6.4279789417
boo: 0.000488679100000411
"""

The permutations method generates successive permutations of a given iterable.

string = "CRISTIANO"
print(list(itertools.permutations(string, 2)))

"""
[('C', 'R'), ('C', 'I'), ('C', 'S'), ('C', 'T'), ('C', 'I'), ('C', 'A'), ('C', 'N'), ('C', 'O'), ('R', 'C'), ('R', 'I'), ('R', 'S'), ('R', 'T'), ('R', 'I'), ('R', 'A'), ('R', 'N'), ('R', 'O'), ('I', 'C'), ('I', 'R'), ('I', 'S'), ('I', 'T'), ('I', 'I'), ('I', 'A'), ('I', 'N'), ('I', 'O'), ('S', 'C'), ('S', 'R'), ('S', 'I'), ('S', 'T'), ('S', 'I'), ('S', 'A'), ('S', 'N'), ('S', 'O'), ('T', 'C'), ('T', 'R'), ('T', 'I'), ('T', 'S'), ('T', 'I'), ('T', 'A'), ('T', 'N'), ('T', 'O'), ('I', 'C'), ('I', 'R'), ('I', 'I'), ('I', 'S'), ('I', 'T'), ('I', 'A'), ('I', 'N'), ('I', 'O'), ('A', 'C'), ('A', 'R'), ('A', 'I'), ('A', 'S'), ('A', 'T'), ('A', 'I'), ('A', 'N'), ('A', 'O'), ('N', 'C'), ('N', 'R'), ('N', 'I'), ('N', 'S'), ('N', 'T'), ('N', 'I'), ('N', 'A'), ('N', 'O'), ('O', 'C'), ('O', 'R'), ('O', 'I'), ('O', 'S'), ('O', 'T'), ('O', 'I'), ('O', 'A'), ('O', 'N')]
"""

The product function calculates the Cartesian product of the given iterables.

import itertools

def foo(list1, list2):
    new_list = []
    for i in list1:
        for j in list2:
            new_list.append((i,j))
    return new_list

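def boo(list1, list2):
    # product() is lazy too: it copies its inputs into tuples up front
    # but does not build the pairs until the iterator is consumed
    return itertools.product(list1,list2)

vals = boo([1,2],["a","b","c"])
for v in vals:
    print(v)

list1 = range(1000000)
list2 = ["a","b","c","d","e","f"]
n = 10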
result = timeit.timeit(stmt='foo(list1, list2)', globals=globals(), number=n)
print(f"foo: {(result/n)}")

result = timeit.timeit(stmt='boo(list1, list2)', globals=globals(), number=n)
print(f"boo: {(result/n)}")

"""
(1, 'a')
(1, 'b')
(1, 'c')
(2, 'a')
(2, 'b')
(2, 'c')
foo: 0.5739818416
boo: 0.025386108300000033
"""

starmap is a variant of the map function that unpacks each item of the iterable into separate arguments for the function.
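
The difference from map can be sketched with a list of pairs. With map, each tuple arrives as one argument unless the values are supplied as separate iterables; starmap does the unpacking itself.

```python
import itertools

pairs = [(1, 2), (3, 2), (2, 4)]

# map passes each tuple as ONE argument, so the function must index into it.
with_map = list(map(lambda pair: pair[0] * pair[1], pairs))

# map with two iterables passes one item from each as separate arguments.
xs = [1, 3, 2]
ys = [2, 2, 4]
with_two_iterables = list(map(lambda x, y: x * y, xs, ys))

# starmap unpacks each tuple, so a two-parameter function works directly.
with_starmap = list(itertools.starmap(lambda x, y: x * y, pairs))

print(with_map, with_two_iterables, with_starmap)
```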

import itertools

numbers = list(itertools.combinations(range(1,10), 2))
print("numbers: ", numbers)
def foo(x, y):
    return x * y

result = list(itertools.starmap(foo, numbers))
print("result: " ,result)

"""
numbers:  [(1, 2), (1, 3), (1, 4), (1, 5), (1, 6), (1, 7), (1, 8), (1, 9), (2, 3), (2, 4), (2, 5), (2, 6), (2, 7), (2, 8), (2, 9), (3, 4), (3, 5), (3, 6), (3, 7), (3, 8), (3, 9), (4, 5), (4, 6), (4, 7), (4, 8), (4, 9), (5, 6), (5, 7), (5, 8), (5, 9), (6, 7), (6, 8), (6, 9), (7, 8), (7, 9), (8, 9)]
result:  [2, 3, 4, 5, 6, 7, 8, 9, 6, 8, 10, 12, 14, 16, 18, 12, 15, 18, 21, 24, 27, 20, 24, 28, 32, 36, 30, 35, 40, 45, 42, 48, 54, 56, 63, 72]
"""

def operation(x, y):
    return x * y

def foo(numbers):
    for num in numbers:
        operation(num[0], num[1])

def boo(numbers):
    list(itertools.starmap(operation, numbers))

n = 10
numbers = list(itertools.combinations(range(1,1000), 2))
result = timeit.timeit(stmt='foo(numbers)', globals=globals(), number=n)
print(f"foo: {(result/n)}")

result = timeit.timeit(stmt='boo(numbers)', globals=globals(), number=n)
print(f"boo: {(result/n)}")

"""
foo: 0.0595270958
boo: 0.037608983400000004
"""

compress filters an iterable based on the boolean values from another iterable.

cars = ["Mercedes", "Audi", "Bmw"]
i_have = [1,0,0]

print(list(itertools.compress(cars, i_have)))

"""
['Mercedes']
"""

The groupby function groups consecutive elements of an iterable that share the same key, so the input should already be ordered by that key.
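
groupby starts a new group every time the key changes, which matters when the input is not ordered by the key. A small sketch with made-up data:

```python
import itertools

def first_letter(word):
    return word[0]

words = ["apple", "avocado", "banana", "apricot", "blueberry"]

# Unsorted input: the key changes back and forth, producing repeated groups.
unsorted_groups = [(key, list(group))
                   for key, group in itertools.groupby(words, key=first_letter)]
print(unsorted_groups)

# Sorting by the same key first yields one group per key.
sorted_words = sorted(words, key=first_letter)
sorted_groups = [(key, list(group))
                 for key, group in itertools.groupby(sorted_words, key=first_letter)]
print(sorted_groups)
```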

liste = range(1,100)

group = itertools.groupby(liste, key = lambda x : x < 50)

for key, value in  group:
    print(key, list(value))

"""
True [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49]
False [50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99]
"""

The accumulate function returns running results over a sequence; by default it computes cumulative sums.
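
accumulate also accepts a two-argument function as its second parameter, which replaces addition as the cumulative operation. A brief sketch:

```python
import itertools
import operator

values = [3, 1, 4, 1, 5]

running_sum = list(itertools.accumulate(values))                    # default: addition
running_product = list(itertools.accumulate(values, operator.mul))  # cumulative product
running_max = list(itertools.accumulate(values, max))               # largest value so far

print(running_sum)
print(running_product)
print(running_max)
```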

liste = range(1,10)
print(list(itertools.accumulate(liste)))

"""
[1, 3, 6, 10, 15, 21, 28, 36, 45]
"""

Numba

Numba is a just-in-time (JIT) compiler library that translates Python code into optimized machine code at runtime. Conveniently, it doesn’t require changing your interpreter; you simply apply a decorator to your functions.

You can read the official documentation at https://numba.pydata.org/.

pip install numba

import random
import numba
import timeit

def foo(n):
    count = 0
    for i in range(n):
        x = random.random()
        y = random.random()
        if (x ** 2 + y ** 2) < 1.0:
            count += 1
        else:
            count -= 1
    return count

@numba.jit()
def boo(n):
    count = 0
    for i in range(n):
        x = random.random()
        y = random.random()
        if (x ** 2 + y ** 2) < 1.0:
            count += 1
        else:
            count -= 1
    return count

n = 10
result = timeit.timeit(stmt='foo(10000000)', globals=globals(), number=n)
print(f"foo: {(result/n)}")

result = timeit.timeit(stmt='boo(10000000)', globals=globals(), number=n)
print(f"boo: {(result/n)}")

"""
foo: 3.3778286208000003
boo: 0.0987006749999999
"""

JIT options:

  • nopython: if True, Numba compiles the function to machine code that runs without the Python interpreter, and raises an error if it cannot.
  • parallel: if True, Numba tries to parallelize the function automatically using multiple threads.
  • nogil: if True, Numba releases the GIL while the compiled function runs, so other threads are free to run other parts of your code. It only takes effect in nopython mode.
  • cache: if True, it saves the compiled binary code into your pycache folder. Next time, it will use it instead of starting from scratch (only if the file hasn’t been changed).
  • fastmath: this option allows faster mathematical operations, but the drawback is that it uses less safe floating-point transformations. So, use it only if your data is not likely to produce inf or NaN values.
  • boundscheck: use it when debugging. It ensures array access will not go out of bounds.

Joblib Parallelisation

Joblib is a collection of Python tools designed for lightweight pipelining and parallel computing. Its parallelization features can be integrated into for-loops.

import timeit
import numpy as np
from joblib import Parallel, delayed

def detect_prime_number(x):
    if x <= 3:
        return x > 1
    if (np.mod(x, 2) == 0) or (np.mod(x, 3) == 0):
        return False
    sqrt_n = int(np.floor(np.sqrt(x)))
    p = 5
    while p <= sqrt_n:
        if (np.mod(x, p) == 0) or (np.mod(x, p + 2) == 0):
            return False
        p += 6
    return True

def foo(numbers):
    liste = []
    for number in numbers:
        liste.append(detect_prime_number(number))
    return liste

def boo(numbers):
    return Parallel(n_jobs=8)(delayed(detect_prime_number)(n) for n in numbers)

numbers  = range(1000000)

n = 10
result = timeit.timeit(stmt='foo(numbers)', globals=globals(), number=n)
print(f"foo: {(result/n)}")

result = timeit.timeit(stmt='boo(numbers)', globals=globals(), number=n)
print(f"boo: {(result/n)}")

"""
foo: 28.189341283300003
boo: 7.506871979099998
"""

Multiprocessing

The multiprocessing package in Python can also be utilized.

import timeit
import numpy as np
from multiprocessing import Pool

def detect_prime_number(x):
    if x <= 3:
        return x > 1
    if (np.mod(x, 2) == 0) or (np.mod(x, 3) == 0):
        return False
    sqrt_n = int(np.floor(np.sqrt(x)))
    p = 5
    while p <= sqrt_n:
        if (np.mod(x, p) == 0) or (np.mod(x, p + 2) == 0):
            return False
        p += 6
    return True

def foo(numbers):
    liste = []
    for number in numbers:
        liste.append(detect_prime_number(number))
    return liste

def boo(numbers):
    with Pool() as pool:
        liste =  pool.map(detect_prime_number, numbers)
    return liste

if __name__ == '__main__':

    numbers  = range(1000000)
    n = 10
    result = timeit.timeit(stmt='foo(numbers)', globals=globals(), number=n)
    print(f"foo: {(result/n)}")
    result = timeit.timeit(stmt='boo(numbers)', globals=globals(), number=n)
    print(f"boo: {(result/n)}")


"""
foo: 28.122645595799998
boo: 6.724766995800001
"""

For more information on multiprocessing in Python:

Multiprocessing in Python

Numpy

Python is slow for repeated execution of low-level tasks due to the overhead from type-checking and reference counting. For operations like a + b, Python checks the types of a and b to determine the correct operation to execute. Additionally, it manages reference counting for objects. These overheads accumulate significantly during cycles of repetitive tasks.

Numpy is fast because it utilizes densely packed arrays and benefits from operations implemented in C, thus avoiding Python’s performance pitfalls like pointer indirection and dynamic type checking. It shifts execution to compiled code, enabling rapid processing. For instance, type-checking occurs just once for the entire array, rather than in each loop iteration.

Numpy performs operations on an array through vectorization, treating the array as a whole rather than iterating over its elements.

In IPython and Jupyter notebooks, the %timeit magic command, which is built on the timeit module, can be used to measure execution time. Below are some examples that utilize this command.

The arange function in Numpy creates an array of evenly spaced values, similar to how range works in plain Python. The reshape method changes the array's shape, and the flatten method returns a one-dimensional copy.

arr = np.arange(start=0, stop=12, step=2)
#[ 0 2 4 6 8 10]

arr = arr.reshape(2,3)
#[[ 0 2 4]
#[ 6 8 10]]

arr = arr.flatten()
#[ 0 2 4 6 8 10]

Universal Functions (ufuncs)

These functions operate element-wise on arrays, and all arithmetic operators are overloaded for Numpy arrays. The complete list of available universal functions (ufuncs) can be found at https://numpy.org/doc/stable/reference/ufuncs.html#available-ufuncs.

import timeit
import numpy as np

#pythonic way
liste = [1,2,3,4,5,6,7,8,9,10]
result = [x + 10  for x in liste]

#numpy way
arr = np.array(liste)
result = arr + 10

In Numpy, the for-loop operations occur within the compiled core.

liste = list(range(100000))
%timeit [x+10  for x in liste]
#4.05 ms ± 37.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

arr = np.array(liste)
%timeit arr+10
#19.9 µs ± 14 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

The matmul function calculates the matrix product of two arrays.

def foo(m1, m2):
    result = np.zeros((len(m1), len(m2[0])))
    for i in range(len(m1)):
        for k in range(len(m2)):
            for j in range(len(m2[0])):
                result[i][j] += m1[i][k] * m2[k][j]
    return result

def nuu(m1, m2):
    return np.matmul(m1, m2)
m1 = np.random.randint(low=1, high=100, size=(100, 100))
m2 = np.random.randint(low=1, high=100, size=(100, 100))
%timeit foo(m1, m2)
%timeit nuu(m1, m2)
"""
790 ms ± 11.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
463 µs ± 4.22 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
"""

The accumulate function allows for the application of operations cumulatively across an array's elements.

arr1 = np.array([1,2,3,4,5])
print(np.add.accumulate(arr1))
#[ 1  3  6 10 15]

print(np.multiply.accumulate(arr1))
#[  1   2   6  24 120]

arr1 = np.arange(start=0, stop=12, step=2).reshape(3,2)
print("arr1: \n",arr1)
print("add: \n", np.add.accumulate(arr1))
print("multiply: \n",np.multiply.accumulate(arr1))

"""
arr1:
 [[ 0  2]
 [ 4  6]
 [ 8 10]]
add:
 [[ 0  2]
 [ 4  8]
 [12 18]]
multiply:
 [[  0   2]
 [  0  12]
 [  0 120]]
"""

Aggregations

Aggregation functions summarize the values in an array, including operations like min, max, and mean.

from random import random

liste = [random() for i in range(100000)]

#min

%timeit min(liste)
#1.29 ms ± 1.32 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

arr = np.array(liste)

%timeit arr.min()
#18.9 µs ± 148 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

#sum
arr = np.random.randint(low=0, high=100, size=1000000)

def foo(arr):
    total = 0
    for i in arr:
        total += i
    return total

%timeit foo(arr) #94.6 ms ± 2.82 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit np.sum(arr) #399 µs ± 2.95 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Broadcasting

Broadcasting refers to a set of rules that allow ufuncs to operate on arrays of different sizes and dimensions.

The term broadcasting describes how NumPy treats arrays with different shapes during arithmetic operations. Subject to certain constraints, the smaller array is “broadcast” across the larger array so that they have compatible shapes. Broadcasting provides a means of vectorizing array operations so that looping occurs in C instead of Python. (Source: https://numpy.org/doc/stable/user/basics.broadcasting.html)
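
A minimal sketch of these rules, assuming Numpy is installed. Shapes are compared from the last dimension backwards, and two dimensions are compatible when they are equal or one of them is 1. The array names are made up for illustration.

```python
import numpy as np

matrix = np.arange(6).reshape(2, 3)   # shape (2, 3)
row = np.array([10, 20, 30])          # shape (3,)
column = np.array([[100], [200]])     # shape (2, 1)

# The row is stretched across both rows of the matrix.
row_sum = matrix + row
print(row_sum)

# The column is stretched across all three columns of the matrix.
column_sum = matrix + column
print(column_sum)

# Shapes (2, 3) and (2,) are not compatible: the last dimensions are 3 and 2.
try:
    matrix + np.array([1, 2])
    incompatible = False
except ValueError:
    incompatible = True
print(incompatible)
```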

The isin method allows you to check if elements in one array are present in another array.

import timeit
import numpy as np

#example
arr1 = np.array([1,2,3])
arr2 = np.array([2,4,5])
print(np.isin(arr1, arr2))
print(arr1[np.isin(arr1, arr2)])
#comparison
#pythonic way
def foo(arr1, arr2):
    comparison = []
    for i in arr1:
        for j in arr2:
            if i == j:
                comparison.append(i)
    return comparison
#numpy way
def nuu(arr1, arr2):
    return arr1[np.isin(arr1, arr2)]
n = 10
arr1 = np.random.randint(low=10, high=100000, size=1000)
arr2 = np.random.randint(low=10, high=100000, size=1000)
result = timeit.timeit(stmt='foo(arr1, arr2)', globals=globals(), number=n)
print(f"foo: {(result/n)}")
result = timeit.timeit(stmt='nuu(arr1, arr2)', globals=globals(), number=n)
print(f"nuu: {(result/n)}")
"""
#example
[False  True False]
[2]
#comparison
foo: 0.06589524579999999
nuu: 0.0002577209000000025
"""

Masking

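In Numpy, boolean masking and fancy indexing can be used as alternatives to slicing. A mask selects elements with an array of True/False values, while fancy indexing selects them with an array or list of integer positions.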

#masking

arr = np.array([1,4,5,6,7,14])
mask = (arr < 4) | (arr > 8)
arr_new = arr[mask]
print(arr_new)
#[ 1 14]

#fancy indexing

arr = np.array([1,4,5,6,7,14])
indices = [0,3,2]
print(arr[indices])
#[1 6 5]

arr = np.arange(6).reshape(2,3)
print(arr)
#[[0 1 2]
#[3 4 5]]
print(arr[[1,0], :2])
#[[3 4]
# [0 1]]

Some other methods:

The nditer object offers a more flexible approach to iteration. 'C' order visits elements row by row, in the same order as the flatten method, while 'F' order follows Fortran-style (column by column) iteration.

arr = np.arange(start=0, stop=12, step=2).reshape(2,3)
for x in np.nditer(arr, order='C'):
    print(x)
"""
0
2
4
6
8
10
"""
for x in np.nditer(arr, order='F'):
    print(x)
"""
0
6
2
8
4
10
"""

To print an entire column during each iteration, pass the external_loop flag together with 'F' order.

for x in np.nditer(arr, order='F', flags=['external_loop']):
    print(x)
"""
[0 6]
[2 8]
[ 4 10]
"""

It’s possible to modify elements while iterating through them.

for x in np.nditer(arr, op_flags=['readwrite']):
    x[...]=x*x
print(arr)
"""
[[  0   4  16]
 [ 36  64 100]]
"""

You can iterate through two Numpy arrays simultaneously, but for this to work, the arrays must be broadcastable. This means that each pair of corresponding dimensions must either be equal or have size 1 in one of the arrays.

arr1 = np.arange(start=0, stop=12, step=2).reshape(3,2)
arr2 = np.arange(start=12, stop=21, step=3).reshape(3,1)
for x,y in np.nditer([arr1, arr2]):
    print(x, y)
"""
0 12
2 12
4 15
6 15
8 18
10 18
"""

meshgrid can generate rectangular grids from arrays of x and y coordinates. In fact, this method can be utilized to replace two nested loops, such as in a sum operation.

A meshgrid example. Source: Numpy.org

def foo(x):
    total = 0
    for ith in x:
        for jth in x:
            total += (ith+jth)
    return total

def boo(x):
    return np.sum(np.meshgrid(x,x))

print(foo(range(10)))
print(boo(range(10)))

n = 10
numbers = range(1000)
result = timeit.timeit(stmt='foo(numbers)', globals=globals(), number=n)
print(f"foo: {(result/n)}")
result = timeit.timeit(stmt='boo(numbers)', globals=globals(), number=n)
print(f"boo: {(result/n)}")

"""
900
900
foo: 0.0700874042
boo: 0.0047679999999999945
"""

Conclusion

There are multiple ways to perform iteration in Python, and choosing one method over another is not inherently wrong. However, when speed is a crucial factor, alternatives to a straightforward for-loop, such as list comprehension, parallelization, or vectorization with Numpy, often yield faster results.


Sources

https://www.yourkit.com/docs/java/help/times.jsp

https://serverfault.com/questions/48455/what-are-the-differences-between-wall-clock-time-user-time-and-cpu-time

https://wiki.python.org/moin/ForLoop

https://www.simplilearn.com/tutorials/python-tutorial/python-for-loop

https://docs.python.org/3/library/enum.html

https://www.programiz.com/python-programming/list-comprehension

https://book.pythontips.com/en/latest/map_filter.html

https://www.w3schools.com/python/python_lambda.asp

https://docs.python.org/3/library/itertools.html

https://numba.pydata.org/

https://joblib.readthedocs.io/en/latest/parallel.html

https://docs.python.org/3/library/multiprocessing.html

https://numpy.org/doc/stable/reference/ufuncs.html#available-ufuncs

https://realpython.com/numpy-array-programming/

https://www.youtube.com/watch?v=EEUXKG97YRw

https://numpy.org/doc/stable/reference/generated/numpy.nditer.html

https://numpy.org/

https://numpy.org/doc/stable/user/basics.broadcasting.html



