
A Simple Guide to Shared Memory in Python

Photo by Gautam Ganguly on Unsplash

Python is an incredible programming language. It is easy to learn, and you can do just about anything with it. However, it has some drawbacks that come into play for parallel programming. Python uses a global interpreter lock (GIL), which means only a single thread in a Python process can execute Python bytecode at a time. Multiprocessing serves as the primary way to overcome this barrier. Yet the need to share information between multiple processes can greatly complicate parallel programming in Python.

Shared memory, although possible in Python for many years, became much easier in version 3.8 with additions to the multiprocessing module. Sharing memory between processes is the fastest and most natural approach toward parallel programming in Python. It puts Python's performance in a ballpark where even software like databases becomes a possibility. This article details how shared memory can be used effectively in Python.

What is shared memory?

Shared memory has two distinct forms. The first is named shared memory segments, exposed on POSIX systems through shm_open and shm_unlink (System V provides an older, similar interface through shmget). In this form, shared memory objects are named memory segments that, once created, are persisted by the operating system until they are explicitly unlinked or the system reboots. Processes are free to read and write to them like normal memory, and those updates are visible to any other process that reads or writes the same shared memory object.

Persisting shared memory objects until a reboot can be both an advantage and a disadvantage. If handled properly, having a place to retrieve objects in memory that can exist across process lifetimes allows for easier sharing of work among many processes. If handled incorrectly, a process that allocates shared memory objects but does not unlink them causes a shared memory leak: the segment stays reserved until it is explicitly unlinked (for example, by removing its entry under /dev/shm on Linux) or the system reboots.

Here’s an example of a shared memory leak.

>>> from multiprocessing import shared_memory
>>> shm_c = shared_memory.SharedMemory("testing_block", create=True, size=32)
>>> exit()
% /Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked shared_memory objects to clean up at shutdown
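If a segment does leak, a later process that knows its name can still reclaim it by attaching and unlinking explicitly. A minimal sketch (the name "leaked_block" is illustrative):

```python
from multiprocessing import shared_memory

# Simulate a leak: create a named segment, then drop our handle
# without ever unlinking it.
leaked = shared_memory.SharedMemory("leaked_block", create=True, size=32)
leaked.close()

# A later process can attach to the segment by name and unlink it.
cleanup = shared_memory.SharedMemory("leaked_block")
cleanup.close()
cleanup.unlink()  # the segment is now removed system-wide
```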

The second form of shared memory comes from memory mapping. On POSIX systems, memory mapping is controlled by the mmap and munmap functions. This interface is quite different from shared memory segments, as memory mapping offers both file-backed and anonymous memory regions that processes can access. While file-backed memory does offer a form of persistence, it has a number of potential performance drawbacks. This article will focus entirely on named shared memory segments.
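For contrast, here is what an anonymous memory mapping looks like through Python's built-in mmap module; passing -1 as the file descriptor requests a region not backed by any file:

```python
import mmap

# Map 4096 bytes of anonymous memory (no backing file).
region = mmap.mmap(-1, 4096)
region[:5] = b"hello"
print(bytes(region[:5]))  # b'hello'
region.close()
```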


Coordinating Access

We have touched on what shared memory is, but not how we can use it within Python. Python's built-in multiprocessing module provides an interface that gives you access to shared memory in the form of a buffered view of bytes. However, if multiple processes intend to access this memory at once, how do we ensure such access is safe?

There are a few ways. The first is file-based locking. This approach uses locks on files that exist on disk to grant exclusive access for reading and writing a shared memory segment. On POSIX systems, this can be accomplished with flock or the related fcntl/lockf family. When a process acquires the file lock, it can safely read or write the shared memory object before releasing the lock again.
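In isolation, the locking pattern looks like this (POSIX-only; the file name is illustrative, and its contents are irrelevant since only the lock matters):

```python
import fcntl

# Any file on disk can serve as the lock.
with open("demo.lock", "w") as lock_file:
    fcntl.lockf(lock_file, fcntl.LOCK_EX)  # blocks until the lock is free
    # ... critical section: read or write the shared memory here ...
    fcntl.lockf(lock_file, fcntl.LOCK_UN)  # release for other processes
```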

To set up an example, consider the following: a parent process creates a shared memory segment and a lock file, then forks two child processes, one to read the memory and one to write it. Each child must acquire the lock on the file before reading or writing, and releases the lock after finishing its operation.

import os
import sys
import fcntl
import time
import random
from multiprocessing import shared_memory

LOCK_FILE_NAME = "foo.lock"
MEM_BLOCK_NAME = "mem_block"
MEM_BLOCK_SIZE = 64

def create_lock_file(name):
	with open(name, "w") as file_obj:
		file_obj.write("The lock")


def do_child_work(mode):
	with open(LOCK_FILE_NAME, "w") as lock_file:
		# Attach to the segment the parent created (create=False).
		shm_c = shared_memory.SharedMemory(MEM_BLOCK_NAME, False, MEM_BLOCK_SIZE)
		for i in range(1000):
			start_acq = time.time()
			fcntl.lockf(lock_file, fcntl.LOCK_EX)
			end_acq = time.time()
			print("proc {} took {} to acquire the lock".format(os.getpid(), end_acq - start_acq))
			if mode == 'r':
				start_op = time.time()
				read_bytes = bytes(shm_c.buf)
				end_op = time.time()
				print("proc {} took {} to perform the read".format(os.getpid(), end_op - start_op))
				print("The proc read {}".format(read_bytes[0]))
			elif mode == 'w':
				to_write = random.getrandbits(4)
				print("Will write {}".format(to_write))
				bytes_to_write = bytearray([to_write for i in range(MEM_BLOCK_SIZE)])
				start_op = time.time()
				shm_c.buf[:MEM_BLOCK_SIZE] = bytes_to_write
				end_op = time.time()
				print("proc {} took {} to perform the write".format(os.getpid(), end_op - start_op))

			fcntl.lockf(lock_file, fcntl.LOCK_UN)
		shm_c.close()
	sys.exit(0)

if __name__ == '__main__':
	print("Py DB")
	create_lock_file(LOCK_FILE_NAME)
	shm = shared_memory.SharedMemory(MEM_BLOCK_NAME, True, MEM_BLOCK_SIZE)
	# Fork each child separately; chaining two bare os.fork() calls would
	# make the first child fork again and spawn an unintended extra process.
	c1 = os.fork()
	if c1 == 0:
		do_child_work('r')
	c2 = os.fork()
	if c2 == 0:
		do_child_work('w')
	print(os.waitpid(c1, 0))
	print(os.waitpid(c2, 0))

	print("Processes finished!")
	shm.unlink()

In the above example, the parent process forks two children and waits on their process IDs while the children perform the work. At the end, the parent unlinks the shared memory so as not to leak it. If this program is run, we can see the locking works by the sequential handoff between the reader and the writer.

Will write 10
proc 25827 took 0.0 to perform the write
proc 25828 took 0.0005240440368652344 to acquire the lock
proc 25828 took 6.9141387939453125e-06 to perform the read
The proc read 10
proc 25828 took 9.5367431640625e-07 to acquire the lock
proc 25828 took 2.86102294921875e-06 to perform the read
The proc read 10

On each pass, the writer fills every byte with the same value, and the reader then reads that value back. However, there is another, better way to ensure safety when reading and writing shared memory: semaphores.

Semaphores and File Locks

Semaphores are among the oldest synchronization primitives in programming. A semaphore is a special kind of counter that can be shared by multiple processes: acquiring it decrements the counter, blocking while the counter is zero, and releasing it increments the counter again. Although they are old, they remain a valuable tool for Python's multi-process concurrency.
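To make the counter semantics concrete, here is the behavior in a single process:

```python
from multiprocessing import Semaphore

sem = Semaphore(1)               # counter starts at one
sem.acquire()                    # counter drops to zero
print(sem.acquire(block=False))  # False: a zero counter rejects a non-blocking acquire
sem.release()                    # counter returns to one
```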

Instead of using file locks, we can take our same example from before and alter it to use a semaphore. Yet there's an issue: unlike files, we don't have a universal way to look up a semaphore by name. Although named semaphores exist on some operating systems, it's easier in this example to transfer the semaphore across the fork. That is, the parent process creates the semaphore, and the children use it to gate access to the shared memory. Python's multiprocessing.Semaphore is also a bit more encapsulated, exposing acquire() and release() methods directly.

import os
import sys
import time
import random
from multiprocessing import shared_memory, Semaphore

MEM_BLOCK_NAME = "mem_block"
MEM_BLOCK_SIZE = 64


def do_child_work(sem, mode):
	# Attach to the segment the parent created (create=False).
	shm_c = shared_memory.SharedMemory(MEM_BLOCK_NAME, False, MEM_BLOCK_SIZE)
	for i in range(1000):
		start_acq = time.time()
		sem.acquire()
		end_acq = time.time()
		print("proc {} took {} to acquire the sem".format(os.getpid(), end_acq - start_acq))
		if mode == 'r':
			start_op = time.time()
			read_bytes = bytes(shm_c.buf)
			end_op = time.time()
			print("proc {} took {} to perform the read".format(os.getpid(), end_op - start_op))
			print("The proc read {}".format(read_bytes[0]))
		elif mode == 'w':
			to_write = random.getrandbits(4)
			print("Will write {}".format(to_write))
			bytes_to_write = bytearray([to_write for i in range(MEM_BLOCK_SIZE)])
			start_op = time.time()
			shm_c.buf[:MEM_BLOCK_SIZE] = bytes_to_write
			end_op = time.time()
			print("proc {} took {} to perform the write".format(os.getpid(), end_op - start_op))

		sem.release()
	shm_c.close()
	sys.exit(0)

if __name__ == '__main__':
	print("Py DB")
	sem_main = Semaphore()  # binary semaphore: initial value of one
	shm = shared_memory.SharedMemory(MEM_BLOCK_NAME, True, MEM_BLOCK_SIZE)
	# Fork each child separately to avoid spawning an unintended extra process.
	c1 = os.fork()
	if c1 == 0:
		do_child_work(sem_main, 'r')
	c2 = os.fork()
	if c2 == 0:
		do_child_work(sem_main, 'w')
	print(os.waitpid(c1, 0))
	print(os.waitpid(c2, 0))

	print("Processes finished!")
	shm.unlink()

Running the same test as with the previous implementation, we see a working model of safety between the child processes:

Will write 12
proc 26668 took 0.0 to perform the write
proc 26669 took 8.606910705566406e-05 to acquire the sem
proc 26669 took 9.5367431640625e-07 to perform the read
The proc read 12
proc 26667 took 6.985664367675781e-05 to acquire the sem
proc 26667 took 9.5367431640625e-07 to perform the read
The proc read 12
proc 26668 took 5.91278076171875e-05 to acquire the sem
Will write 14
proc 26668 took 9.5367431640625e-07 to perform the write
proc 26669 took 6.508827209472656e-05 to acquire the sem
proc 26669 took 9.5367431640625e-07 to perform the read
The proc read 14
proc 26667 took 7.009506225585938e-05 to acquire the sem
proc 26667 took 1.1920928955078125e-06 to perform the read
The proc read 14
proc 26668 took 6.508827209472656e-05 to acquire the sem
Will write 4

Something interesting while running this on an M1 laptop with 16GB of memory: these updates and operations are astonishingly fast for Python. Using multiple processes and shared memory is a powerful way to work around the limitations of the global interpreter lock. Databases, web servers, and other high-contention workloads could reach usable performance with the help of shared memory.



