Thought leadership from the most innovative tech companies, all in one place.

These Python Data Structures Will Be Your New Best Friend

An overview of the collections library in Python

image

The collections library is part of Python standard libraries. In it, there are some great data structures that modify and improve lists, dictionaries, and tuples.

Here is an overview of the most useful ones.

Deque

The deque (pronounced “deck”) is a list for which you can specify the maximum number of objects that can be stored. When you create a deque, you can specify the maxlenparameter. For example:

from collections import deque

a = deque(maxlen=10)

In this case, the “list” can have a length of at most 10. Once the deque is full, every time you add a new element at the end(using append, like for lists) the first element will be removed.

a = deque(maxlen=3)

a.append(1) # a = [1]

a.append(2) # a = [1, 2]

a.append(3) # a = [1, 2, 3] FULL

a.append(4) # a = [2,3,4]

Furthermore, a deque is also more efficient than lists for adding and removing values at both ends. There are the methods append, appendleft, pop, popleftthat all take O(1). On the other hand, accessing elements in the middle is more expensive, approximately O(n).

a = deque(maxlen = 10)

a.append(1) # a = [1]

a.append(2) # a = [1, 2] append at the end

a.appendleft(3)# a = [3, 1, 2] append at the beginning

x = a.pop() # a = [3, 1], x = 2 remove on the right

y = a.popleft() # a = [1], y = 3 remove on the left

Named Tuples

This library provides a function that allows you to create an improved tuple, where each field can be accessed using a custom name. For example:

from collections import namedtuple

Point = namedtuple(“Point”, [“x”,”y”,”z”])

p= Point(3,4,5)

print(p.x, p.y, p.z) #Output: 3, 4, 5

The function takes as the first argument the name of the new tuple class (this should always be equal to the variable name). Then we can give the names for the values stored in the tuple. These can be passed as a list of strings, or a single string where names are divided by space and/or commas.

We can create a new instance of a named tuple in various ways:

p1 = Point(3,4,5)

p2 = Point(x=3, y=4, z=5)

p3 = Point._make([3,4,5])

When calling namedtuple(), you can even define the default value for some of the parameters.

PointDef = namedtuple(“PointDef”, “x, y, z”, defaults = [0,0,0])

p = PointDef(x=1) # p is (1,0,0)

Note, however, that the names of non-optional parameters must always be written before the optional ones. So for example,` if you define 3 parameters and only 1 default value, this will be the value for the last parameter. You can check which parameters have default values by using _field_defaults, which returns a dict:

Point = namedtuple(“Point”, “x y z”,defaults ="[0, 0])

print(Point._field_defaults)

# output: {“y”: 0, “z”: 0}

Counter

This class is really useful when you need to count the occurrences of each value in an iterable object, such as a list or a string.

from collections import Counter
c = Counter(“aaabbccdaaa”)

print(c)

#Output: Counter({'a': 6, 'b': 2, 'c': 2, 'd': 1})

You can check the number of occurrences for a specific value. If it is not present, the counter will return 0.

print(c[“a”]) #output: 6

print(c[“z”]) #output: 0

The most_common(n)method will return a list of the nmost common values, together with their occurrences. If *n *is not specified, it writes all the values stored in the counter

print(c.most_common(2))

#output: [('a', 6), ('b', 2)]

Finally, you can add or subtract some occurrences:

c = Counter(“abbc”) # {"a":1, "b":2, "c":1}

c.update(“bccd”) # {"a":1, "b":3, "c":3, "d":1}

c.subtract(“bbc”) # {"a":1, "b":1, "c":2, "d":1}

Defaultdict

This is a subclass of the Dictionary class. The special characteristic of defaultdict is that you can provide a function, which will be used to create a default value whenever you try to access a key that is not yet present in the dictionary.

For example, suppose that we want to store a list of values for each key. With a normal dict , we would have to do something like:

toAdd = [("key1", 3), ("key2", 5), ("key3", 6), ("key2", 7)]

d = dict()
for key, val in toAdd:
  if key in d:
    d[key].append(val)
  else:
    d[key] = [val]

print(d) # {"key1":[3], "key2":[5, 7], "key3":[6]}

With defauldict, however, we can set the default value to be an empty list. Then we can always call append: if the key was not in the dictionary, it will firstly create the default value (empty list) and then add the value.

To create an empty list we can either provide a lambda function: lambda : [] or pass the built-in function list .

from collections import defaultdict
toAdd =[("key1", 3), ("key2", 5), ("key3", 6), ("key2", 7)]

d = defaultdict(list)
for key, val in toAdd:
  d[key].append(val)

print(d) # {"key1":[3], "key2":[5, 7], "key3":[6]}

This functionality is similar to the setdefault method for dict(), which can be used to rewrite the above code as:

toAdd =[("key1", 3), ("key2", 5), ("key3", 6), ("key2", 7)]

d = dict()
for key, val in toAdd:
  d.setdefault(key, []).append(val)

print(d) # {"key1":[3], "key2":[5, 7], "key3":[6]}

However, the defaultdict is simpler and faster.

Conclusion

I hope you found this article useful! To learn more about the collections library, you can see the python docs. Thank you for reading through to the end!




Continue Learning