
How to Specify Data Types in Python


I recently had a client ask me to help fix and debug the Python code for one of his personal projects. It was hell! The code was messy and all over the place, there were no comments, and I spent a hell of a long time figuring out what each existing function did. Python is a dynamically typed language, which means that we don't have to specify data types when we create variables and functions. While this reduces the amount of code we need to write, the work we save is passed on to the next developer who needs to understand and debug the existing code!

Python as a Dynamically-Typed Language

Let's say we have a simple function, add, that adds 2 numbers together:

def add(a, b):
    return a + b

Normally in Python, we need not specify the data types of the inputs and outputs. As such, Python will allow us to pass any data types into the function as long as they work with the + operator.

x = add(4, 5) # x will be 9
x = add(4.0, 5.0) # x will be 9.0
x = add("apple", "pie") # x will be "applepie"
x = add(["apple"], ["pie"]) # x will be ["apple", "pie"]
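These calls all succeed because both arguments support + with each other. As a quick sketch of the opposite case, a mismatched pair such as an integer and a string is only rejected at the moment the call actually runs:

```python
def add(a, b):
    return a + b

try:
    # int + str is not defined, so this fails when the line executes,
    # not when the function is defined.
    add(4, "pie")
except TypeError as error:
    print(f"TypeError: {error}")
```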

Note that we can pass integers, floats, strings, lists and so on into the add function, and Python will allow it as long as a and b can be added together. However, as functions become more complicated, readability takes a big hit, and we are often left guessing which data types certain variables hold and which types functions accept and return.

For instance, let's say we are tasked with fixing this function:

def magic(a, b, c):
    if c:
        return [i for i in a if i in b]
    return list(set(a))

At first glance, we can't immediately tell what types the input arguments or the output of the function are; in this case, we would need to investigate and infer the data types from the function body itself. As such, if you're working on a semi-complicated project with other developers, for readability's sake, I would recommend explicitly specifying the data types of the inputs, outputs and variables.
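As a preview, here is a sketch of the same magic function with type annotations, using the syntax explained in the sections that follow. The specific types are my own guess from reading the body (two lists of integers and a boolean flag); the original author may have intended something else.

```python
from typing import List

# Assumed types: a and b are lists of integers, c is a boolean flag.
def magic(a: List[int], b: List[int], c: bool) -> List[int]:
    if c:
        # Keep the items of a that also appear in b, in a's order.
        return [i for i in a if i in b]
    # Otherwise return the distinct items of a (order is not guaranteed).
    return list(set(a))
```

With the annotations in place, a reader can see what goes in and what comes out without tracing the body first.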

Specifying Data Types in Functions

Let's say we want a function that takes in 2 integers and adds them together.

def add(a, b):
    return a + b

Now, let's modify this function to include the input and output types.

def add(a: int, b: int) -> int:
    return a + b

Notice the additional : int after each of the input arguments a and b. This means that the arguments a and b that go into the function should be integers.

Also, notice the -> int that comes after the input arguments. This means that the function should return an integer.
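To see where these hints end up, we can inspect the function's __annotations__ attribute. This is a small sketch showing that Python records the hints on the function object:

```python
def add(a: int, b: int) -> int:
    return a + b

# The hints are stored as a dictionary; the return type uses the key "return".
print(add.__annotations__)
# {'a': <class 'int'>, 'b': <class 'int'>, 'return': <class 'int'>}
```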

Specifying Data Types when Declaring Variables

This is how we normally initialize an integer:

x = 5

This would be how we initialize an integer while explicitly specifying the data type:

x: int = 5

This tells readers that the variable x is meant to be an integer.
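The same pattern works for other built-in types. Here is a short sketch with a few made-up variables (the names and values are only for illustration):

```python
# Each variable is annotated with the type it is expected to hold.
name: str = "apple"
price: float = 1.5
quantity: int = 3
in_stock: bool = True
```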

Specifying Multiple Data Types

Sometimes, we want our functions to be able to take in more than 1 data type. For instance, let's say we have an add2 function that takes in a number and adds 2 to it:

def add2(a: int) -> int:
    return a + 2

Intuitively, this function should work for both floats and integers. Here's how we specify that a function can take in or return multiple data types:

from typing import Union

def add2(a: Union[int, float]) -> Union[int, float]:
    return a + 2

Here, the Union[int, float] means that the variable a can be either an integer or a float. This means that the add2 function takes in either an integer or a float, and also returns an integer or a float.
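As a quick usage sketch, here is the annotated add2 called once with an integer and once with a float:

```python
from typing import Union

def add2(a: Union[int, float]) -> Union[int, float]:
    return a + 2

print(add2(3))    # 5
print(add2(1.5))  # 3.5
```

On Python 3.10 and newer, the same hint can also be written as int | float, without importing Union.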

Specifying Types of Lists & Dictionaries

Let's say we have a function that takes in a list of strings and returns another list containing the uppercase versions of the original elements.

def convert_upper(lis):
    return [i.upper() for i in lis]
convert_upper(["apple", "orange"]) # ["APPLE", "ORANGE"]

In this case, we want to specify that the input should be a list of strings, and not a list of integers, floats, booleans or whatnot. Here's how we can do this:

from typing import List
def convert_upper(lis: List[str]) -> List[str]:
    return [i.upper() for i in lis]

Here, the List[str] means that the variable lis should be a list containing string values. In this case, this function takes in a list of string values, and also returns a list of string values.
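If you are on Python 3.9 or newer, the built-in list type can be used in the same way, so the import from typing is optional. A minimal sketch, assuming Python 3.9+:

```python
# Requires Python 3.9+: the built-in list accepts a type parameter directly.
def convert_upper(lis: list[str]) -> list[str]:
    return [i.upper() for i in lis]

print(convert_upper(["apple", "orange"]))  # ['APPLE', 'ORANGE']
```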

The same idea applies to dictionaries:

from typing import Dict

def print_dict(d: Dict[str, int]):
    for k, v in d.items():
        print(k, v)  # print each key with its value
In this case, Dict[str,int] means that the variable d should be a dictionary where keys are of type string, and values are of type integer.
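These hints can also be nested. The sketch below uses a hypothetical average_scores function (my own example, not from any library) whose input is a dictionary that maps string keys to lists of integers. It assumes every list is non-empty.

```python
from typing import Dict, List

def average_scores(scores: Dict[str, List[int]]) -> Dict[str, float]:
    # Map each name to the mean of its list of scores.
    # Assumes each list has at least one score.
    return {name: sum(values) / len(values) for name, values in scores.items()}

print(average_scores({"amy": [80, 90], "ben": [70]}))
# {'amy': 85.0, 'ben': 70.0}
```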

Important Note: You can still pass in whichever data types you want

One important thing to note is that while these type specifications say that certain variables should be of certain types, they don't mean that the variables must be of those types. Let's say we have this function:

def add(a: int, b: int) -> int:
    return a + b

This function should take in 2 integers and return an integer. However, if we pass in, say, 2 strings, Python will still allow it!

x = add("apple", "pie")
# x will still be "applepie"

The main purpose of specifying these data types is to let other developers know which data types certain variables should be, which data types should generally be passed to a function, and which data types a function should return. Python itself, however, does not enforce them when the code runs.
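If you do want a hard guarantee, one option is to check the types yourself. The sketch below uses a hypothetical add_strict function (my own name, not part of the example above) with isinstance checks. A static type checker such as mypy is another common route, but it runs as a separate tool rather than as part of Python itself.

```python
def add_strict(a: int, b: int) -> int:
    # Reject anything that is not an integer before doing the addition.
    # Note: bool is a subclass of int, so True and False would pass this check.
    if not isinstance(a, int) or not isinstance(b, int):
        raise TypeError("add_strict expects two integers")
    return a + b

print(add_strict(4, 5))  # 9

try:
    add_strict("apple", "pie")
except TypeError as error:
    print(error)  # add_strict expects two integers
```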
