How To Efficiently Concatenate Strings In Python

And how not to concatenate strings

By Dinesh Kumar

November 24th, 2020

image

I always used to concatenate strings using the “+” operator irrespective of the language I code only to realize later that I was doing them wrong. To be precise, inefficient.

Today, we shall discuss string concatenation in python and how to do it efficiently.

A simple concatenation:

x =  "hello"
y =  "world"
print(x + y)

Output:
helloworld

As we are aware of the fact that string objects are immutable in python, any concatenation operation using the “+” operator could prove to be costly.

Why is it so?

String objects are immutable in python and any concatenation operation creates a new string, copies the old string character by character and then appends the new string to be concatenated.

To be clear, let me explain with an example:

    def concat_strings():
        """
        This is a program to remove spaces in a string
        :return:
        """

        *input_string = "Th is is an ex am pl ew it hs pa ce"

        output_string = ""

        for i in input_string:
            if i == " ":
                pass
            else:
                output_string += i

        print(output_string)


    concat_strings()

    Output:
    Thisisanexamplewithspace

This program removes the white spaces in a string and prints the new string without spaces. This iterates through the input_string character by character and concatenates to a new string called output_string whilst ignoring the white spaces.

Though this gets the job done this is not very efficient, Is it?

The reason for this is, on every “+” operation a new string is created(Remember strings are immutable?) every single time and the existing values have to be copied character by character.

The time complexity of this operation is O(n²). Wait…how?

Concatenation using + operator under the hood:

To get a basic idea of time complexities and space complexities, kindly refer to this article.

Let's walkthrough the example program iteration by iteration.

In iteration 1, a new string object is created and character “T” is copied to the output_string. So our new string has 1 character now.

In iteration 2, again a new string is created and this time ,“T” is copied. In addition to that, “h” is appended. This goes on for every iteration. (I have skipped the iteration for spaces for simplicity).

So what pattern do we observe here?

For every concatenation, the number of characters copied increases quadratically.

Iteration 1:
1 character (“T”) = 1

Iteration 2:
2 characters(“T”, “h”) = 1+1

Iteration 3:
3 characters(“T”, “h”, “i”) = 1+1+1

Iteration 4:
4 characters(“T”, “h”, “i”, ”s”) = 1+1+1+1

and so on till the length of the string.

If we observe the pattern, the number of copies during concatenation is 1(no of characters) + 2(number of characters) + 3(number of characters) + ….. + (n) .

This gets us to the sum of n natural numbers which is n(n+1)/2 which will give us O(xn²), where x = number of characters. Since the number of characters for every concatenation is always constant here(1 in this case) we could drop the constant factor which gives us O(n²).

The same applies for concatenation of strings of varying length as well. For concatenation of strings with varying lengths, it should be O(N + M) where N and M are the lengths of the two strings being concatenated. However, the behavior will still be quadratic based on the lengths of the strings due to repeated concatenation.

So to summarize, concatenating strings with + is a costly operation especially if string to be concatenated is long.

From the python docs:

Concatenating immutable sequences always results in a new object. This means that building up a sequence by repeated concatenation will have a quadratic runtime cost in the total sequence length.

How do we overcome this?

Well, it's better off using a list to store all the strings to be concatenated and join them using the str.join() method.

str.join() takes an iterable as an argument and returns a string which is a concatenation of all the string objects in the iterable.

def concat_strings():
    *"""
    This is a program to remove spaces in a string
    **:return**:
    """

    *input_string = "Th is is an ex am pl ew it hs pa ce"

    output_lst = list()

    for i in input_string:
        if i == " ":
            pass
        else:
            output_lst.append(i)

    print("".join(output_lst))

Output:

Thisisanexamplewithspace

Here, we have modified our previous example to store every string into a list and finally join them. Please note that appending to lists is always O(1). So that shouldn't impact the run time of the program.

This is a more efficient way to concatenate strings rather than using a “+”. The time complexity of using join() for strings is O(n) where n is the length of the string to be concatenated.

Having said that, the difference in the execution time will be significant only if the strings to be concatenated is long. For smaller strings, we may not see a huge difference.

Note : The second example can also be written as a one liner as pointed out in the comments. This could further reduce the number of lines of code.However, the run time will still remain the same as the time-complexity of split is O(n).

def concat_strings():
    *"""
    This is a program to remove spaces in a string
    **:return**:
    """

    *input_string = "Th is is an ex am pl ew it hs pa ce"

    print("".join(input_string.split()))

Output:

Thisisanexamplewithspace

Summary:

  • Concatenating strings using + is not efficient.

  • The time complexity of string concatenation using + is O(n²)

  • It's always better to leverage the str.join() to concatenate strings

  • str.join() executes in O(n) time.



Continue Learning