Preventing Regular Expression Backtracking in JavaScript

Introduction

Regular expressions (regex) are a powerful tool for manipulating and validating text in software applications. However, certain regex patterns can be vulnerable to backtracking, which can cause super-linear runtime and potentially lead to a denial of service (DoS) attack. In this article, we’ll explore what backtracking is, how it can cause performance issues, and how to prevent it in your regular expressions.

What is backtracking in regex?

Backtracking is a technique used by regex engines to handle complex patterns that contain optional or repeated subpatterns. When a regex pattern contains optional or repeated subpatterns, the engine may need to try multiple combinations of subpatterns in order to find a match. This process is called backtracking.

For example, consider the following regex pattern:

/^[a-zA-Z0-9\s]+$/

This regular expression should match any string that contains only alphanumeric characters and spaces. However, it is vulnerable to backtracking, as the + operator after the character class allows for an arbitrary number of repetitions of the character class.

An attacker could exploit this vulnerability by sending a malicious search query containing a long sequence of non-matching characters, such as:

aaaaaaaaaaaaaaaaaaaaaaaaaaaaaab

This string contains 30 a characters followed by a single b. When the regular expression engine tries to match this string, it will match the first 30 a characters to the character class, but fail to match the b character. The engine will then backtrack and try different combinations of the character class until it either matches the entire string or exhausts all possible combinations.

In this case, there are 31 characters in the string, so there are 2³¹ possible combinations of character classes to try. This can take an extremely long time, potentially causing a DoS attack on the server.

To prevent this vulnerability, you could modify the regular expression to use the * operator instead of the + operator, like this:

/^[a-zA-Z0-9\s]*$/

Using * instead of + can make a regular expression less vulnerable to backtracking and reduce the risk of a DoS attack because it reduces the number of possible paths that the regex engine needs to explore.

The + quantifier means "one or more", while the * quantifier means "zero or more". When using the + quantifier, the regex engine must try every possible combination of characters that match the pattern before giving up. This can lead to backtracking and cause the engine to spend an excessive amount of time trying to match a string, which makes it more vulnerable to a DoS attack.

On the other hand, using the * quantifier makes the subpattern optional, which means that the regex engine can skip it entirely if it doesn't match. This reduces the number of possible paths that the engine needs to explore and makes it less vulnerable to backtracking and DoS attacks.

However, it’s important to note that using the * quantifier isn't always the best solution, especially if the subpattern is essential for the match. In such cases, using other techniques such as bounded quantifiers or possessive subpatterns may be more appropriate.

How can backtracking cause performance issues?

Backtracking can cause performance issues in two ways:

  1. Super-linear runtime: Backtracking can cause the runtime of a regex pattern to become super-linear, which means that the time it takes to match the pattern grows faster than the length of the input string. This can make the pattern extremely slow for long input strings and can cause a DoS attack if the pattern is applied to untrusted user input.
  2. High memory usage: Backtracking can also cause the regex engine to use a lot of memory, especially if the pattern contains nested optional or repeated subpatterns. This can cause the application to run out of memory and crash.

How can you prevent backtracking in your regex patterns?

To prevent backtracking in your regex patterns, you need to design your patterns in a way that avoids optional or repeated subpatterns. Here are some techniques you can use:

Use specific character classes:

Using specific character classes can help prevent backtracking by limiting the number of characters that can match a subpattern. For example, the pattern /[a-z]/ matches any lowercase letter from a to z.

Use non-capturing groups:

Using non-capturing groups can help prevent backtracking by avoiding unnecessary memory allocation. For example, the pattern /(?:ab)+/ matches one or more occurrences of the string "ab" without creating capturing groups.

Use atomic groups:

Using atomic groups can help prevent backtracking by making subpatterns non-optional. For example, the pattern /a(?>b|c)/ matches a string that contains the letter "a" followed by either "b" or "c", without backtracking.

Use multiple subpatterns without making any of them optional:

Using multiple subpatterns without making any of them optional can help prevent backtracking by limiting the number of possible paths that the regex engine needs to explore. For example, the pattern /^(ab|cd|ef)$/ matches strings that are either "ab", "cd", or "ef", without backtracking.

Use possessive subpatterns:

Using possessive subpatterns can help prevent backtracking by making a subpattern non-optional. A possessive subpattern is denoted by the (?+...) syntax. For example, the pattern /a(?+b)/ matches a string that contains the letter "a" followed by the letter "b", without backtracking.

Use lazy quantifiers:

Using lazy quantifiers can help prevent backtracking by making subpatterns optional, but still preventing backtracking. A lazy quantifier is denoted by the +? or *? sign. For example, the pattern /a+?/ matches one or more occurrences of the letter "a", but uses lazy quantifiers to prevent backtracking.

Use bounded quantifiers:

Using bounded quantifiers can help prevent backtracking by limiting the number of repetitions of a subpattern. Bounded quantifiers are denoted by {min,max} syntax, where min and max are integers that specify the minimum and maximum number of repetitions. For example, the pattern /a{1,3}/ matches strings that contain the letter "a" repeated between one and three times, without backtracking.

By applying these techniques in your regular expressions, you can create patterns that are more efficient and less vulnerable to backtracking and DoS attacks.

Conclusion

In conclusion, regular expressions are a powerful tool for manipulating and validating text in JavaScript, but they can also be a performance bottleneck if used improperly. By understanding backtracking and how to prevent it in your regex patterns, you can write more efficient and secure code that is less vulnerable to DoS attacks.

Enjoyed this article?

Share it with your network to help others discover it

Continue Learning

Discover more articles on similar topics