The open blogging platform. Say no to algorithms and paywalls.

How to Properly Lint Your SQL Code Using sqlfluff

Meet sqlfluff — a simple tool to automate and harmonize the SQL formatting and styling.

image

This may be the best-kept secret in data analytics, but there’s no right way to format SQL queries. Worse still, the online documentation about formatting is sparse, and what is out there is often contradictory.

For a beginner, this can lead to bewilderment and frustration. The trick is to realize that there’s no universal standard; different people observe different conventions, and various auto-formatting plugins have different outputs.

In other words, SQL formatting is one of the small things that you shouldn’t sweat. However, some general rules of thumb are helpful when starting and perhaps, the only absolute in SQL formatting is to adopt whatever standards your team has cultivated.

In this article, I will present a simple tool to automate and harmonize the SQL formatting and styling, it’s called Sqlfluff and I would like to take you through:

  • The different roles of an SQL linter.
  • How to set up, configure, and use sqlfluff.
  • How you can integrate sqlfluff into a CICD process to automate SQL style checking.

1. Roles of a linting tool

💅 Style

Linting is a process where a code analyzer is run to find programmatic errors, bugs, “stylistic constructs” or anti-patterns. One of the roles of a linter is to enforce cosmetic rules on your code. These rules do not change how the code behaves, only how it looks.

Code bases without a consistent style are hard to read and work with because of their unpredictable structure. The readers are not given a chance to get used to patterns that would allow them to quickly go through the logic.

👃Code Smells

Beyond “ugly” code, there is another category of “problematic” code. Code that isn’t invalid per se, but might indicate an issue in the logic. For example, in general-purpose languages: we can end up with variables defined but never used redundant imports, and shadowed variables.

2- SQLfluff

It is an extensible and modular linter designed to help you write good SQL and catch errors and bad SQL before it hits your database. It’s an open-source tool with 4K GitHub stars and 287 GitHub forks. Here’s a link to SQLFluff’s open-source repository on GitHub. The tool is dialect-flexible, which means it supports many SQL dialects (BigQuery, Postgres, Spark SQL, SnowFlake..) and it’s designed with ELT applications in mind, sqlfluff also works with jinja templating and dbt.

Installation

🛠 Installation

image

pip install sqlfluff
	    OR
poetry add sqlfluff

Interactive Usage

🤖 Interactive Usage

You can use also the Web UI or an IDE extension to interact with sqlfluff:

Linting

This command will lint the specified file, providing feedback to stdout. The --dialect flag is optional and by default, will try to process your files using its standard ANSI SQL parser.

image

If you prefer using poetry commands:

image

Using the BigQuery dialect ensures that SQLFluff will handle the idiosyncrasies of the flavour of SQL it uses.

Example:

Considering the following SQL query:

image

sqlfluff lint test_query.sql — dialect bigquery

image

Fixing

In fix mode, SQLFluff will essentially find all the linting violations and then see if it can fix them. This won’t perform miracles, if your SQL is invalid, you’ve got more immediate problems to sort out but it does do a good job of automating a lot of boring stuff.

The CLI can automatically fix certain classes of violations that involve formatting concerns like whitespace or uppercasing of keywords.

Running in fix mode is simple:

sqlfluff fix test_query.sql — dialect bigquery

The output from fix mode will transform the query to something like this:

image

3. Configuration

If you’re comfortable with the default conventions, then SQLFluff works without any additional configuration after installing it and will enforce a strongly opinionated set of formatting standards; setting your own conventions involves adding more configuration. You can customize linting and fix SQL files by customizing rules. Please refer to Rules Reference for a complete list of rules.

SQLFluff accepts configuration either through the command line or through configuration files.

You can configure the rules using the command line using the argument --rules or exclude rules using --exclude-rules. It’s useful, especially when using testing or exploring rules.

image

You can also customize sqlfluff parameters using a configuration file. SQLFluff will look for the following files in order. Later files will (if found) will be used to overwrite any values read from earlier files.

  1. setup.cfg
  2. tox.ini
  3. pep8.ini
  4. .sqlfluff
  5. pyproject.toml

Below is an example of the .sqlfluff configuration file:

image

If you are using poetry, you can put your sqlfluff configuration in the pyproject.tom file:

image

Similar to Git’s .gitignore and Docker’s .dockerignore, SQLFluff supports a .sqlfluffignore file to control which files are and aren’t linted.

An example of a potential .sqlfluffignore placed at the root of your project would be:

_# Comments start with a hash.__# Ignore anything in the "temp" path_
/temp/_# Ignore anything called "testing.sql"_
testing.sql

4. Integrate SQL linting in a CICD process

In general, Linting is a process where a code analyzer is run to find programmatic errors, bugs, “stylistic constructs” or anti-patterns. If the code is not indented correctly, the line is flagged in the build. Depending on the implementation, the build gives a warning, or if they’re really strict, the build fails.

Wouldn’t it be nice if we could do the same for SQL files? That when someone tries to push a SELECT * or a script with a NOLOCK hint to the server, the build fails and the SQL script never makes it to the server? Well, it’s pretty simple to do it with sqlfluff.

The process depends on your CICD tools but the most important is that you integrate a step that executes sqlfluff lint ... command in your CI process.

Example:

If you are working in a GCP environment with Cloudbuild as a CICD tool, you can integrate the SQL linting as a build step in the cloudbuild.yaml file. If you push a modification or a new SQL file into your code repository, the Sqlfluff will run automatically and the build fails if the lint is not OK.

# Check SQL Formatting.
- name: ${_BUILD_IMAGE}
  id: "sqlfluff-linter"
  waitFor: ["build-ci"]
  entrypoint: "sqlfluff"
  args: ["lint", "mysql_folder"]

There is also documentation on how to use SQLFluff in CI with diff-quality or pre-commit.

Thank you for reading this far and being interested in this article!




Continue Learning