
Terraform + Yaml = ❤️

Cloud infrastructure can be managed in many ways these days. Tools like Terraform, Pulumi, CloudFormation, and CDK have pushed Infrastructure as Code (IaC) to new heights. Most of these tools require some programming knowledge or even a new language to learn, such as Terraform's HCL, but their true power lies in building a platform on top of them.


As infrastructure developers, we tend to build infrastructure in a way that makes it easy for the author to create new resources, but not for the rest of the organization. This usually happens when we, as engineers, assume the tool is built only for “us”. We can avoid it by treating infrastructure as a product whose customer is the engineering team as a whole, so that anyone can build new infrastructure easily without prior knowledge of the tools being used.

This was one of the goals for my team at Welcome Software (formerly Newscred) when I first joined. Before I joined, the team had built an in-house solution using the AWS CLI and boto3. It worked, but it lacked key features such as state management, quick disaster recovery, and maintainability. In the end we chose Terraform, and in this article I cover how we used Terraform with YAML files to stay in sync with our in-house solution.

Why YAML?

Our in-house solution manages major resources such as security groups and IAM roles through YAML files, as configuration as code.

My initial plan for the migration was to use Terraform with .tfvars files to maintain our configuration. However, as soon as we wrote our first configurations, they turned out to be hard to read, and we faced growing complexity from duplicate configurations stored in different formats. To keep a single source of truth, we chose to carry our existing pattern of YAML configuration over to the migration.

Implementation

Note: The implementation was written with Terraform 0.14.6. It was also tested with 1.0.6, which required some minor syntax changes; refer to the warnings and errors Terraform reports to fix them.
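One way to make this version note explicit is to pin it in the configuration itself. The block below is a minimal sketch and not part of the original setup: the lower bound mirrors the 0.14.6 release mentioned above, and the AWS provider is declared without a version constraint because the article does not state one.

```hcl
terraform {
  # Assumption: 0.14.6 is the oldest release this code needs to support.
  required_version = ">= 0.14.6"

  required_providers {
    aws = {
      # Standard registry address of the AWS provider; add a version
      # constraint that matches your own environment.
      source = "hashicorp/aws"
    }
  }
}
```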

The following implementation uses several of Terraform's more advanced features, such as for_each, lookup, and flatten. I highly recommend going through the docs to get a better understanding of how they work.

Setting up the YAML files

queues:
  - name: example-queue-dlq
    type: standard
    access_policy: basic
    env:
      - production
      - dev

  - name: example-queue
    type: standard
    access_policy: basic
    visibility_timeout_seconds: 10
    dlq:
      name: example-queue-dlq
      max_recieve_count: 1
    env:
      - production
      - dev

Setting up your yaml configuration

Reading the YAML files

# This file decodes the YAML file into a Terraform list of maps that can then be used with for_each statements
locals {
  sqs_queues = yamldecode(file("${path.root}/conf/sqs/${var.config_file}"))["queues"] # this converts all the queues into a list of maps
  # i.e. the list looks as follows
  #[
  #  {
  #    "access_policy" = "basic"
  #    "env" = [
  #      "production",
  #      "dev",
  #    ]
  #    "name" = "example-queue-dlq"
  #    "type" = "standard"
  #  },
  #  ...
  #]
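  # The following for expression keeps only the queues enabled for the current workspace,
  # prefixes each name with the workspace, and fills in defaults for optional keys.
  # flatten guarantees a single flat list of maps that is easier for us to use.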
  sqs_standard_queues = flatten([for queue in local.sqs_queues :
    {
      "name"                       = "${terraform.workspace}-${queue.name}"
      "type"                       = queue.type
      "access_policy"              = queue.access_policy
      "dlq"                        = lookup(queue, "dlq", null)
      "visibility_timeout_seconds" = lookup(queue, "visibility_timeout_seconds", 30)
    }
    if contains(queue.env, terraform.workspace)
  ])
  # i.e. this is what sqs_standard_queues looks like in the production workspace
  # [
  #   {
  #     "access_policy" = "basic"
  #     "dlq" = null
  #     "name" = "production-example-queue-dlq"
  #     "type" = "standard"
  #     "visibility_timeout_seconds" = 30
  #   },
  #   {
  #     "access_policy" = "basic"
  #     "dlq" = {
  #       "max_recieve_count" = 1
  #       "name" = "example-queue-dlq"
  #     }
  #     "name" = "production-example-queue"
  #     "type" = "standard"
  #     "visibility_timeout_seconds" = 10
  #   },
  # ]
}

Parsing yaml configuration in Terraform
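The locals above read var.config_file, whose declaration is not shown. A minimal sketch of what it could look like follows; the default file name queues.yaml is a hypothetical placeholder, not taken from the original project.

```hcl
# Hypothetical declaration for the variable used in the locals block.
variable "config_file" {
  description = "Name of the YAML file under conf/sqs/ that lists the queues"
  type        = string
  default     = "queues.yaml" # assumed name, adjust to your repository
}
```

With a declaration like this, a different file can be selected per run, for example with terraform plan -var="config_file=other.yaml".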

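yamldecode: parses a YAML string (here, the file contents returned by file) into a value Terraform can work with, in our case a map with a queues key.

flatten: collapses a list whose elements may themselves be lists into a single flat list. The for expression above already yields a flat list of maps, so flatten leaves it unchanged here; it matters once each item expands into a list of its own.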
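To see both functions in isolation, here is a small self-contained sketch that can be placed in an empty directory and explored with terraform console. The local names and the inline YAML are made up for illustration. The nested for expression is a case where flatten is actually needed, because each queue expands to one entry per environment.

```hcl
locals {
  # Inline YAML instead of file(...) so the example needs no extra files.
  example_yaml = <<-EOT
    queues:
      - name: example-queue-dlq
        env:
          - production
          - dev
      - name: example-queue
        env:
          - production
  EOT

  # yamldecode turns the YAML text into a Terraform value;
  # ["queues"] selects the list of queue maps.
  example_queues = yamldecode(local.example_yaml)["queues"]

  # The inner for produces one list per queue, so the outer for
  # yields a list of lists. flatten collapses it into a single list.
  example_queue_envs = flatten([
    for queue in local.example_queues : [
      for env in queue.env : {
        name = "${env}-${queue.name}"
        env  = env
      }
    ]
  ])
}
```

Here example_queue_envs ends up with three elements: two for example-queue-dlq and one for example-queue.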

Creating all resources

Based on the configuration above, we can now create any number of SQS queues just by adding entries to the YAML file. The following file does that using for_each.

### Manage standard queues and their access policies
resource "aws_sqs_queue" "sqs_standard_queues" {
  for_each = {
    for queue in local.sqs_standard_queues : queue.name => queue
    if queue.dlq == null
  }

  name                       = each.value.name
  visibility_timeout_seconds = each.value.visibility_timeout_seconds
}

resource "aws_sqs_queue_policy" "sqs_standard_queue_policies" {
  for_each = {
    for queue in local.sqs_standard_queues : queue.name => queue
    if queue.dlq == null
  }

  queue_url = aws_sqs_queue.sqs_standard_queues[each.value.name].id
  policy    = file("${path.root}/conf/sqs-policy.json")
}

### Manage standard queues with DLQ enabled and their access policies
resource "aws_sqs_queue" "sqs_dlq_enabled_standard_queues" {
  for_each = {
    for queue in local.sqs_standard_queues : queue.name => queue
    if queue.dlq != null
  }

  name                       = each.value.name
  visibility_timeout_seconds = each.value.visibility_timeout_seconds
  redrive_policy = jsonencode({
    deadLetterTargetArn = aws_sqs_queue.sqs_standard_queues["${terraform.workspace}-${each.value.dlq.name}"].arn
    maxReceiveCount     = each.value.dlq.max_recieve_count
  })
}

resource "aws_sqs_queue_policy" "sqs_dlq_enabled_standard_queue_policies" {
  for_each = {
    for queue in local.sqs_standard_queues : queue.name => queue
    if queue.dlq != null
  }

  queue_url = aws_sqs_queue.sqs_dlq_enabled_standard_queues[each.value.name].id
  policy    = file("${path.root}/conf/sqs-dlq-policy.json")
}

Creating queues using parsed configuration
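Other configurations often need the ARNs of the queues created above. As a sketch (the output name queue_arns is my own choice, not part of the original code), the two resource maps can be merged into a single output keyed by queue name:

```hcl
output "queue_arns" {
  description = "ARN of every managed queue, keyed by its workspace-prefixed name"

  # Each for expression walks one for_each resource map and keeps only the ARN.
  value = merge(
    { for name, queue in aws_sqs_queue.sqs_standard_queues : name => queue.arn },
    { for name, queue in aws_sqs_queue.sqs_dlq_enabled_standard_queues : name => queue.arn }
  )
}
```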

for queue in local.sqs_standard_queues : queue.name => queue

The above expression iterates through our flattened list of queues and turns it into a map of key-value pairs. In our scenario the key is the name of the queue and the value is the queue's map object, for example:

"production-example-queue-dlq" = {
  "access_policy" = "basic"
  "dlq" = null
  "name" = "production-example-queue-dlq"
  "type" = "standard"
  "visibility_timeout_seconds" = 30
}

for_each: iterates through each key in the map generated above and creates one resource per key, which shows up in the plan as follows.

aws_sqs_queue.sqs_standard_queues["production-example-queue-dlq"]

Note: The address above is also how we reference the queue from a different resource.

if: filters the list, keeping only the queues that meet the condition, here whether or not the dlq key is set.

each.value.*: each is the object that for_each exposes for the current item. each.key is the map key (the queue name), each.value is the map stored under that key, and * can be any of the keys that we set in our locals.
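These pieces can be tried on a literal list without touching the real configuration. The following sketch uses made-up names (demo_queues, demo_without_dlq, each_demo) and would create a real queue if applied, so treat it as something to read or to run terraform plan against rather than to apply.

```hcl
locals {
  # Hypothetical stand-in for local.sqs_standard_queues.
  demo_queues = [
    { name = "demo-dlq", dlq = null, visibility_timeout_seconds = 30 },
    { name = "demo", dlq = { name = "demo-dlq" }, visibility_timeout_seconds = 10 }
  ]

  # "key => value" builds a map; the trailing if drops every
  # element for which the condition is false.
  demo_without_dlq = {
    for queue in local.demo_queues : queue.name => queue
    if queue.dlq == null
  }
}

resource "aws_sqs_queue" "each_demo" {
  for_each = local.demo_without_dlq

  # each.key is the map key, each.value is the map stored under it.
  name                       = each.key
  visibility_timeout_seconds = each.value.visibility_timeout_seconds
}
```

Only demo-dlq passes the filter, so the plan contains a single instance, aws_sqs_queue.each_demo["demo-dlq"].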

Debugging Tips

Terraform has a lot of useful functions, but complex maps can be hard to debug. To inspect them, you can use terraform console, which lets you evaluate your locals and see the resulting values.

For example, to debug the configuration above:

terraform console
> local.sqs_queues # prints out the yaml file decoded
> local.sqs_standard_queues # prints out the flattened object
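The console evaluates any expression, not just bare locals, so the for_each maps can be inspected before running a plan. The expressions below are a sketch that relies only on the locals defined earlier; the # lines are annotations for the reader, so type just the expressions at the prompt.

```hcl
# Keys under which the queues without a DLQ will be created
keys({ for queue in local.sqs_standard_queues : queue.name => queue if queue.dlq == null })

# Names of the queues that have a DLQ configured
[for queue in local.sqs_standard_queues : queue.name if queue.dlq != null]
```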

Conclusion

Using only Terraform limits us to .tfvars files for abstracting complexity away from our infrastructure users, which in turn burdens them with understanding how Terraform's language works. By using YAML configuration as code as our user interface, we empower our infrastructure users to create new resources and stacks easily in a language they already know.

This allows the larger engineering team to bring up services quickly and with shorter wait times. We have already implemented this successfully for our standalone AWS services, and are currently migrating our more complex stacks, like EKS clusters, using Terraform + YAML.

Special mention to Pratik Saha, who figured out how to convert YAML files into Terraform objects.



