Managing cloud infrastructure can be done in numerous ways these days. Tools like Terraform, Pulumi, CloudFormation, and CDK have propelled Infrastructure as Code (IaC) to new heights. While most of these tools require some basic programming knowledge, or even learning a new language such as Terraform's HCL, their true power lies in building a platform on top of them.
When building infrastructure, we as infrastructure developers tend to build it in a way that lets its author create new infrastructure easily, but not the rest of the organization. This usually happens when we think the tool is built only for “us”. It can be avoided by treating infrastructure as a product whose customer is the engineering team as a whole. This ensures anyone can build new infrastructure easily without prior knowledge of the tools being used.
This was one of my team's goals at Welcome Software (formerly Newscred) when I first joined. Before I joined, the team had built an in-house solution using the AWS CLI and boto3. While this worked, it lacked key features such as state management, quick disaster recovery, and maintainability. In the end we chose Terraform, and in this article I will cover how we used Terraform with YAML files to stay in sync with our in-house solution.
Why YAML?
Our in-house solution manages major resources like security groups and IAM roles using YAML files as configuration as code.
My initial plan for the migration was to use Terraform with .tfvars files to maintain our configurations. However, as soon as we wrote our first configurations, readability suffered badly, and duplicate configurations stored in different formats became increasingly complex to manage. To keep a single source of truth, we chose to carry our existing pattern of YAML configurations over into the migration.
Implementation
Note: The implementation was written using Terraform version 0.14.6 and was later tested with 1.0.6, which required some minor syntax changes. Please refer to the warnings and errors Terraform reports to fix them.
The following implementation uses several advanced Terraform features, such as the for_each meta-argument and the lookup and flatten functions. It is highly recommended to read their documentation to get a better understanding of how they work.
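As a quick warm-up, here is a minimal standalone sketch of lookup and flatten on hypothetical values (not the article's queues). It uses only locals and outputs, so it needs no provider: save it as a .tf file in an empty directory and run terraform init followed by terraform apply to see the outputs.

```hcl
# Hypothetical standalone example: only locals and outputs, so no provider is required.
locals {
  # An object without a visibility_timeout_seconds attribute
  queue = {
    name = "example-queue"
  }

  # lookup returns the attribute if it exists, otherwise the default (30 here)
  timeout = lookup(local.queue, "visibility_timeout_seconds", 30)

  # flatten collapses nested lists into one flat list
  envs = flatten([["production"], ["dev", "staging"]])
}

output "timeout" {
  value = local.timeout # 30
}

output "envs" {
  value = local.envs # ["production", "dev", "staging"]
}
```

The same lookup-with-default pattern is what lets optional YAML keys such as visibility_timeout_seconds or dlq be left out of a queue definition.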
Setting up the YAML files
queues:
  - name: example-queue-dlq
    type: standard
    access_policy: basic
    env:
      - production
      - dev
  - name: example-queue
    type: standard
    access_policy: basic
    visibility_timeout_seconds: 10
    dlq:
      name: example-queue-dlq
      max_recieve_count: 1
    env:
      - production
      - dev
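The locals in the next section load this file through var.config_file, a variable the article uses but does not declare. A minimal declaration might look like the following; the description and the queues.yaml default are assumptions. Because queue names are prefixed with terraform.workspace and filtered on env, the env values in the YAML (production, dev) are expected to match your Terraform workspace names.

```hcl
# Hypothetical declaration: the article references var.config_file without showing it.
variable "config_file" {
  type        = string
  description = "Name of the YAML file under conf/sqs/ that lists the queues"
  default     = "queues.yaml" # assumed file name, not from the article
}
```

With workspaces named after environments, a dev plan could then be produced with terraform workspace select dev followed by terraform plan -var="config_file=queues.yaml".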
Reading the YAML files
# This file decodes the YAML file into a Terraform value that can then be used with for_each
locals {
  sqs_queues = yamldecode(file("${path.root}/conf/sqs/${var.config_file}"))["queues"] # converts all the queues into a list of maps
  # i.e. the list looks as follows
  # [
  #   {
  #     "access_policy" = "basic"
  #     "env" = [
  #       "production",
  #       "dev",
  #     ]
  #     "name" = "example-queue-dlq"
  #     "type" = "standard"
  #   },
  #   ...
  # ]
  # The following statement reshapes every queue into an object with the same set of keys,
  # fills in defaults with lookup, and keeps only the queues enabled for the current workspace.
  # flatten guarantees a single flat list; with one object per queue the list is already flat,
  # so it only matters if the expression is extended to emit nested lists.
  sqs_standard_queues = flatten([for queue in local.sqs_queues :
    {
      "name"                       = "${terraform.workspace}-${queue.name}"
      "type"                       = queue.type
      "access_policy"              = queue.access_policy
      "dlq"                        = lookup(queue, "dlq", null)
      "visibility_timeout_seconds" = lookup(queue, "visibility_timeout_seconds", 30)
    }
    if contains(queue.env, terraform.workspace)
  ])
  # i.e. this is what sqs_standard_queues looks like in the production workspace
  # [
  #   {
  #     "access_policy" = "basic"
  #     "dlq" = null
  #     "name" = "production-example-queue-dlq"
  #     "type" = "standard"
  #     "visibility_timeout_seconds" = 30
  #   },
  #   {
  #     "access_policy" = "basic"
  #     "dlq" = {
  #       "max_recieve_count" = 1
  #       "name" = "example-queue-dlq"
  #     }
  #     "name" = "production-example-queue"
  #     "type" = "standard"
  #     "visibility_timeout_seconds" = 10
  #   },
  # ]
}
yamldecode: parses your YAML file into Terraform values (maps and lists) that Terraform can read from.
flatten: collapses a list of lists into a single flat list, which is easier to iterate over with for expressions and for_each.
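yamldecode and the workspace filter can be tried without any files or AWS credentials. In this sketch the YAML is inlined with a heredoc instead of being read with file(), and a hypothetical local.workspace stands in for terraform.workspace (which is "default" in a fresh directory, so no queue would match). The prod-only-queue entry is invented for illustration.

```hcl
# Hypothetical standalone example: no provider, no external files.
locals {
  # Stand-in for terraform.workspace
  workspace = "dev"

  # Inline YAML; <<-EOT strips the common leading indentation
  raw_yaml = <<-EOT
    queues:
      - name: example-queue
        env: [production, dev]
      - name: prod-only-queue
        env: [production]
  EOT

  queues = yamldecode(local.raw_yaml)["queues"]

  # Prefix each name with the workspace and keep only queues enabled for it
  enabled_queue_names = [
    for queue in local.queues : "${local.workspace}-${queue.name}"
    if contains(queue.env, local.workspace)
  ]
}

output "enabled_queue_names" {
  value = local.enabled_queue_names # ["dev-example-queue"]
}
```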
Creating all resources
Based on the configurations above, we can now create any number of SQS queues just by adding new entries to the YAML file. The following file does that using for_each:
### Manage standard queues and their access policies
resource "aws_sqs_queue" "sqs_standard_queues" {
  for_each = {
    for queue in local.sqs_standard_queues : queue.name => queue
    if queue.dlq == null
  }
  name                       = each.value.name
  visibility_timeout_seconds = each.value.visibility_timeout_seconds
}
resource "aws_sqs_queue_policy" "sqs_standard_queue_policies" {
  for_each = {
    for queue in local.sqs_standard_queues : queue.name => queue
    if queue.dlq == null
  }
  queue_url = aws_sqs_queue.sqs_standard_queues[each.value.name].id
  policy    = file("${path.root}/conf/sqs-policy.json")
}
### Manage standard queues with DLQ enabled and their access policies
resource "aws_sqs_queue" "sqs_dlq_enabled_standard_queues" {
  for_each = {
    for queue in local.sqs_standard_queues : queue.name => queue
    if queue.dlq != null
  }
  name                       = each.value.name
  visibility_timeout_seconds = each.value.visibility_timeout_seconds
  redrive_policy = jsonencode({
    deadLetterTargetArn = aws_sqs_queue.sqs_standard_queues["${terraform.workspace}-${each.value.dlq.name}"].arn
    maxReceiveCount     = each.value.dlq.max_recieve_count
  })
}
resource "aws_sqs_queue_policy" "sqs_dlq_enabled_standard_queue_policies" {
  for_each = {
    for queue in local.sqs_standard_queues : queue.name => queue
    if queue.dlq != null
  }
  queue_url = aws_sqs_queue.sqs_dlq_enabled_standard_queues[each.value.name].id
  policy    = file("${path.root}/conf/sqs-dlq-policy.json")
}
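Each policy resource repeats the filter of its queue resource. One possible simplification, which is not what the article uses, is to drive the policy's for_each from the queue resource itself: a resource with for_each evaluates to a map of instances keyed the same way, so each.value is the created queue. This sketch would replace the sqs_standard_queue_policies block above, and the DLQ-enabled policy could be rewritten the same way. Because the keys stay the same queue names, existing policy instances keep their addresses.

```hcl
# Alternative sketch (not the article's code): chain for_each off the queue resource.
resource "aws_sqs_queue_policy" "sqs_standard_queue_policies" {
  # Map of queue instances keyed by queue name
  for_each = aws_sqs_queue.sqs_standard_queues

  queue_url = each.value.id # the id of an aws_sqs_queue is its URL
  policy    = file("${path.root}/conf/sqs-policy.json")
}
```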
for queue in local.sqs_standard_queues : queue.name => queue
The above statement iterates through our flattened list of queues and maps each one to a key/value pair. In our scenario the key is the name of the queue, and the value is the whole queue object:
"production-example-queue-dlq" = {
  "access_policy" = "basic"
  "dlq" = null
  "name" = "production-example-queue-dlq"
  "type" = "standard"
  "visibility_timeout_seconds" = 30
}
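To see what this map-building expression and the if filter produce, here is a standalone sketch in which a hypothetical two-item list stands in for local.sqs_standard_queues. It needs no provider; terraform apply prints the outputs.

```hcl
# Hypothetical standalone example; the list mimics local.sqs_standard_queues.
locals {
  queues = [
    { name = "production-example-queue-dlq", dlq = null },
    { name = "production-example-queue", dlq = { name = "example-queue-dlq", max_recieve_count = 1 } },
  ]

  # key = queue name, value = the whole queue object
  queues_by_name = { for queue in local.queues : queue.name => queue }

  # The same expression with an if clause, as used by the DLQ-less resources
  plain_queues_by_name = {
    for queue in local.queues : queue.name => queue
    if queue.dlq == null
  }
}

output "all_keys" {
  # keys() returns the keys in lexicographical order
  value = keys(local.queues_by_name) # ["production-example-queue", "production-example-queue-dlq"]
}

output "plain_keys" {
  value = keys(local.plain_queues_by_name) # ["production-example-queue-dlq"]
}
```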
for_each: iterates through each key in the map generated above and creates one resource instance per key in the plan, for example:
aws_sqs_queue.sqs_standard_queues["production-example-queue-dlq"]
Note: This address is also how we reference the queue from a different resource.
if: filters the entries, keeping only the queues whose dlq is null (no DLQ configured) or, in the DLQ-enabled resources, not null.
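each.value.*: inside a for_each resource, each.key holds the current map key (the queue name) and each.value holds the matching queue object, so each.value followed by any key we set in our locals (e.g. each.value.visibility_timeout_seconds) reads that attribute.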
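To show each.key and each.value side by side, and how other blocks address the resulting instances, here is a variation of the sqs_standard_queues resource above that also tags every queue, plus two outputs. The QueueKey and AccessPolicy tag names and the output names are hypothetical; the resource block would replace the original one rather than sit next to it.

```hcl
# Variation of the sqs_standard_queues resource; tag names are hypothetical.
resource "aws_sqs_queue" "sqs_standard_queues" {
  for_each = {
    for queue in local.sqs_standard_queues : queue.name => queue
    if queue.dlq == null
  }

  name                       = each.value.name
  visibility_timeout_seconds = each.value.visibility_timeout_seconds

  tags = {
    QueueKey     = each.key                 # the map key, i.e. the queue name
    AccessPolicy = each.value.access_policy # any attribute set in local.sqs_standard_queues
  }
}

# Hypothetical outputs showing how other blocks address the for_each instances
output "example_dlq_url" {
  # A single instance is addressed by its key, including the workspace prefix
  value = aws_sqs_queue.sqs_standard_queues["${terraform.workspace}-example-queue-dlq"].id
}

output "standard_queue_arns" {
  # The resource as a whole behaves like a map of instances keyed by queue name
  value = { for name, queue in aws_sqs_queue.sqs_standard_queues : name => queue.arn }
}
```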
Debugging Tips
Terraform has a lot of useful functions, but situations with complex maps can be hard to debug. To debug them, you can use terraform console, which lets you evaluate your locals and inspect the resulting maps.
For example, to debug the configuration above:
terraform console
> local.sqs_queues # prints the decoded YAML file
> local.sqs_standard_queues # prints the reshaped, workspace-filtered list
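terraform console evaluates expressions interactively and only for the currently selected workspace, so running terraform workspace select dev first shows what the dev environment would get. A complementary approach, sketched here with hypothetical output names, is to expose the intermediate values as outputs: after terraform apply, terraform output debug_sqs_standard_queues prints them again, and recent Terraform versions also list new or changed outputs in the plan.

```hcl
# Hypothetical debug outputs; remove them once the maps look right.
output "debug_sqs_standard_queues" {
  value = local.sqs_standard_queues
}

output "debug_dlq_enabled_queue_names" {
  # Only the queues that will be created by the DLQ-enabled resources
  value = [for queue in local.sqs_standard_queues : queue.name if queue.dlq != null]
}
```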
Conclusion
Using only Terraform limits us to writing configurations in .tfvars files to abstract complexity away from our infrastructure users, which in turn burdens those users with understanding how Terraform's language works. By using YAML for configuration as code as our user interface, we empower our infrastructure users to easily create new resources and stacks in a language they are already familiar with.
This allows the wider engineering team to bring up services quickly and with shorter wait times. We have already implemented this successfully for our standalone AWS services, and are currently migrating our more complex stacks, like EKS clusters, to Terraform + YAML.
Special mention to Pratik Saha, who figured out how to convert YAML files into Terraform objects.