The cloud is a wondrous technology for building scalable apps with easily provisioned access to elastic compute resources, extensive logging, and a flexible, pay-as-you-go model. The concept has made it possible for millions of devs worldwide to build public SaaS offerings of their own...but getting it to work is only half of your business. Keeping your users' data safe is the other.
Unsurprisingly, users - i.e. your customers - are what set a Business™ apart from a Fun Side Project™. If you're not educated on the security vulnerabilities and best practices associated with your cloud provider, you might end up with the best way to lose said customers: data breaches.
Granted, you're probably not reading this if you're a bank, but lower stakes are not an excuse for neglecting security.
Klotho is a developer toolbox that helps you with both halves of that equation, by:
a. abstracting away the complexity of building cloud native apps by generating ready-to-deploy Infrastructure-as-Code templates in Pulumi or Terraform for you based on annotations in your application code, and
b. automatically configuring IAM roles and policies for said apps with a focus on the principle of least privilege.
Generating IaC from application code is pretty neat! But...least privilege? There's that phrase again. It clearly seems part of some notion of AWS Best Practices 101, but why? How do we achieve it? How does Klotho help? Let's answer these.
IAM and the Principle of Least Privilege
"IAM Who I say I am."
AWS Identity and Access Management (IAM) is a web service which allows users to assign granular AWS permissions to people and applications. You use IAM to make sure someone is who they say they are (authentication), and control their permissions to use the resources they want to use (authorization).
TL;DR: Instead of directly allowing users to access resources, the safest way to provision access in AWS is to use IAM Roles and Policies.
- A Role is a definition of a trusted entity and the things that it can do. Assuming a Role is like putting on a costume and gaining all the perks associated with it.
- Roles aren't permanently assigned to a human with a password but are only assumed (via an AssumeRole call) for short durations by services (like a Lambda function). The service user then gets temporary security credentials to do what it needs to do for its session (except getting the actual session token).
- For authorization, a Role has access granted to it by Policies - permissions to access other AWS resources (like S3 buckets) - that are attached to it.
Putting it all together: want your app that's deployed within an EC2 instance to read from S3 buckets? The app needs to assume a role that has a policy attached to it that allows s3:GetObject API calls.
AWS IAM policies are just JSON data.
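To make that concrete, here is a minimal sketch of the two JSON documents behind the EC2-reads-from-S3 example, built as Python dicts. Everything in it is illustrative rather than generated by AWS or Klotho: the trust policy names the EC2 service as the entity allowed to assume the role, and the permissions policy is kept deliberately broad (any resource) so it stays short.

```python
import json

# Hypothetical example documents; not output from AWS or Klotho.

# Trust policy: who is allowed to assume the role (here, the EC2 service).
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "ec2.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}

# Permissions policy: what the role may do once it has been assumed.
# "Resource": "*" is deliberately broad here; it covers every bucket.
permissions_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "s3:GetObject",
            "Resource": "*",
        }
    ],
}


def to_policy_json(policy: dict) -> str:
    """Serialize a policy dict to the JSON text that IAM stores."""
    return json.dumps(policy, indent=2)


if __name__ == "__main__":
    print(to_policy_json(trust_policy))
    print(to_policy_json(permissions_policy))
```

The role is the pairing of the two: the trust policy answers "who can wear the costume", the permissions policy answers "what the costume lets you do".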
There are 200+ AWS services and 7,000+ different AWS API calls, with some seemingly straightforward functions unintuitively needing access to all kinds of other AWS services, with each having its own quirks. Just have a look at the AWS IAM Docs on Actions, Resources, and Condition Keys for example.
Given all that, formulating an IAM policy that's secure and permissive enough is difficult. On the other hand, writing policies that are unsafe but let your devs prototype rapidly - say, one that allows an action on every resource with a "*" wildcard - is incredibly easy. So is handing out privileged roles/root access like candy.
Which makes this the perfect segue into...
Privilege Creep
Too often, access to these root user accounts and privileged IAM roles is provisioned far more broadly than is actually needed. Even worse, these privileges are often needed only once, or never used at all but left in anyway, either because they are forgotten about over years of development or because of a "better to have it and not need it" mentality.
This is how your organization develops an attack surface as large as your app itself, exposing you to unnecessary and wholly avoidable risk (the human element, which includes privilege misuse and stolen credentials, was involved in 82% of all breaches, according to Verizon's 2022 Data Breach Investigations Report).
But privileges being easy to over-provision and mismanage is only half the story of why IAM is so difficult. The other half is difficulty gauging how permissive these policies need to be, to begin with. Too narrow, and your apps will break due to insufficient privileges. Too broad, and you speed up development, but risk having an SSRF breach turn into compromised customer data, general downtime, and yes, those $80MM fines.
💡 The Capital One breach of 2019 is a cautionary tale for what happens if your permissions are too broad. This was when an attacker used a compromised server (via Server-Side Request Forgery), obtained AWS credentials via EC2 metadata, and discovered that the server had excessive privileges to access S3 buckets, some of which contained customer data.
When designing systems that need a balance of security and development velocity, it's critical to look to the principle of least privilege.
Least Privilege - The What, The Why, and The How.
The What - A reasonably safe, balanced default.
The Principle of Least Privilege (PoLP) has its roots in the US Department of Defense's 1985 "Trusted Computer System Evaluation Criteria", and was basically an extension of the "information on a need-to-know basis only" policy to cover classified data in the digital age.
The TL;DR of it is that you want to give users and systems access to only the resources that they need for their use case, at access levels appropriate to their use case. For example, an IAM policy having permissions to access a specific S3 bucket (the resource), with "Read" actions only (the access level).
Perfect? No. But it's always safer to start with a minimally permissioned role and add permissions as needed than it is to start with root access and remove permissions, which leaves you one oversight or misplaced secret key away from a breach.
The Why - It limits Blast Radius.
In Infrastructure-as-Code form, a potentially dangerous, too-permissive IAM policy would be one that allows the s3:GetObject action with its resource set to the wildcard "*".
This will allow the IAM principal (a role or user) to run GetObject on any S3 bucket in the AWS account. This overly permissive access to S3 buckets - i.e., a wide blast radius - together with SSRF caused the Capital One breach.
How could we remedy this? Well, we could rewrite this IAM policy by hand in a least-privilege way, scoping its resource to the ARN of the one bucket the role actually needs.
Now, we've restricted the s3:GetObject action to the specific S3 bucket that your role needs access to, meaning even if someone were to gain unauthorized entry, they wouldn't have the access to do any damage outside of this specific S3 bucket. You've limited your blast radius, even if the bomb did go off.
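The difference in blast radius can be sketched in a few lines of Python. This is a rough approximation, not IAM's real evaluation logic: it only looks at Allow statements, ignores Deny, conditions, and principals, and uses shell-style matching as a stand-in for IAM wildcards. The bucket names my-app-bucket and customer-data are hypothetical.

```python
from fnmatch import fnmatchcase


def _as_list(value):
    """IAM accepts either a single string or a list of strings."""
    return [value] if isinstance(value, str) else list(value)


def allows(policy: dict, action: str, resource_arn: str) -> bool:
    """Approximate check: does any Allow statement cover this call?"""
    for statement in policy["Statement"]:
        if statement["Effect"] != "Allow":
            continue
        # IAM action names are case-insensitive, so compare in lower case.
        action_ok = any(
            fnmatchcase(action.lower(), pattern.lower())
            for pattern in _as_list(statement["Action"])
        )
        resource_ok = any(
            fnmatchcase(resource_arn, pattern)
            for pattern in _as_list(statement["Resource"])
        )
        if action_ok and resource_ok:
            return True
    return False


# Too permissive: GetObject on every bucket in the account.
broad_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": "s3:GetObject", "Resource": "*"}
    ],
}

# Least privilege: GetObject only on objects in one (hypothetical) bucket.
scoped_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::my-app-bucket/*",
        }
    ],
}

own_object = "arn:aws:s3:::my-app-bucket/report.csv"
other_object = "arn:aws:s3:::customer-data/records.csv"

if __name__ == "__main__":
    for name, policy in (("broad", broad_policy), ("scoped", scoped_policy)):
        print(name, allows(policy, "s3:GetObject", own_object),
              allows(policy, "s3:GetObject", other_object))
```

With the broad policy, a stolen credential can read both objects. With the scoped one, the app still works, but the second bucket is out of reach.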
The How - It's difficult as hell.
Writing secure IAM policies by hand that follow the principle of least privilege is slow, tedious, error-prone, and difficult to audit. Real-world applications are rarely as simple as the example above. Even with Infrastructure-as-code tools like Pulumi, you'd probably have to do this by hand for dozens of IAM actions across multiple services and ARNs.
Most of the time, you'll end up doing one (or all) of these, instead:
- Create all new AWS Users/Organizations to give each dev team its own sandbox account with full admin rights instead of bothering to formulate granular, least-privilege IAM policies. Hey, a poorly configured role/policy is a much lower risk in a sandbox than in production, right?
- Insistence on security is holding back development, so just slap on AWS Managed Policies to your EC2 Instances so IAM stops breaking your app, telling yourself you'll write the actual policy some other day (you won't).
- Give up entirely, make best guesses as to the AWS API calls you perform, and use wildcards for them so you can get on with building cool things instead.
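Shortcuts like these tend to leave a recognizable fingerprint in the policy JSON, which is at least easy to check for. Below is a minimal sketch of such a check; the function name find_wildcards and its rules (flag an Action of "*" or "service:*", and a Resource of exactly "*") are assumptions for illustration, not a feature of AWS or Klotho, and a real audit would need to look at far more than this.

```python
def find_wildcards(policy: dict) -> list:
    """Return (statement_index, key, value) for overly broad Allow entries."""
    findings = []
    for index, statement in enumerate(policy.get("Statement", [])):
        if statement.get("Effect") != "Allow":
            continue
        actions = statement.get("Action", [])
        resources = statement.get("Resource", [])
        # IAM allows a bare string in place of a one-element list.
        if isinstance(actions, str):
            actions = [actions]
        if isinstance(resources, str):
            resources = [resources]
        for action in actions:
            # "*" means every action; "s3:*" means every action in a service.
            if action == "*" or action.endswith(":*"):
                findings.append((index, "Action", action))
        for resource in resources:
            # A bare "*" means every resource in the account.
            if resource == "*":
                findings.append((index, "Resource", resource))
    return findings


# Hypothetical "get on with building cool things" policy.
lazy_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:*", "dynamodb:GetItem"],
            "Resource": "*",
        }
    ],
}

if __name__ == "__main__":
    for finding in find_wildcards(lazy_policy):
        print(finding)
```

A resource such as arn:aws:s3:::my-bucket/* is intentionally not flagged, since a wildcard inside a single bucket is usually a reasonable scope.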
To be clear, none of these approaches are ideal, but the industry as it stands appraises engineers only in terms of velocity and the number of features shipped, not security. That $80MM fine Capital One paid might very well have been less than the cost of "doing things the right way". Until that changes, managers will incentivize whichever approach is faster.
So...don't do any of that. Use Klotho instead.
Least Privilege IAM in Seconds with Klotho
Klotho takes the opinionated approach that developers shouldn't have to understand the complexities of AWS IAM, or worry about writing Infrastructure-as-Code that deftly balances security and developer velocity.
In fact, if the app needs a KV store for an in-memory map, developers should literally only have to use an annotation in their code to signal their intent for it...
Add the persist annotation in your application code for the said map, like so.
...and automation should decipher that intent from just this application code alone, analyzing it to mean:
"Hi. I need a DynamoDB instance for some books I need to store, with a table named 'books', and secure CRUD access to the resource at arn:aws:dynamodb:us-west-2:123456789012:table/books, and its index, stream, backup, and export. Thanks!"
That's all. When you compile this code with Klotho, it generates Infrastructure-as-code from it, and automatically formulates IAM policies for you in a least-privileged manner.
Klotho has inherent advantages over other tools for managing IAM, because...
- Instead of letting things fail on the first run and then inspecting CloudWatch logs after the fact to decide which permissions were needed, it can analyze your annotated application code directly to figure out what you meant, and then dynamically formulate a secure, best-practices, least privileges IAM policy for your cloud native application based on that analysis, when it generates Pulumi/Terraform code.
- As your app grows, and your service requirements - and permissions associated with them - inevitably change, Klotho's analysis of your app code will auto-adjust cloud dependencies accordingly and apply least privileges to all of them.
Continuing with our example, then:
Klotho makes use of Pulumi parameters to dynamically create resources based on your code - including roles, policies, and permissions that make sure principals only have access to what they need - dynamically filling in ARNs to allow your application to do CRUD operations in those specific tables, indexes, streams, backups, and exports...
Still too broad? You can go into index.ts (if using Pulumi) in the ./compiled folder and make further changes by hand for peace of mind. Klotho hides nothing from you.
...Instead of the tedious task of writing this same IAM policy by hand. For each ARN, each table.
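For a sense of what writing it by hand involves, here is a minimal sketch of a table-scoped policy for the books example. It is not Klotho's actual output: the helper name dynamodb_table_policy and the particular list of CRUD actions are assumptions, and only the ARN shapes for the table and its index, stream, backup, and export sub-resources follow the standard DynamoDB ARN format.

```python
def dynamodb_table_policy(region: str, account_id: str, table: str) -> dict:
    """Build a policy granting CRUD access to one table and its sub-resources."""
    table_arn = f"arn:aws:dynamodb:{region}:{account_id}:table/{table}"
    # Sub-resources hang off the table ARN, e.g. .../table/books/index/*
    sub_resources = [
        f"{table_arn}/{kind}/*"
        for kind in ("index", "stream", "backup", "export")
    ]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # Assumed CRUD action set; adjust to what the app really calls.
                "Action": [
                    "dynamodb:GetItem",
                    "dynamodb:PutItem",
                    "dynamodb:UpdateItem",
                    "dynamodb:DeleteItem",
                    "dynamodb:Query",
                    "dynamodb:Scan",
                ],
                "Resource": [table_arn] + sub_resources,
            }
        ],
    }


if __name__ == "__main__":
    import json

    policy = dynamodb_table_policy("us-west-2", "123456789012", "books")
    print(json.dumps(policy, indent=2))
```

Multiply this by every table, bucket, queue, and function in a real application, and keep it in sync as the code changes, and the appeal of generating it becomes clear.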
Can Klotho Replace Cloud Security Tools?
Using Klotho to provision infra via IaC and to generate balanced IAM defaults are both security best practices for AWS. But Klotho, while helpful, cannot replace dedicated cloud security tools, and you need to know how best to close that gap in your organization.
So, what exactly does Klotho do and what does it not do? Let's find out.
What Klotho Does
- With Klotho, developers only have to determine the resources that they need to access, and Klotho abstracts the complexity of IAM policies away from their development processes, generating reasonably safe defaults while creating a ready-to-deploy cloud-native version of the local app.
- Operators no longer have to enforce a too-strict IAM policy by default without talking to the application or product teams first, or dig through pages of CloudWatch logs later to see which permissions errored out, go through docs, talk to all the devs - rinse and repeat.
- Klotho will automatically make changes to the IaC and your permissions based on how your app changes as it grows, and auto-generate least privileged IAM policies every time.
- Klotho only creates IAM roles and permissions, not new users. You stay flexible this way because any future user in your organization - such as third-party auditors - can be granted ad-hoc, temporary access via roles without having to set up new root credentials for them.
- You nullify one attack vector entirely because with Klotho's IaC and auto-generated IAM, you don't ever have to embed AWS access/secret keys in your application code (where they can be difficult to rotate and .gitignore goof-ups might expose them to malicious actors).
Before Klotho, it could take hours to craft a secure IAM policy that didn't also hold back development. Now, you can have a working, secure cloud native version of your local application ready to go in seconds.
What Klotho Does Not Do
However, it is critical to understand that Klotho only allows you to convert your local application to a cloud-native version with a secure, basic set of IAM permissions. That's it. It cannot help you regularly audit and profile your access, and prune overly-permissive policies that are barely used.
The ideal AWS workflow, then, should be to use Klotho to generate IaC (in a CI/CD pipeline) and least privileged IAM based on your app code...and then use a mature, battle-tested tool like Netflix's Repokid to conduct periodic access rights reviews. Repokid leverages AWS Access Advisor under the hood to determine which AWS services an IAM principal has access to and which of them it has used in the last X days/months, and then revokes unused permissions automatically.
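The core idea behind such a review is simple enough to sketch. The snippet below is not Repokid's algorithm and does not call Access Advisor; it assumes you already have a hypothetical mapping from service name to the time it was last used by a role (None meaning never), and it lists the services that have gone unused for longer than a chosen window.

```python
from datetime import datetime, timedelta, timezone


def unused_services(last_used: dict, now: datetime, max_age_days: int = 90) -> list:
    """Return services never used, or not used within the last max_age_days."""
    cutoff = now - timedelta(days=max_age_days)
    return sorted(
        service
        for service, timestamp in last_used.items()
        if timestamp is None or timestamp < cutoff
    )


# Hypothetical last-accessed data for one role; values are made up.
now = datetime(2023, 1, 1, tzinfo=timezone.utc)
last_used = {
    "s3": datetime(2022, 12, 20, tzinfo=timezone.utc),      # used recently
    "dynamodb": datetime(2022, 6, 1, tzinfo=timezone.utc),  # stale
    "sqs": None,                                             # never used
}

if __name__ == "__main__":
    print(unused_services(last_used, now))
```

Permissions for whatever this returns are candidates for removal, which is the pruning step that Klotho leaves to dedicated tooling.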
In Summary
Least-privileged IAM policies have long-term benefits to your organization's security and resource management...but they've categorically been a pain to get right without also slowing down developer velocity. The AWS permission set is incredibly broad, documentation is vague, and, to make matters worse, AWS's own examples often have too-broad IAM policies.
It's a tall task to have every developer you hire engage with AWS IAM in addition to application code, and continually reverse engineer your apps to be secure. This is where Klotho can help you move fast and save time and money, especially if your competitors have more risk appetite.
Using Klotho's automation to provision cloud infra via Infrastructure-as-Code (Klotho can produce IaC for Pulumi, Terraform, or aws-cdk), and auto-generating safe IAM defaults, you can go cloud-native with your app in seconds - while only granting users and processes the minimum level of access necessary to perform their tasks, reducing your attack surface, and minimizing blast radius should the unthinkable happen.