
Factor out resources requiring IAM write permissions #36

Open
jaytmiller opened this issue May 21, 2020 · 11 comments

Comments

@jaytmiller

In order to meet the requirements of our IT department, we need to avoid using IAM write permissions and work with pre-existing IAM resources they create for us.

Can you recommend an approach for offloading IAM resource creation? (I'm attempting to solve this and make a PR.)


Two approaches which come to mind are:

  1. Add a flag like var.create_iam_resources and make all the IAM resources in terraform-deploy/aws conditional on it.

  2. Factor out IAM resources into a separate Terraform module and then modify terraform-deploy/aws to refer to those resources.
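A minimal sketch of option 1 (variable and resource names here are hypothetical, not the actual names in terraform-deploy/aws): gate each IAM resource on a flag via count, and select between the created role and an IT-supplied ARN downstream.

```hcl
# Hypothetical sketch: gate IAM resources behind a flag.
variable "create_iam_resources" {
  description = "Set to false when IT pre-creates the IAM roles."
  type        = bool
  default     = true
}

variable "external_hubploy_role_arn" {
  description = "ARN of the IT-created role, used when create_iam_resources = false."
  type        = string
  default     = ""
}

# An existing role definition, made conditional via count.
resource "aws_iam_role" "hubploy_eks" {
  count              = var.create_iam_resources ? 1 : 0
  name               = "hubploy-eks"
  assume_role_policy = data.aws_iam_policy_document.assume.json
}

# Elsewhere, pick between the created role and the pre-existing one.
locals {
  hubploy_role_arn = var.create_iam_resources ? aws_iam_role.hubploy_eks[0].arn : var.external_hubploy_role_arn
}
```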

In addition to the hubploy IAM resources, IAM also ripples through autoscaling, ecr, and oidc, where some resources require the cluster to exist already. This causes issues for a simple create-IAM-first-create-cluster-next workflow, since IAM resources need to be created both before and after the cluster.

@jaytmiller
Author

I forgot to mention the IAM impact on EKS, which includes the need to supply a cluster role and worker instance profile / worker role if they're not created automatically by terraform-deploy/aws.

@jaytmiller
Author

One other idea I had for dealing with IAM perms would be to share Terraform .tfstate with our IT department using remote state. IT would initially create the entire deployment, including IAM. Our development group would then be permitted to perform whatever Terraform actions we have permissions for, relying on Terraform not to repeat the IAM work IT already did. Does this sound feasible?
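The shared-state idea could look something like this (bucket and table names are placeholders, not real resources): both teams configure the same S3 backend, so a plan run by our group sees the IAM resources IT already created as unchanged.

```hcl
# Hypothetical sketch: both teams point at the same S3 remote state.
terraform {
  backend "s3" {
    bucket         = "example-terraform-state"     # shared with IT
    key            = "pangeo/aws/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "example-terraform-locks"     # state locking
  }
}
```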

@yuvipanda
Member

There are also problems with scoping. For example, the cluster autoscaler role only needs access to resources created by a specific EKS cluster. This keeps roles as tightly scoped as possible. However, now we end up with a loop:

  1. IT department: Create IAM roles required to run the terraform code, minus any roles the terraform code itself creates
  2. Team: Run terraform code, creating all resources (EKS clusters, EFS, etc), minus any IAM roles whatsoever. (I'm actually not sure this is possible?)
  3. IT department: Run terraform(?) code to create the IAM roles needed for (2)
  4. Team: Find the (randomly generated) role names from (3), fill that into terraform .tfvars files again
  5. Team: Run terraform code to provision infrastructure.
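Step 4 could be sketched roughly like this (variable and data-source labels are hypothetical): the IT-created role names come in via .tfvars, and a data source resolves them instead of creating anything.

```hcl
# Hypothetical sketch of step 4: reference IT-created roles by name
# instead of creating them, with names supplied via .tfvars.
variable "autoscaler_role_name" {
  description = "Name of the autoscaler role IT created in step 3."
  type        = string
}

data "aws_iam_role" "autoscaler" {
  name = var.autoscaler_role_name
}

# Downstream resources then reference data.aws_iam_role.autoscaler.arn
# rather than aws_iam_role.autoscaler.arn.
```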

Steps 2-5 will have to be repeated each time there's any change in our code or a module our code depends on.

I'm not at all sure how we can do this in an automated fashion.

@yuvipanda
Member

> One other idea I had for dealing with IAM perms would be to share Terraform .tfstate with our IT department using remote state. IT would initially create the entire deployment, including IAM. Our development group would then be permitted to perform whatever Terraform actions we have permissions for, relying on Terraform not to repeat the IAM work IT already did. Does this sound feasible?

That would be great, @jaytmiller. Unfortunately, I don't know enough about terraform to answer that. Would you be willing to research this and see where it goes?

@jaytmiller
Author

Thanks @yuvipanda for drawing attention to the loop. I also noticed an issue with IRSA/OIDC in addition to autoscaling. I definitely agree the loop is a mess and probably fatal for this refactoring approach.

@yuvipanda
Member

One way forward is:

  1. IT runs the terraform code in aws-creds to create roles with enough permissions to run the terraform code in aws
  2. We can run code in aws in an automated fashion without IT help
  3. IT is only engaged when a change in aws-creds is needed

This means you don't need unlimited IAM permissions, only much better-scoped ones. However, it might be possible to escalate to full IAM permissions from this; I'm not sure.
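One way to reason about the escalation risk (a sketch only; the actual aws-creds policies would need review): an explicit deny on IAM write actions overrides any allow in the same role, so the role IT hands over can run terraform apply on non-IAM resources but cannot mint new roles. Note that some deployments legitimately need iam:PassRole, so the exact action list is a judgment call.

```hcl
# Hypothetical sketch: explicit deny on IAM write actions in the
# role IT creates for the team. Deny always wins over allow.
data "aws_iam_policy_document" "no_iam_writes" {
  statement {
    effect = "Deny"
    actions = [
      "iam:CreateRole",
      "iam:CreatePolicy",
      "iam:AttachRolePolicy",
      "iam:PutRolePolicy",
    ]
    resources = ["*"]
  }
}
```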

@jaytmiller
Author

Limiting the scope of IAM permissions further is not something we've considered yet; so far our approach to setting minimum perms has simply been to factor out IAM write operations entirely and assume IT would create them all. I'll make a note of this approach as another possibility in our discussion with IT.

@mfox22

mfox22 commented May 29, 2020

@yuvipanda this approach looks interesting. What exactly is your concern about how it might be possible to escalate to full IAM with this method?

@mfox22

mfox22 commented Jun 2, 2020

Considering further the case where IT grants no IAM write permissions: one point to mention is that we can have sandbox accounts where devs are given full IAM perms. That makes me think it would be possible for devs to deploy with full perms to a sandbox account, where the IAM roles created are then exported to a CloudFormation template to be applied to Prod once IT Security has reviewed the requested IAM role changes.

It was not clear to me from Yuvi's earlier comment whether Terraform regenerates all IAM roles with new randomly decorated names on each deploy run. Can we consider making those names more static?
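For context on the naming question (a sketch; the resource label and names are illustrative, not the actual ones in terraform-deploy/aws): in the Terraform AWS provider, a role name is only randomized when name_prefix is used, in which case a unique suffix is appended. Switching to a fixed name keeps the role name stable across deploys.

```hcl
# Hypothetical sketch: fixed vs randomly suffixed role names.
resource "aws_iam_role" "autoscaler" {
  # name_prefix = "cluster-autoscaler-"   # random suffix appended each create
  name = "cluster-autoscaler-prod"        # static, reviewable name

  assume_role_policy = var.assume_role_policy_json
}
```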

@yuvipanda
Member

@mfox22 so my experience with AWS is limited, so please take everything I say with lots of salt :)

With the goal of fully automated deployment, there are two sets of IAM rights needed:

  1. One set of IAM rights required to operate the hub & deploy to it in an automated fashion. This is in https://github.com/pangeo-data/terraform-deploy/blob/master/aws/iam.tf, and is intertwined with other parts of the infrastructure - like the EKS cluster name used to give the hubploy role just enough rights to talk to that cluster but no other, or the autoscaler setup, which gives the autoscaler role just enough rights to do its work on this EKS cluster only and makes it available to the autoscaler pod running on the cluster.

    This set of IAM roles is intertwined with the rest of the infrastructure, and might change upstream, outside of the control of the space telescope team. For example, we might find a way to restrict IAM role permissions further, or maybe a feature gets added that has a slightly different IAM configuration. This makes separating the IAM pieces into a fully separate setup difficult, and I don't know enough Terraform / AWS to know how to do that properly.

  2. The AWS creds required to run the terraform module itself. These are in https://github.com/pangeo-data/terraform-deploy/tree/master/aws-creds, and are reasonably static. They create enough creds to let someone (or something) run terraform apply. Changes here should be rarer and, more importantly, aren't intertwined with any of the infrastructure.

    So in a system where IT, running in a different controller account, is responsible for IAM rules, I had thought they would manage setting up these creds. This would produce a role with an arbitrary name, which can then be used in the account to run terraform apply. This would delineate responsibilities nicely - changes under aws-creds/ require IT co-ordination; changes inside aws/ do not.
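The hand-off described above could be wired up roughly like this (account ID and role name are placeholders): the team's provider configuration assumes the IT-created role, so no personal IAM write permissions are involved in running terraform apply.

```hcl
# Hypothetical sketch: assume the IT-created role when running
# terraform apply against the aws/ module.
provider "aws" {
  region = "us-east-1"

  assume_role {
    role_arn     = "arn:aws:iam::123456789012:role/it-created-terraform-role"
    session_name = "terraform-apply"
  }
}
```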

The list of permissions granted in aws-creds/ might need some tweaking and narrowing. Depending on the threat model, this might not be acceptable regardless.

I hope this makes a little more sense, @mfox22. However, I want to re-iterate that I don't consider myself an AWS or terraform expert, so take this with salt :) I'm not saying this can't be done - I'm just worried there's no way to do it in an automated, upstream-friendly way.

@mfox22

mfox22 commented Jun 3, 2020

Yes that makes sense @yuvipanda, thank you for laying this out for me in more detail.
I share your concern about maintenance and automation of this deployment process on our end. Thanks for your help as we evaluate options.
