stepfunctions: using itemProcessor with mode = DISTRIBUTED doesn't work out of the box due to permission error #28820

Open
akerra6993 opened this issue Jan 22, 2024 · 8 comments
Labels
@aws-cdk/aws-stepfunctions Related to AWS StepFunctions bug This issue is a bug. effort/medium Medium work item – several days of effort p3

Comments

@akerra6993

Describe the bug

Deploying a Map state that uses distributed processing mode (with the standard execution type for the child executions) leads to an IAM permissions error, because the parent state machine's role doesn't have permission to start executions of itself. Trying to grant that permission via stateMachine.grantStartExecution(stateMachine) causes a circular dependency.

Expected Behavior

When using distributed processing mode, necessary permissions should be generated by default.

Current Behavior

Start execution permission for the child executions is not granted to the parent state machine.

Reproduction Steps

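import { JsonPath, Map, ProcessorMode, ProcessorType } from 'aws-cdk-lib/aws-stepfunctions';

const mapState = new Map(this, 'Map State', {
  itemsPath: JsonPath.stringAt('$...'),
  maxConcurrency: 100,
  parameters: {
    ...
  }
})
// Distributed mode runs each item as a child execution of this same state machine
mapState.itemProcessor(..., {
  executionType: ProcessorType.STANDARD,
  mode: ProcessorMode.DISTRIBUTED
})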

Possible Solution

Automatically add the necessary IAM policy to the parent state machine's default role.

Additional Information/Context

No response

CDK CLI Version

2.122.0 (build 7e77e02)

Framework Version

No response

Node.js Version

v18.16.1

OS

MacOS Sonoma 14.0 (M2 Pro)

Language

TypeScript

Language Version

No response

Other information

Technically, I am using the CDK with vanilla JavaScript, but that's not an option in the language dropdown.

@akerra6993 akerra6993 added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Jan 22, 2024
@github-actions github-actions bot added the @aws-cdk/aws-stepfunctions Related to AWS StepFunctions label Jan 22, 2024
@pahud pahud added p2 effort/medium Medium work item – several days of effort and removed needs-triage This issue or PR still needs to be triaged. labels Jan 24, 2024
@pahud
Contributor

pahud commented Jan 24, 2024

Thank you. Can you share the full error messages?

@akerra6993
Author

Error contacting AWS Service. | Message from Service: User: arn:aws:sts::{my account id}:assumed-role/{the state machine default role} is not authorized to perform: states:StartExecution on resource: arn:aws:states:us-west-2:{my account id}:stateMachine:{state machine id} because no identity-based policy allows the states:StartExecution action (Service: Sfn, Status Code: 400, Request ID: 5891e970-2bf1-4e15-9b06-f6f631b010b5)

@rogerchi
Contributor

rogerchi commented Feb 9, 2024

This should work in the meantime:

    const policy = new Policy(this, 'sfn-map-policy', {
      document: new PolicyDocument({
        statements: [new PolicyStatement({ resources: [machine.stateMachineArn], actions: ['states:StartExecution'] })],
      }),
    })

    policy.attachToRole(machine.role)

@abdelnn
Contributor

abdelnn commented Feb 15, 2024

The new Distributed Map construct should also work - #28821

@anentropic

I have this issue and am using a DistributedMap state

I attempted this:

        self.state_machine.add_to_role_policy(
            iam.PolicyStatement(
                actions=["states:StartExecution"],
                resources=[self.state_machine.state_machine_arn],
            ),
        )

But I get FAILED, Circular dependency between resources: [StateMachineB23A416F, StateMachineRoleDefaultPolicyD3EF01D8]
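One possible reading of why this fails, sketched as a toy dependency graph: the statement lands in the role's default policy, which now references the state machine's ARN, while the state machine in turn waits on that default policy. The edges below are an assumption about the synthesized template rather than something taken from it, and `findCycle` and both graphs are hypothetical names used only for illustration.

```typescript
// Hypothetical dependency graph: resource -> resources it depends on.
type Graph = Record<string, string[]>;

// Depth-first search that returns the first dependency cycle found, or null.
function findCycle(graph: Graph): string[] | null {
  const visiting = new Set<string>();
  const done = new Set<string>();
  const path: string[] = [];

  const visit = (node: string): string[] | null => {
    if (done.has(node)) return null;
    if (visiting.has(node)) return [...path.slice(path.indexOf(node)), node];
    visiting.add(node);
    path.push(node);
    for (const dep of graph[node] ?? []) {
      const cycle = visit(dep);
      if (cycle) return cycle;
    }
    path.pop();
    visiting.delete(node);
    done.add(node);
    return null;
  };

  for (const node of Object.keys(graph)) {
    const cycle = visit(node);
    if (cycle) return cycle;
  }
  return null;
}

// Assumed shape when the statement goes into the role's default policy:
// the state machine waits on the policy, and the policy references the state machine.
const defaultPolicyGraph: Graph = {
  StateMachine: ["Role", "RoleDefaultPolicy"],
  RoleDefaultPolicy: ["Role", "StateMachine"],
  Role: [],
};

// Assumed shape with a standalone policy: nothing depends on the policy itself.
const separatePolicyGraph: Graph = {
  StateMachine: ["Role"],
  SfnMapPolicy: ["Role", "StateMachine"],
  Role: [],
};

console.log(findCycle(defaultPolicyGraph));
console.log(findCycle(separatePolicyGraph));
```

If that reading is right, it would also explain why a separately declared policy attached to the role, like the one @rogerchi posted, avoids the error: the policy still points at the state machine, but the state machine no longer points back at it.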

@anentropic

anentropic commented Apr 18, 2024

...but the form given by @rogerchi does work instead

        policy = iam.Policy(
            self,
            "sfn-map-policy",
            document=iam.PolicyDocument(
                statements=[
                    iam.PolicyStatement(
                        resources=[self.state_machine.state_machine_arn],
                        actions=["states:StartExecution"],
                    ),
                    iam.PolicyStatement(
                        resources=[
                            f"arn:aws:states:*:{Aws.ACCOUNT_ID}:execution:{self.state_machine.state_machine_name}/*"
                        ],
                        actions=["states:RedriveExecution"],
                    ),
                ],
            ),
        )
        policy.attach_to_role(self.state_machine.role)

I had to add another missing permission to allow redriving a failed distributed map run. There may be other missing permissions that I haven't run into yet.

Anyway, the point is that the DistributedMap state does not set up the permissions like it ought to.

@bilalq

bilalq commented May 10, 2024

Relevant docs: https://docs.aws.amazon.com/step-functions/latest/dg/iam-policies-eg-dist-map.html

Seems like at a minimum, you want:

  • states:StartExecution on the state machine ARN
  • states:StopExecution and states:DescribeExecution on execution ARNs (i.e., ${stateMachineArn}:*)
  • states:RedriveExecution on labeled execution ARNs (i.e., ${stateMachineArn}/*:*)

Also, if you have a resultWriter S3 bucket, you'll need all the various permissions mentioned in the doc above for the bucket.
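As a rough sketch, the three statements above could be built as plain IAM JSON from a concrete state machine ARN. `distributedMapStatements` is a hypothetical helper, not a CDK API. It assumes execution ARNs use the `execution` resource type (as in the Python workaround earlier in this thread) instead of literally appending to the state machine ARN, so please check the resource patterns against the linked doc before relying on them. It also will not work on unresolved CDK tokens, only on a real ARN string.

```typescript
// Minimal shape of an IAM policy statement (illustration only).
interface Statement {
  Effect: "Allow";
  Action: string[];
  Resource: string[];
}

// Hypothetical helper: derives the statements listed above from a concrete ARN of the
// form arn:<partition>:states:<region>:<account>:stateMachine:<name>.
function distributedMapStatements(stateMachineArn: string): Statement[] {
  const parts = stateMachineArn.split(":");
  if (parts.length !== 7 || parts[5] !== "stateMachine") {
    throw new Error(`Not a state machine ARN: ${stateMachineArn}`);
  }
  const name = parts[6];
  // Assumption: executions live under arn:...:execution:<name>:<executionName>.
  const executionPrefix = [...parts.slice(0, 5), "execution", name].join(":");

  return [
    // Start child executions of the state machine itself.
    { Effect: "Allow", Action: ["states:StartExecution"], Resource: [stateMachineArn] },
    // Inspect and stop those child executions.
    {
      Effect: "Allow",
      Action: ["states:DescribeExecution", "states:StopExecution"],
      Resource: [`${executionPrefix}:*`],
    },
    // Redrive executions that carry a map run label (<name>/<label>:<executionName>).
    {
      Effect: "Allow",
      Action: ["states:RedriveExecution"],
      Resource: [`${executionPrefix}/*:*`],
    },
  ];
}

console.log(
  JSON.stringify(
    distributedMapStatements("arn:aws:states:us-east-2:123456789012:stateMachine:Example"),
    null,
    2,
  ),
);
```

The S3 permissions for a resultWriter bucket are left out here on purpose; they depend on the bucket and prefix in use.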

@bilalq

bilalq commented May 10, 2024

I see that the PR that added the DistributedMap construct did seem to set permissions other than RedriveExecution, in the bind method of the state graph (packages/aws-cdk-lib/aws-stepfunctions/lib/state-graph.ts):

  /**
   * Binds this StateGraph to the StateMachine it defines and updates state machine permissions
   */
  public bind(stateMachine: StateMachine) {
    for (const state of this.allStates) {
      if (DistributedMap.isDistributedMap(state)) {
        stateMachine.role.attachInlinePolicy(new iam.Policy(stateMachine, 'DistributedMapPolicy', {
          document: new iam.PolicyDocument({
            statements: [
              new iam.PolicyStatement({
                actions: ['states:StartExecution'],
                resources: [stateMachine.stateMachineArn],
              }),
              new iam.PolicyStatement({
                actions: ['states:DescribeExecution', 'states:StopExecution'],
                resources: [`${stateMachine.stateMachineArn}:*`],
              }),
            ],
          }),
        }));

        break;
      }
    }
  }

But I'm still hitting errors like the following at runtime:

Error contacting AWS Service. | Message from Service: User: arn:aws:sts::123456789012:assumed-role/ExampleStateMachineRole-w9L0WPmFgXQU/KFFycMGpPUoVXQJNEKPZfzjTqKAbOZlA is not authorized to perform: states:StartExecution on resource: arn:aws:states:us-east-2:123456789012:stateMachine:Example because no identity-based policy allows the states:StartExecution action (Service: Sfn, Status Code: 400, Request ID: efa7d0da-0d5f-4359-bd3c-844ede092da5)

I have no resultWriter or itemReader in my state task here. Would that maybe affect things?

cc @abdelnn

@pahud pahud added p3 and removed p2 labels Jun 11, 2024
6 participants