feat(route53): new Route53 CrossAccountRecordSet construct #31281
Conversation
Force-pushed 74fbbd8 to c96be8d
```ts
if (props.SetIdentifier) {
  listResourceRecordSetArgs.StartRecordIdentifier = props.SetIdentifier;
}
defaultLog('Searching for recordsets with args: ', listResourceRecordSetArgs);
```
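For context, here is a minimal sketch of how such lookup arguments might be assembled. The `buildListArgs` helper and the exact `props` shape are hypothetical; only the guarded `SetIdentifier` assignment mirrors the snippet above, and the field names follow the Route53 `ListResourceRecordSets` API:

```typescript
// Hypothetical helper (not the handler's actual code) that builds the
// ListResourceRecordSets arguments so the first page of results starts at
// the record set being searched for.
interface ListArgs {
  HostedZoneId: string;
  StartRecordName: string;
  StartRecordType?: string;
  // Only meaningful for non-simple routing policies (weighted, latency, ...).
  StartRecordIdentifier?: string;
}

function buildListArgs(props: {
  HostedZoneId: string;
  Name: string;
  Type?: string;
  SetIdentifier?: string;
}): ListArgs {
  const args: ListArgs = {
    HostedZoneId: props.HostedZoneId,
    StartRecordName: props.Name,
    StartRecordType: props.Type,
  };
  // Matches the guarded assignment above: SetIdentifier is optional.
  if (props.SetIdentifier) {
    args.StartRecordIdentifier = props.SetIdentifier;
  }
  return args;
}

console.log(buildListArgs({
  HostedZoneId: 'Z123',
  Name: 'test.example.com.',
  Type: 'A',
  SetIdentifier: 'blue',
}));
```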
This module has a lot of log statements, which were very beneficial when testing the construct and troubleshooting in the CloudWatch log groups. I see some custom resources that have log statements, but most don't. I'm fine with removing the log statements if required.
```ts
/**
 * Return the region that hosts the Route53 endpoint
 *
 * Route53 is a partitional service: the control plane lives in one particular region,
 * which is different for every partition.
 *
 * The SDK knows how to convert a "target region" to a "route53 endpoint", which
 * equates to a (potentially different) region. However, when we use STS
 * AssumeRole credentials, we must grab credentials that will work in that
 * region.
 *
 * By default, STS AssumeRole will call the STS endpoint for the same region
 * as the Lambda runs in. Normally, this is all good. However, when the AssumeRole
 * is used to assume a role in a different account A, the AssumeRole will fail if the
 * Lambda is executing in an opt-in region R to which account A has not been opted in.
 *
 * To solve this, we will always AssumeRole in the same region as the Route53 call will
 * resolve to.
 */
function route53Region(region: string) {
  const partitions = {
    'cn': 'cn-northwest-1',
    'us-gov': 'us-gov-west-1',
    'us-iso': 'us-iso-east-1',
    'us-isob': 'us-isob-east-1',
    'eu-isoe': 'eu-isoe-west-1',
    'us-isof': 'us-isof-south-1',
  };

  for (const [prefix, mainRegion] of Object.entries(partitions)) {
    if (region.startsWith(`${prefix}-`)) {
      return mainRegion;
    }
  }

  // Default for commercial partition
  return 'us-east-1';
}
```
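Since the function is pure, its behavior is easy to sanity-check. Here is a standalone copy (reproduced from the code above so the snippet runs in isolation) with a few example lookups:

```typescript
// Standalone copy of route53Region from the handler above, for illustration.
function route53Region(region: string): string {
  const partitions: Record<string, string> = {
    'cn': 'cn-northwest-1',
    'us-gov': 'us-gov-west-1',
    'us-iso': 'us-iso-east-1',
    'us-isob': 'us-isob-east-1',
    'eu-isoe': 'eu-isoe-west-1',
    'us-isof': 'us-isof-south-1',
  };
  for (const [prefix, mainRegion] of Object.entries(partitions)) {
    if (region.startsWith(`${prefix}-`)) {
      return mainRegion;
    }
  }
  // Default for the commercial partition
  return 'us-east-1';
}

console.log(route53Region('cn-north-1'));    // 'cn-northwest-1'
console.log(route53Region('us-gov-east-1')); // 'us-gov-west-1'
console.log(route53Region('eu-west-1'));     // 'us-east-1' (commercial default)
```

Note that the prefix match includes the trailing hyphen, so `us-isob-east-1` correctly matches `us-isob` rather than `us-iso`.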
This is moved into `sts-util.ts` so the logic can be reused. I didn't want to take any chances and reintroduce problems solved in #26593
```diff
-async function submitResponse(status: 'SUCCESS' | 'FAILED', event: Response) {
+export async function submitResponse(status: 'SUCCESS' | 'FAILED', event: Response) {
```
Make `submitResponse` public so we can reuse it in the CrossAccountRecordSet custom resource handler.
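To illustrate the intended reuse, here is a sketch only: the `Response` shape is a trimmed stand-in, and this stub returns the response body instead of performing the HTTP PUT to `event.ResponseURL` that the real `submitResponse` does. Only the signature matches the diff above:

```typescript
// Trimmed stand-in for the custom-resource response event; the real type
// lives beside submitResponse in nodejs-entrypoint-handler/index.ts.
interface Response {
  StackId: string;
  RequestId: string;
  LogicalResourceId: string;
  PhysicalResourceId?: string;
  ResponseURL: string;
}

// Stub with the same signature as the exported submitResponse. The real
// implementation serializes this body and PUTs it to event.ResponseURL;
// here we just return it so the call site can be demonstrated offline.
async function submitResponse(status: 'SUCCESS' | 'FAILED', event: Response) {
  return {
    Status: status,
    StackId: event.StackId,
    RequestId: event.RequestId,
    LogicalResourceId: event.LogicalResourceId,
    PhysicalResourceId: event.PhysicalResourceId ?? event.LogicalResourceId,
  };
}

// A cross-account handler could then report its own PhysicalResourceId:
async function demo() {
  const body = await submitResponse('SUCCESS', {
    StackId: 'stack-1',
    RequestId: 'req-1',
    LogicalResourceId: 'CrossAccountRecordSet',
    PhysicalResourceId: 'example.com.|A|blue',
    ResponseURL: 'https://example.invalid/response',
  });
  console.log(body.Status, body.PhysicalResourceId);
}
demo();
```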
Force-pushed dd9f8ed to 4c22422
Need to figure out the correct way to get the integration tests to pass in the CodeBuild job; then the change will be ready for review.

Test snapshots seem to have the account ID magically referenced as 12345678, and it's very confusing where this value is coming from. Finally found it: it's in the runner-base.ts file. Still not clear how anyone correctly checks in integ tests with a second AWS account, but still experimenting.

Looks like I have to modify the integ test so that the CodeBuild job doesn't try to replace these resources, but that requires changing the test quite a bit, which is unfortunate:
I rewrote the test so that there are no WILL_REPLACE errors, but the build still fails, saying that 18 out of 937 integ tests failed, while this change only adds two of those integ tests. I'm not sure whether it's simply expected that the build fails when new integ tests are added; I've been digging through past commits to see if that's true, but haven't had much luck.

Yeah, it's not super clear if there's anything I'm supposed to do to fix this build issue. I'll publish the PR and push any fixes for snapshot weirdness if requested.

So... I definitely created a hack that will make the snapshots succeed:
This feels really clunky, and I'm not sure if this is what everyone else has been doing, but it should work...

Well, this definitely worked! Except 16 integ tests unrelated to this change are still failing:
This is definitely confusing as to why the build fails on these tests... Maybe there is something to these 16 failing tests.

```console
❯ yarn integ aws-ec2/test/integ.launch-template.js
yarn run v1.22.19
$ integ-runner --language javascript aws-ec2/test/integ.launch-template.js
Verifying integration test snapshots...

  CHANGED    aws-ec2/test/integ.launch-template 0.821s
      Resources
[~] AWS::Lambda::Function CustomVpcRestrictDefaultSGCustomResourceProviderHandlerDC833E5E
 └─ [~] Code
     └─ [~] .S3Key:
         ├─ [-] bde7b5c89cb43285f884c94f0b9e17cdb0f5eb5345005114dd60342e0b8a85a1.zip
         └─ [+] 4fabb91af38bfa4b232ef9e8db9a055aa28ba261170742fddb4a53ebe05f6393.zip

Snapshot Results:

Tests: 1 failed, 1 total
```

My changes seemingly have no reason to cause these diffs, but I assume there is a valid reason somewhere.

huh.. I suspect the cause is most likely the single line change in

I can either commit the new hash for all 16 integ tests, or rewrite the functionality. It probably makes sense to just commit the hashes.

Actually, I was too annoyed, and through the almighty power of ChatGPT I cooked up an absolutely cursed script that updates the S3 key in all of the complaining tests:

```bash
#!/bin/bash
# fix-s3-keys.sh
failingTests=(
  aws-cloudfront/test/integ.cloudfront-cross-region-cert.js
  aws-cloudfront/test/integ.distribution-lambda-cross-region.js
  aws-ec2/test/integ.launch-template.js
  aws-ec2/test/integ.vpc-flow-logs-kinesis.js
  aws-ec2/test/integ.vpc-restrict-default-sg.js
  aws-elasticloadbalancingv2/test/integ.alb.oidc.js
  aws-iam/test/integ.oidc-provider.js
  aws-iam/test/integ.condition-with-ref.js
  aws-route53/test/integ.cross-account-zone-delegation.js
  aws-route53/test/integ.delete-existing-record-set.js
  aws-route53/test/integ.rename-cross-account-zone-delegation.js
  aws-route53-patterns/test/integ.hosted-redirect.js
  aws-s3/test/integ.bucket-auto-delete-objects.js
  core/test/integ.cross-region-references.js
  custom-resources/test/provider-framework/integ.provider-with-waiter-state-machine.js
  triggers/test/integ.triggers.js
)

for failingTest in "${failingTests[@]}"; do
  output=$(yarn integ "${failingTest}" 2>&1 >/dev/null)

  # Extract the old and new keys
  oldKey=$(echo "$output" | grep -oE '\[-\] [a-f0-9]{64}' | awk '{print $2}')
  newKey=$(echo "$output" | grep -oE '\[\+\] [a-f0-9]{64}' | awk '{print $2}')

  echo "Replacing $oldKey with $newKey for $failingTest"
  read -p "Are you sure you want to continue? <y/N> " prompt
  if [[ $prompt == "y" || $prompt == "Y" || $prompt == "yes" || $prompt == "Yes" ]]
  then
    find "${failingTest}.snapshot" -type f -print0 | xargs -0 sed -i '' -e "s/${oldKey}/${newKey}/g"
    continue
  fi

  # Handle multiple keys
  oldKeys=($(echo "$output" | grep -oE '\[-\] [a-f0-9]{64}' | awk '{print $2}'))
  newKeys=($(echo "$output" | grep -oE '\[\+\] [a-f0-9]{64}' | awk '{print $2}'))
  alternateOldKey="${oldKeys[0]}"
  alternateNewKey="${newKeys[0]}"
  echo "Replacing $alternateOldKey with $alternateNewKey for $failingTest"
  read -p "Are you sure you want to continue? <y/N> " prompt
  if [[ $prompt == "y" || $prompt == "Y" || $prompt == "yes" || $prompt == "Yes" ]]
  then
    find "${failingTest}.snapshot" -type f -print0 | xargs -0 sed -i '' -e "s/${alternateOldKey}/${alternateNewKey}/g"
    continue
  else
    echo "${failingTest} failed to parse yarn integ"
    exit 1
  fi
done
```

Hopefully this is the last of the build issues..

Well, the build still tries changing the S3 keys in the build job even though local tests now report success... The fix is probably something silly; will try to look more tomorrow.

It looks like modifying @aws-cdk/custom-resource-handlers/lib/custom-resources-framework/config.ts forces you to rerun the integration tests for every test within that file, such as in #31166
Force-pushed 33e2c1b to 90739a8
Force-pushed 384b1ac to 7072fdd
Force-pushed 6a75a29 to 2f6ac12
A critical piece of information has to be missing from INTEGRATION_TESTS.md; I have now spent twice as long trying to successfully update snapshots as I did making the change.

Catch-22 Integ Tests

Integ tests can run with 3 different environment variables:

When the CodeBuild project checks all the snapshots, it won't run with your environment variable values. It will either use the default values set in the integ runner, or the default values set in the integ test. These are dummy values that you can't use when you rerun the snapshot tests; if you did try to use them, the tests would surely fail. So you fundamentally have to regenerate the snapshots with valid environment variables that the CodeBuild project will never have access to. Your snapshot and the CodeBuild snapshot will never align.

So how did past snapshots ever get generated? If you inspect a lot of the existing tests, you can see which AWS account the snapshot uses. It's almost as if people are running the integ tests with real environment values, but the snapshot is saved with the fake values. I have not been able to figure out the incantation to make that spell work.

What's going wrong?

I'm torn between two conclusions, which dramatically impacts my mental decision tree going forward. One of the two scenarios is true:

Common sense tells me scenario 2 can't be true, because that would mean there have been lots of past commits where people successfully followed a magical undocumented process. However, I'm quickly running out of other conclusions to reach, since I have tried everything under the sun. I would love to be missing something, but I've thought about this from every possible angle, and I'm starting to lose hope.
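For illustration, the pattern described above (real values locally, dummy values in CI) usually comes down to environment lookups with fallbacks at synth time. This is a sketch; the variable names `CDK_INTEG_ACCOUNT` / `CDK_INTEG_CROSS_ACCOUNT` and the fallback values are my assumptions based on the 12345678 default found in runner-base.ts, not confirmed names:

```typescript
// Sketch: resolve account IDs for a cross-account integ test. Locally, the
// developer exports real account IDs; in the CI snapshot check, nothing is
// exported, so the dummy fallbacks end up baked into the snapshot.
// All names and fallback values here are assumptions for illustration.
function resolveIntegAccounts(env: Record<string, string | undefined>) {
  return {
    apexAccountId: env.CDK_INTEG_ACCOUNT ?? '12345678',
    appAccountId: env.CDK_INTEG_CROSS_ACCOUNT ?? '87654321',
  };
}

// With no variables set (as in CI), the dummy values are used:
console.log(resolveIntegAccounts({})); // { apexAccountId: '12345678', appAccountId: '87654321' }
```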
I thought I almost found the secret sauce. Perhaps if the explicit

```diff
@@ -158,9 +156,12 @@ const crossAccountRoleStack = new CrossAccountRoleStack(app, 'CrossAccountRoleSt
   authorizedAccounts: [appAccountId],
   changeResourceRecordSetsRecordTypes: ['A'],
   changeResourceRecordSetsNormalizedRecordNames: [`test.${hostedZoneName}`],
-  env: {
-    account: apexAccountId,
-    region: 'us-east-1',
+  cdkCommandOptions: {
+    deploy: {
+      args: {
+        profile: 'cdk_integ_account',
+      },
+    },
   },
 });
@@ -170,14 +171,17 @@ const appStack = new AppStack(app, 'AppStack', {
   apexAccountId: crossAccountRoleStack.account,
   crossAccountRoleName: crossAccountRoleName,
   recordName: 'test',
-  env: {
-    account: appAccountId,
-    region: 'us-west-1',
+  cdkCommandOptions: {
+    deploy: {
+      args: {
+        profile: 'cdk_integ_cross_account',
+      },
+    },
   },
 });
```

This unfortunately did not work, and the generated snapshot still contained real values.
Force-pushed 2ababcf to 1ad78d5
This PR has been in the BUILD FAILING state for 3 weeks, and looks abandoned. To keep this PR from being closed, please continue work on it. If not, it will automatically be closed in a week.
This PR has been deemed to be abandoned, and will be automatically closed. Please create a new PR for these changes if you think this decision has been made in error.
Comments on closed issues and PRs are hard for our team to see.
Issue
Closes #15213
Reason for this change
Deploying your APEX Route53 HostedZone (example.com) to a designated AWS account, and delegating child HostedZones (us-east-1.prod.example.com, us-east-2.prod.example.com) to other AWS accounts, is a common setup. Managing this setup through CDK becomes impossible once a child account contains RecordSet information that needs to be created in the APEX zone. This scenario is most common when you want to create a RecordSet at the root of your domain that uses a non-simple routing policy, where the information needed to create those RecordSets is distributed across multiple child CloudFormation stacks.
This `CrossAccountRecordSet` construct bridges this gap by creating the RecordSet in the APEX HostedZone, and it only needs the following props:

- `crossAccountRole: iam.IRole` that the custom resource will assume, which authorizes the Route53 changes in the APEX HostedZone.

This issue is quite popular, as #15213 has 36 reactions, and there are already a couple of constructs addressing this on ConstructHub:
This change was created because I believe this functionality should exist as a core part of the CDK. As for using this implementation over the ConstructHub implementations, I think those implementations are too simple for how many rough edges the Route53 ChangeResourceRecordSets API has, which is explained further in the change. I also think the `CrossAccountRecordSet` construct interface is intuitive and easy for existing users to adopt.

This change doesn't solve all cross-account DNS related problems. A specific pain point for me is that CertificateValidation doesn't support cross-account updates (see #8934), which diminishes the impact of this change. Still, this change is already quite large, and those problems can be addressed in later changes. I first wanted to confirm this change moves in a direction the CDK team finds acceptable, considering the DnsValidatedCertificate deprecation has so much back-and-forth dialogue around it.

Side note: the CloudFormation AWS::CertificateManager::Certificate resource should make the CNAME RecordName and RecordValue a Return Value accessible through Fn::GetAtt. You could then pass those values to this `CrossAccountRecordSet` construct to enable cross-account DNS validation for your ACM certificate! ...One problem at a time.
Key Design Decisions

- I spent a significant amount of time debating between the construct accepting `RecordSetProps` versus accepting an array of `RecordSetProps[]`, which would fully address the concerns of aws_route53: Allow updates to existing records in a single operation #26754. The `aws-route53/README.md` update defends why this decision was made.
- The `@aws-cdk/custom-resource-handlers/lib/aws-route53/cross-account-record-set-handler/index.ts` custom resource handler may appear overly complicated because the implementation performs DELETE/CREATE ChangeBatches rather than an UPSERT. I faced so much pain when testing various RecordSet changes relying on only Route53 UPSERT, and I firmly believe this will result in a better user experience for the price of a more complicated implementation.
- The `@aws-cdk/custom-resource-handlers/lib/aws-route53/cross-account-record-set-handler/index.ts` handler returns its own PhysicalResourceId, which requires sending a Custom Resource Response Object. The `PhysicalResourceId` class explains why this was necessary. Route53 uniquely identifies a RecordSet in a non-obvious way, and this implementation 🪄 just does the right thing 🪄.
- Up until the very end, the `CrossAccountRecordSet` construct supported an `evaluateTargetHealth` prop, but I decided to remove it because RecordSetProps doesn't support that prop, and I only wanted this change to match the features of what already exists (see feat(route53): added EvaluateTargetHealth to Route53 Alias targets (#9481) #30664). There are a few parameters to the ChangeResourceRecordSet API that aren't available in the Route53 CDK package, and they should probably be addressed in a separate change. I'm most interested in figuring out a sane way to configure health checks, but it looks like this is an open issue that is being worked on ([route53] Support for health checks #9481, feat(route53): added L2 construct for Route53's health checks #30739).
- By the time I was mostly done with the implementation, I reread the NEW_CONSTRUCTS_GUIDE and realized this construct should probably go into an alpha package. If reviewers want this made into an alpha package, I'll do the work of refactoring the code, but I don't want to spend more time on this unless I know the changes would be merged.
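To make the DELETE/CREATE-versus-UPSERT point concrete, here is a sketch (not the handler's actual code) of the ChangeBatch shape such a replacement produces. The `replaceChangeBatch` helper is hypothetical; the `Action`/`ResourceRecordSet` field names follow the Route53 ChangeResourceRecordSets API:

```typescript
// Sketch: replace a record set with an explicit DELETE of the old value
// followed by a CREATE of the new one, instead of a single UPSERT. DELETE
// must match the existing record exactly, which forces the handler to read
// the current state first -- the source of the extra complexity noted above.
interface ResourceRecordSet {
  Name: string;
  Type: string;
  TTL?: number;
  ResourceRecords?: { Value: string }[];
}

function replaceChangeBatch(oldSet: ResourceRecordSet, newSet: ResourceRecordSet) {
  return {
    Comment: 'Explicit DELETE/CREATE instead of UPSERT',
    Changes: [
      { Action: 'DELETE', ResourceRecordSet: oldSet },
      { Action: 'CREATE', ResourceRecordSet: newSet },
    ],
  };
}

const batch = replaceChangeBatch(
  { Name: 'test.example.com.', Type: 'A', TTL: 300, ResourceRecords: [{ Value: '10.0.0.1' }] },
  { Name: 'test.example.com.', Type: 'A', TTL: 60, ResourceRecords: [{ Value: '10.0.0.2' }] },
);
console.log(batch.Changes.map((c) => c.Action).join(',')); // DELETE,CREATE
```

The upside of this shape is that a failed CREATE leaves an unambiguous state to recover from, whereas a blind UPSERT can silently adopt record sets the stack never created.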
Description of changes

- New `CrossAccountRecordSet` construct that creates Route53 RecordSet resources in a HostedZone that exists in another account.
- Moved `route53Region` from `@aws-cdk/custom-resource-handlers/lib/aws-route53/cross-account-zone-delegation-handler/index.ts` to `@aws-cdk/custom-resource-handlers/lib/aws-route53/sts-util.ts`.
- Moved `validateRecordSetProps` from `aws-cdk-lib/aws-route53/lib/record-set.ts` to `aws-cdk-lib/aws-route53/lib/util.ts`.
- `submitResponse` was made a public export from `packages/@aws-cdk/custom-resource-handlers/lib/core/nodejs-entrypoint-handler/index.ts`. This seemed like a safe change to me, and it would have been a waste to rewrite this functionality in the custom handler.

Description of how you validated changes
Personal Issues

This may be an issue on my part, but I couldn't find any good documentation on creating and validating `.lit.ts` files. I basically had to copy/paste `packages/aws-cdk-lib/aws-route53/test/integ.cross-account-record-set-simple-example.lit.ts` into the integ test at `packages/@aws-cdk-testing/framework-integ/test/aws-route53/test/integ.cross-account-record-set-simple-example.lit.ts`. One file purely exists for README documentation, and the other exists as an IntegTest. From what I can tell, past changes are basically duplicating these files as well. Hopefully I'm missing something.

In order to get Istanbul to report my code coverage correctly, I had to make the following tsconfig.json changes:

`npx lerna run build` never compiled the files like I thought it would, and `yarn build` was inconsistent. I would frequently deploy changes and be confused as to why they didn't work, only to realize the build commands hadn't actually worked. I finally realized I should use `yarn watch`, as that was mostly consistent in compiling my changes on file save, but even that had to be frequently killed and restarted. I'm not sure if this is a problem with my setup, but I basically learned not to trust that my files were actually built, and to always inspect the compiled .js files to confirm they contained the changes I expected.

Checklist
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license