Total refactor to add more type safety and testability #5
base: main
Conversation
…green before continuing.
Wow! Basically a rewrite. Guess I’ll have to take it for a spin and let you know.
print("Deploying to the web instances.", file=sys.stderr)
env_name = self.state.env_metadata[self.state.subenvs['web']]['name']
self._update_environment(env_name)
# Make sure we're not trying to deploy on top of an environment in distress
I wonder about this comment. Could there be a case where we need a deploy to get us out of a bad state?
Yes, definitely.
Unfortunately, there are a lot of reasons that an environment could be in a failing state. I added this check because a worker environment was in a warning state due to messages in its dead-letter queue. Before I added this check, the new code would be deployed to the worker, but safecast_deploy would then spin for several minutes waiting for the environment to become green again before ultimately failing (since the new deploy didn't fix the problem). There also sometimes seems to be a significant lag between fixing a problem and seeing the health status come back to green.
So in general, I think it's a good idea to fail quickly and make the operator track down what the problem is.
Right now, it's also possible to use new_env to deploy out to a new environment, bypassing this check... which could run into the same problems with things like dead-letter queues. That's a potential area for enhancement as well.
Finally, we could enhance the code to allow overriding this check (using a CLI flag) in a subsequent PR. It would also probably be a good idea to add a restart command to the deployer, although that's pretty easy through the Web UI as well.
I think these should all be taken as enhancements for the future.
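To make that concrete, here is a rough sketch of what an override flag could look like with argparse. This is purely hypothetical: the subcommand and argument names are assumptions based on the same_env invocation discussed in this PR, and --skip-health-check does not exist in the current code.

# Hedged sketch only; names are assumptions, not the project's actual CLI.
import argparse

parser = argparse.ArgumentParser(prog="deploy.py")
subparsers = parser.add_subparsers(dest="command")

same_env = subparsers.add_parser("same_env", help="Deploy a new version to the existing environment.")
same_env.add_argument("app")
same_env.add_argument("env")
same_env.add_argument("version")
# Hypothetical override for the fail-fast health check discussed above.
same_env.add_argument(
    "--skip-health-check",
    action="store_true",
    help="Deploy even if the target environment is not currently green.",
)

args = parser.parse_args(["same_env", "api", "dev", "some-version", "--skip-health-check"])
print(args.skip_health_check)  # True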
Finally, we could enhance the code to allow overriding this check (using a CLI flag) in a subsequent PR.
This approach makes sense to me. 👍
deploy.py
safecast_deploy.same_env.SameEnv(
    EnvType(args.env),
    state.old_aws_state,
    state.new_aws_state(new_version=args.version),
I got an error here trying to take it for a spin:
~/safecast/api/safecast_deploy refactor 01:57:47
python-3.8.5 ❯ ./deploy.py same_env api dev api-master-1093-1baaa1ddec89e5074a19c90d9b5f782c80e4b2d3
Traceback (most recent call last):
File "./deploy.py", line 204, in <module>
main()
File "./deploy.py", line 199, in main
parse_args()
File "./deploy.py", line 120, in parse_args
args.func(args)
File "./deploy.py", line 182, in run_same_env
state.new_aws_state(new_version=args.version),
TypeError: new_aws_state() missing 1 required positional argument: 'target_env_type'
Maybe this should be like this?
env_type = EnvType(args.env)
...
state.new_aws_state(env_type, new_version=args.version),
Though when I tried that, I then got:
/Users/matschaffer/safecast/api/safecast_deploy/.direnv/python-3.8.5/bin/python /Users/matschaffer/safecast/api/safecast_deploy/deploy.py same_env api dev api-master-1093-1baaa1ddec89e5074a19c90d9b5f782c80e4b2d3
Traceback (most recent call last):
File "/Users/matschaffer/safecast/api/safecast_deploy/deploy.py", line 205, in <module>
main()
File "/Users/matschaffer/safecast/api/safecast_deploy/deploy.py", line 200, in main
parse_args()
File "/Users/matschaffer/safecast/api/safecast_deploy/deploy.py", line 120, in parse_args
args.func(args)
File "/Users/matschaffer/safecast/api/safecast_deploy/deploy.py", line 183, in run_same_env
state.new_aws_state(env_type, new_version=args.version),
File "/Users/matschaffer/safecast/api/safecast_deploy/safecast_deploy/state.py", line 30, in new_aws_state
new_env_num = max([(tier.num + 1) % 1000 for (tier_type, tier) in old_aws_state.envs[target_env_type]], key=operator.itemgetter(1))
NameError: name 'old_aws_state' is not defined
So still need to dig more I guess.
Sorry, not a lot of test coverage on that part of state.py. I've pushed a fix to that particular issue (needed self. in front of old_aws_state) and I'm going to see if I can add a basic unit test of that code.
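For reference, a minimal sketch of the kind of unit test being discussed. The helper below is a standalone stand-in for the environment-number logic in new_aws_state, not the project's actual code, so the names are illustrative only.

# Standalone sketch; next_env_num is a stand-in for the numbering logic in
# new_aws_state, not the real method. Run with: pytest <this_file>.py
def next_env_num(existing_nums):
    """Return the next environment number, wrapping at 1000."""
    return max((n + 1) % 1000 for n in existing_nums)

def test_next_env_num_increments_past_highest():
    assert next_env_num([3, 7]) == 8

def test_next_env_num_wraps_at_1000():
    assert next_env_num([999]) == 0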
Getting this now:
state.new_aws_state(env_type, new_version=args.version),
File "/Users/matschaffer/safecast/api/safecast_deploy/safecast_deploy/state.py", line 30, in new_aws_state
new_env_num = max([(tier.num + 1) % 1000 for (tier_type, tier) in self.old_aws_state.envs[target_env_type]], key=operator.itemgetter(1))
File "/Users/matschaffer/safecast/api/safecast_deploy/safecast_deploy/state.py", line 30, in <listcomp>
new_env_num = max([(tier.num + 1) % 1000 for (tier_type, tier) in self.old_aws_state.envs[target_env_type]], key=operator.itemgetter(1))
ValueError: too many values to unpack (expected 2)
Plus it looks like you might be missing import operator.
safecast_deploy/state.py
parsed_version = self._parse_version(new_version)
new_envs = collections.defaultdict(dict)
new_env_num = max([(tier.num + 1) % 1000 for (tier_type, tier) in self.old_aws_state.envs[target_env_type]], key=operator.itemgetter(1))
for (tier_type, tier) in self.old_aws_state.envs[target_env_type]:
for (tier_type, tier) in self.old_aws_state.envs[target_env_type]:
new_env_num = max([(tier.num + 1) % 1000 for (tier_type, tier) in self.old_aws_state.envs[target_env_type].items()])
for (tier_type, tier) in self.old_aws_state.envs[target_env_type].items():
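For context, a tiny standalone illustration (not project code) of why .items() matters here: iterating a dict directly yields only the keys, so trying to unpack each key into (tier_type, tier) is what produces the "too many values to unpack" error above.

# Illustrative only; the dict contents are made up.
envs = {"web": "web-tier", "wrk": "worker-tier"}

# Iterating the dict directly yields the string keys, so this unpacking
# raises ValueError: too many values to unpack (expected 2).
# for (tier_type, tier) in envs:
#     ...

# Iterating .items() yields (key, value) pairs, which unpack cleanly.
for tier_type, tier in envs.items():
    print(tier_type, tier)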
This got me a bit further, but then I hit this:
File "/Users/matschaffer/safecast/api/safecast_deploy/deploy.py", line 183, in run_same_env
state.new_aws_state(env_type, new_version=args.version),
File "/Users/matschaffer/safecast/api/safecast_deploy/safecast_deploy/state.py", line 36, in new_aws_state
new_tier = dataclasses.replace(new_tier, version=new_version, parsed_version=parsed_version)
File "/Users/matschaffer/.asdf/installs/python/3.8.5/lib/python3.8/dataclasses.py", line 1293, in replace
Looks like AwsTier doesn't have a version. Not sure what the intent is.
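If it helps, here's a small standalone demonstration of why that call fails when the field doesn't exist: dataclasses.replace() passes its keyword arguments to the class's __init__, so an unknown name like version raises a TypeError. The class below is illustrative, not the real AwsTier.

# Illustrative only; Example is not the real AwsTier.
import dataclasses

@dataclasses.dataclass
class Example:
    num: int

e = dataclasses.replace(Example(num=1), num=2)  # fine: num is a declared field
print(e)  # Example(num=2)
# dataclasses.replace(Example(num=1), version="x")
#   -> TypeError: __init__() got an unexpected keyword argument 'version'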
GitHub might be confused on the diff. I was trying to suggest that for lines 30/31.
It's probably going to be easier for me to just write the unit test for that method and shake out all the issues.
And then move to a statically typed, compiled language. Although I hear Python has had some static type support since 3.6 that can be used for checking before execution.
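A small sketch of the kind of pre-execution checking meant here: type hints on a dataclass, verified with an external checker such as mypy before anything runs. The names are illustrative, not the project's.

# Illustrative names only; run something like `mypy sketch.py` to check
# these annotations before execution.
from dataclasses import dataclass

@dataclass(frozen=True)
class Tier:
    name: str
    num: int

def next_num(tier: Tier) -> int:
    return (tier.num + 1) % 1000

print(next_num(Tier(name="web", num=7)))  # 8
# next_num("web")  # would be flagged by mypy (a str is not a Tier) before execution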
I've been learning Rust and TypeScript lately. Happy to try either :)
Now that I've got more experience with it, I think it would be possible to build a Terraform provider to do much of this and automatically determine whether to use a new environment (although the official AWS providers would not be sufficient; it would have to be rewritten from scratch). Not that I'm planning another rewrite.
Latest code push should cover most of state.py with unit tests.
I needed a change to get the git write working (#6) - just merge it if it looks right to you.
Fix naming and indentation on git writer
Thanks for the fix!
Hit one more little bug when prd-wrk timed out on a deploy.
Fix field mismatch on EnvUpdateTimedOutException
I feel like I've been able to learn a lot more about modern development in Python while refactoring this. I am not satisfied with the architecture or configurability of the system yet (and it would be nice to add static typing), but it has been refactored enough that it can be at least minimally unit tested.
It's passing basic unit tests, which will help a lot with development churn in the future, but there is still code in the CLI that needs to be tested by actually running against live environments. This PR should be left in draft state until that testing is done.