Rock solid test deployments and back-outs #34
Comments
Further to this, and in response to the inevitable question "can't we just reapply the old host profile in order to roll back a change?", let me simplify the problem definition by focusing on a single NCM component, but bear in mind that we will be able to apply this easily across multiple components. A component has CIs that are in scope, and there will be lots more out of scope. By in scope, I mean specifically that the Perl code in
If we have independently recorded every change made by every NCM component, in order, indexed by a change number that can be traced back to a single
@msmark thanks for the detailed explanation. I feel a bit reluctant to start typing comments in here 😄 Can you open an issue in ncm-cdispd/ncm-ncd or CCM wrt the test profile part? (If possible, clarify the following: does the test profile have some (meta)data in the profile itself to mark it as test-only, in the sense that you would need to compile a different one to use it in production? If not, how do you propose it can be seen as a test profile?) My main concern is that this is not the next step for quattor, but already a few steps into the future. I would really like to see the components cleaned up and using more CAF first. From your proposal, it looks like you want to skip all that and wrap it in something magical.
The test profile part is tied in with the grander design. Once test profiles have been dispatched, the end user expects to be able to easily browse the full list of files that would have been changed (with the changed content if verbose output is requested) and the full list of commands that would have been executed on any number of hosts as a result of deploying a sandbox to a domain. This requires more than the current ad-hoc messages that an NCM component outputs in no-action mode. It requires a consistent way of defining, across all NCM components, what actions are to be performed, how, and in what order. But once you've solved that problem, you might as well also provide the undo commands, and then you get the rest, which is what I've done. So it's not so easy to do just test profiles without providing a rollback mechanism. In fact, if we did test profiles correctly, providing a rollback mechanism on top of that would not be much extra work. What I don't currently understand is what
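To make the point about a consistent action definition concrete, here is a minimal Python sketch (CAF itself is Perl, and nothing here is the CAF API). It assumes each component declares its planned actions as hypothetical `FileChange` and `Command` records instead of acting directly, so that a no-action run can print one line per action and, when verbose, a unified diff of each file's would-be content and the exact command line.

```python
import difflib
from dataclasses import dataclass, field
from typing import List


@dataclass
class PlannedAction:
    """Something a component would do; hypothetical, not a CAF class."""
    component: str
    description: str


@dataclass
class FileChange(PlannedAction):
    path: str = ""
    old: str = ""
    new: str = ""

    def diff(self) -> str:
        # Unified diff between the current and the would-be file content.
        return "".join(difflib.unified_diff(
            self.old.splitlines(keepends=True),
            self.new.splitlines(keepends=True),
            fromfile=self.path, tofile=self.path))


@dataclass
class Command(PlannedAction):
    argv: List[str] = field(default_factory=list)


def report(actions: List[PlannedAction], verbose: bool = False) -> str:
    """Render a no-action run: one line per action, details only if verbose."""
    lines = []
    for action in actions:
        lines.append(f"[{action.component}] {action.description}")
        if verbose and isinstance(action, FileChange):
            lines.append(action.diff().rstrip("\n"))
        elif verbose and isinstance(action, Command):
            lines.append("    $ " + " ".join(action.argv))
    return "\n".join(lines)
```

For example, a plan holding a `FileChange` for a config file and a `Command` that restarts its daemon yields two summary lines; with `verbose=True` the diff and the command line appear under each. Because nothing is executed, the same records could later carry undo information as well.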
@msmark you could already have test profiles that can only trigger wrt CAF::History, it is simply an internal list and an API to add entries (via a so you can certainly ignore CAF::History if you think it's not sufficient/too complicated/not well suited.
I see what you mean. Yes, the test profile idea will run components with
That said, even the test profile has Aquilon components to it, for example being able to atomically deploy and compile in a single step and get back the results. Of points 3, 4 and 7 in the issue description, only point 4 relates to CCM. So could I create one Aquilon issue covering points 3, 4 and 7, so that we can discuss the larger implications of supporting test profiles first, before separating these issues into subtasks?
@msmark yes, try to factor out the Aquilon-specific bits. Similarly, you seem to want to communicate with the host via … To further simplify the whole picture, I would even consider separating the remote part from the local part: assume we have a test profile in the correct location on the host (somehow); what should happen when we run
@stdweird I don't want to communicate with the host in any particular way; it's probably my lack of understanding of what communication goes on today and exactly where in the stack it happens. So if there is a more logical place for it, please let me know where it is. At the moment, I am guessing 😄 Separating the remote and local parts also seems like a reasonable approach. You're quite right that there must be a way of doing all of these tasks individually by hand on a host, as well as having Aquilon orchestrate the tasks over many thousands of hosts.
@stdweird Looking into this a bit further today, I don't think we need any changes to … So then the question becomes: do we modify
@stdweird Ah, my mistake, … In which case, I think … That would be a good start; I'll raise some separate issues for that piece of the puzzle. I haven't decided yet how to get the results back to a central location; I probably need to post them back to another URL, the reverse of
sending the results can be handled by ncm-ncd, esp if you implement quattor/ncm-ncd#49 |
@stdweird Thanks, nice suggestion. Btw, as this touches various areas I need a name to refer to the whole piece. Unless there is an objection, I'd like to adopt the name Project Igneous to refer to this whole lofty goal of "Rock solid test deployments and back-outs". It means when I raise issues in various places and refer them each back to this issue, I can do so in a concise manner. |
@msmark or paste the URL of this issue in any comment (or the description), and this GitHub issue will show all issues and/or PRs that reference it.
I agree with @stdweird that this is probably a better way to reference the discussion in a useful way... |
This issue is based on a new feature request I raised in email on the quattor-discuss mailing list last week. It touches many different components, e.g. Aquilon, cdp-listend, ncm-cdispd, ncm-ncd, CAF and lots of NCM components. However, I am creating this umbrella issue to describe the high-level request from which sub-issues may be readily created.
The ultimate goal is: we need rock solid test deployments and rock solid back-outs. More specifically, to provide a way to allow an operator who is managing hundreds or thousands of hosts in a domain to:
This can be broken down into the following requirements:
1. `aq deploy --source <sandbox> --target <domain> --compile` should atomically deploy git changes from a sandbox to a domain and compile the domain. If the compile fails, the git changes are removed. If it succeeds, when host profiles are sent across to each host, the fact that this is a deploy and the change ID involved must be communicated to `ncm-cdispd`. The command returns a unique change ID.
2. `aq undeploy --change <id> --compile` should atomically remove the git changes identified by the change ID and compile the domain. It is not allowed without the `--compile` option. When host profiles are sent across to each host, the fact that this is an undeploy and the change ID involved must be communicated to `ncm-cdispd`. If this is not the last deploy made to a domain, the command will fail, listing the other change IDs that have been applied subsequently. An additional flag, `--redeploy`, may be provided, which indicates that all subsequently applied change IDs must be undeployed and then redeployed once the selected change ID has been removed.
3. `aq deploy --compile` and `aq undeploy --compile` must support a `--test` option, which requests a test deployment. This deploys to (or undeploys from) a copy of the domain, not the live domain. The profiles are compiled and shipped out to every host in the domain with a new flag indicating to `ncm-cdispd` that this is a test profile only. What the host does when it receives a test profile is discussed below.
4. When `ncm-cdispd` receives a test profile, as described above, it puts it in a different cache from the one it normally uses for live profiles. It then runs all NCM components with `--noaction` (only those NCM components that support `NoAction`), logging output to a test log, not the normal log. Finally, it deletes the test profile.
5. … `CAF::Executor` object. The object encapsulates each individual task involved, with the change ID that links them all together, as well as the actions required to undo the change and a human-friendly description. This object becomes a key part of the information recorded by `CAF::History`, but is also used by the component to execute a change. We should also have a `CAF::Evaluate` object in which we can enclose arbitrary code that makes a change but cannot be expressed by another CAF method. See the example `CAF::Executor` object below.
6. `CAF::Executor` also takes a human-friendly description of the change being made, as well as the steps required to undo the change. `CAF::Process` will additionally need to capture the command needed to undo the change in order to inform `CAF::Executor` of the same. `CAF::FileWriter` and `CAF::FileEditor` can automatically ensure that `CAF::Executor` has an undo capability by backing up the file before and after they make any changes (see also point 10 below). If there is no way to undo a change, there should be a way to flag this up to the framework. See below for an example visual representation of a `CAF::Executor` object.
7. An `aq show testlog --change <id>` command (or similar) that collects the output of all of the test profile runs from every host that ran `--noaction` as a result of item 4 above, with a succinct but user-readable list of the tasks performed by each component (the `CAF::Executor` objects). With the `--undo` flag, it shows the undo commands from the `CAF::Executor` objects instead. By succinct, this means one line per task across all `CAF::Executor` objects in scope (see the example `CAF::Executor` object below), but with the ability to drill down into more detail if needed (e.g. a `--verbose` option).
8. When `ncm-cdispd` receives a new profile, it records whether this is a live profile or a test profile, and the associated change ID. It passes this information on to `ncm-ncd`.
9. When `ncm-ncd` executes components, it records the exact order in which each component is run, along with the change ID. Each `CAF::Executor` or `CAF::History` object created during a deploy or undeploy is logged. If doing an undeploy, it computes the appropriate order in which to undo changes based on the `CAF::Executor` objects that were created during the deploy.
10. NCM components use `CAF::Executor` objects to wrap every change they make. If a file is being changed, the original file and a copy of the new file are stashed in a different directory. This is used by `CAF::FileWriter` or `CAF::FileEditor` to check and compute an appropriate rollback during an undeploy.
11. During an undeploy, `ncm-cdispd` or `ncm-ncd` will play back the rollback commands logged in `CAF::Executor` objects. It will not expect the NCM component to understand how to revert the state. This is because an NCM component is only good at handling what it thinks is currently in scope. Its view of the world changes as conditional logic within the Perl code routes down different code paths, and as new versions of NCM components are delivered. By recording an exact list of undo commands at the time the change is made, it can be guaranteed that changes can be successfully reversed even if the NCM component code has subsequently been modified or removed. See the `CAF::Executor` example below; note that in many cases recording which `CAF` method was used, and with which arguments, is enough for the history. After a successful undeploy, re-running `ncm-ncd` with the rolled-back (now current) profile should be a no-op.

Here is an example visual representation of a `CAF::Executor` object for a component that wants to change a file and then send a HUP signal to a process. You'll see it essentially groups together a bunch of other `CAF` methods:
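One possible shape for such an object, as a minimal Python sketch rather than a Perl CAF implementation: `Task`, `Executor`, `write_file` and `send_hup` are hypothetical names, the "filesystem" is a plain dict and the HUP is only recorded, so the example has no side effects. It also assumes an ordering rule the issue leaves open. State changes are undone newest-first, while reload-style tasks such as a HUP are replayed afterwards, because re-sending the HUP before the file is restored would make the daemon reload the content being rolled back.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Task:
    description: str              # one line in the succinct listing
    undo_description: str         # one line in the --undo listing
    do: Callable[[], None]
    undo: Callable[[], None]
    reload: bool = False          # signals/reloads are replayed, not inverted


@dataclass
class Executor:
    """Hypothetical Python analogue of the proposed CAF::Executor."""
    change_id: str
    component: str
    tasks: List[Task] = field(default_factory=list)
    done: List[Task] = field(default_factory=list)

    def add(self, task: Task) -> "Executor":
        self.tasks.append(task)
        return self

    def run(self) -> None:
        for task in self.tasks:
            task.do()
            self.done.append(task)

    @staticmethod
    def _undo_order(tasks: List[Task]) -> List[Task]:
        # Invert state changes newest-first, then replay reloads so that
        # daemons pick up the restored state rather than the new one.
        inverted = [t for t in reversed(tasks) if not t.reload]
        return inverted + [t for t in tasks if t.reload]

    def rollback(self) -> None:
        for task in self._undo_order(self.done):
            task.undo()
        self.done.clear()

    def summary(self, undo: bool = False) -> List[str]:
        tasks = self._undo_order(self.tasks) if undo else self.tasks
        return [f"{self.change_id} {self.component}: "
                f"{t.undo_description if undo else t.description}"
                for t in tasks]


def write_file(fs: Dict[str, str], path: str, content: str) -> Task:
    stash: Dict[str, Optional[str]] = {}

    def do() -> None:
        stash["original"] = fs.get(path)    # back up just before changing
        fs[path] = content

    def undo() -> None:
        original = stash["original"]
        if original is None:
            fs.pop(path, None)              # the file did not exist before
        else:
            fs[path] = original

    return Task(f"write {path}", f"restore {path}", do, undo)


def send_hup(sent: List[str], process: str) -> Task:
    def hup() -> None:
        sent.append(f"HUP {process}")       # recorded, not actually sent

    return Task(f"send HUP to {process}", f"send HUP to {process}",
                hup, hup, reload=True)
```

For the file-plus-HUP case, build `Executor("42", "ncm-example")`, add `write_file(...)` and then `send_hup(...)`, and call `run()`. `summary(undo=True)` then lists the file restore before the HUP, which is the same order `rollback()` uses.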
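Point 10's stash, where the original file and a copy of the new file are kept elsewhere so that a rollback can be checked before it is applied, can be sketched with real files in Python. This is an assumption-laden illustration, not `CAF::FileWriter` behaviour. The function names are hypothetical, one stash directory per change ID is assumed, and file paths are flattened into stash names naively. The "check" is read here as refusing to restore a file that no longer matches what the change wrote, and flagging it instead.

```python
import os
import shutil
from typing import Optional, Tuple


def stash_and_write(path: str, content: str,
                    stash_dir: str) -> Tuple[Optional[str], str]:
    """Copy the original and the new content into stash_dir, then write."""
    os.makedirs(stash_dir, exist_ok=True)
    # Simplification: flatten the path into a file name (may collide).
    name = path.strip(os.sep).replace(os.sep, "_")
    orig_copy: Optional[str] = os.path.join(stash_dir, name + ".orig")
    new_copy = os.path.join(stash_dir, name + ".new")
    if os.path.exists(path):
        shutil.copy2(path, orig_copy)
    else:
        orig_copy = None                    # nothing to restore: undo = delete
    with open(new_copy, "w") as fh:
        fh.write(content)
    shutil.copy2(new_copy, path)
    return orig_copy, new_copy


def rollback_file(path: str, orig_copy: Optional[str], new_copy: str) -> str:
    """Undo the change only if the file still holds what was written."""
    if not os.path.exists(path):
        return "drifted"                    # removed since the change
    with open(path) as cur, open(new_copy) as new:
        if cur.read() != new.read():
            return "drifted"                # edited since the change
    if orig_copy is None:
        os.remove(path)                     # the file was created by the change
        return "removed"
    shutil.copy2(orig_copy, path)
    return "restored"
```

Returning "drifted" instead of overwriting is one way to "flag this up to the framework" when a clean undo is no longer possible; how such a case should actually be surfaced is left open by the issue.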
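The ordering rule in point 2 (`aq undeploy` with and without `--redeploy`) can be pinned down with a small Python sketch. It assumes the domain's deploy history is available as a list of change IDs, oldest first. `plan_undeploy` and `UndeployError` are hypothetical names, not Aquilon code. Without `redeploy`, removing anything but the newest change fails and lists the later IDs. With it, later changes are undone newest-first, the selected change is removed, and the later changes are redeployed oldest-first.

```python
from typing import List, Tuple


class UndeployError(Exception):
    """Raised when later changes sit on top of the one being removed."""

    def __init__(self, change_id: str, later: List[str]):
        super().__init__(f"cannot undeploy {change_id}: later changes "
                         f"{', '.join(later)} must be undeployed first")
        self.later = later


def plan_undeploy(history: List[str], change_id: str,
                  redeploy: bool = False) -> List[Tuple[str, str]]:
    """Return the ordered steps needed to remove change_id from a domain.

    history lists the change IDs deployed to the domain, oldest first.
    """
    if change_id not in history:
        raise ValueError(f"unknown change ID {change_id}")
    later = history[history.index(change_id) + 1:]
    if later and not redeploy:
        raise UndeployError(change_id, later)
    steps = [("undeploy", c) for c in reversed(later)]   # newest first
    steps.append(("undeploy", change_id))
    steps += [("deploy", c) for c in later]               # oldest first
    return steps
```

The issue does not say whether a redeployed change keeps its original change ID or receives a new one, so the sketch simply reuses the ID.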