Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use not winrm instead of is ssh condition #442

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

aklakina
Copy link

Communicator "none" issue

Context

Your plugin gives packer the option to not set up a communicator for provisioning. This is especially useful when your packer provisioning step is just calling an ansible playbook or some other OS configurator.

The issue

When this happened your plugin handled the connection as a winrm connection when creating the key vault. The problem with this is that the user did not provide any winrm specific configuration because they didn't want any communicator. This led to fatal deployment error.

Example configuration:

source "azure-arm" "baseimage" {
  <any configuration here without defining winrm>
  communicator = "none"
}

Error output:

Debug mode enabled. Builds will not be parallelized.
azure-arm.baseimage: output will be in this color.

==> azure-arm.baseimage: Running builder ...
    azure-arm.baseimage: Creating Azure Resource Manager (ARM) client ...
    azure-arm.baseimage: ARM Client successfully created
    azure-arm.baseimage: temp admin user: 'packer'
    azure-arm.baseimage: temp admin password: 'VP0AHVC47GturUZCm3rGcOsVkoUYnUHR'
==> azure-arm.baseimage: Getting source image id for the deployment ...
==> azure-arm.baseimage: Using existing resource group ...
==> azure-arm.baseimage:  -> ResourceGroupName : '<rg>'
==> azure-arm.baseimage:  -> Location          : '<loc>'
==> azure-arm.baseimage: Validating deployment template ...
==> azure-arm.baseimage:  -> ResourceGroupName : '<rg>'
==> azure-arm.baseimage:  -> DeploymentName    : 'kvpkrdp72fap2n0ot'
==> azure-arm.baseimage: Deploying deployment template ...
==> azure-arm.baseimage:  -> ResourceGroupName : '<rg>'
==> azure-arm.baseimage:  -> DeploymentName    : 'kvpkrdp72fap2n0ot'
==> azure-arm.baseimage: Getting the certificate's URL ...
==> azure-arm.baseimage:  -> Key Vault Name        : 'pkrkv72fap2n0ot'
==> azure-arm.baseimage:  -> Key Vault Secret Name : 'packerKeyVaultSecret'
==> azure-arm.baseimage:  -> Certificate URL       : 'https://pkrkv72fap2n0ot.vault.azure.net/secrets/packerKeyVaultSecret/7caa02c566b341cb8e800b8c7591c420'
==> azure-arm.baseimage: Setting the certificate's URL ...
==> azure-arm.baseimage: Validating deployment template ...
==> azure-arm.baseimage:  -> ResourceGroupName : '<rg>'
==> azure-arm.baseimage:  -> DeploymentName    : 'pkrdp72fap2n0ot'
==> azure-arm.baseimage: Deploying deployment template ...
==> azure-arm.baseimage:  -> ResourceGroupName : '<rg>'
==> azure-arm.baseimage:  -> DeploymentName    : 'pkrdp72fap2n0ot'
==> azure-arm.baseimage: ERROR: -> DeploymentFailed : At least one resource deployment operation failed. Please list deployment operations for details. Please see https://aka.ms/arm-deployment-operations for usage details.
==> azure-arm.baseimage: ERROR:   -> ResourceDeploymentFailure : The resource write operation failed to complete successfully, because it reached terminal provisioning state 'Failed'.
==> azure-arm.baseimage:
==> azure-arm.baseimage: polling after CreateOrUpdate: polling failed: the Azure API returned the following error:
==> azure-arm.baseimage: 
==> azure-arm.baseimage: Status: "DeploymentFailed"
==> azure-arm.baseimage: Code: "ResourceDeploymentFailure"
==> azure-arm.baseimage: Message: "At least one resource deployment operation failed. Please list deployment operations for details. Please see https://aka.ms/arm-deployment-operations for usage details.\nThe resource write operation failed to complete successfully, because it reached terminal provisioning state 'Failed'."
==> azure-arm.baseimage: Activity Id: ""
==> azure-arm.baseimage: 
==> azure-arm.baseimage: ---
==> azure-arm.baseimage: 
==> azure-arm.baseimage: API Response:
==> azure-arm.baseimage: 
==> azure-arm.baseimage: ----[start]----
==> azure-arm.baseimage: {"status":"Failed","error":{"code":"DeploymentFailed","target":"/subscriptions/<subscription_id>/resourceGroups/<rg>/providers/Microsoft.Resources/deployments/pkrdp72fap2n0ot","message":"At least one resource deployment operation failed. Please list deployment operations for details. Please see https://aka.ms/arm-deployment-operations for usage details.","details":[{"code":"ResourceDeploymentFailure","target":"/subscriptions/<subscription_id>/resourceGroups/<rg>/providers/Microsoft.Compute/virtualMachines/pkrvm72fap2n0ot","message":"The resource write operation failed to complete successfully, because it reached terminal provisioning state 'Failed'.","details":[{"code":"CertificateImproperlyFormatted","message":"The secret retrieved from https://pkrkv72fap2n0ot.vault.azure.net/secrets/packerKeyVaultSecret/7caa02c566b341cb8e800b8c7591c420 is empty string."}]}]}}
==> azure-arm.baseimage: -----[end]-----
==> azure-arm.baseimage:
==> azure-arm.baseimage: 
==> azure-arm.baseimage: Deleting Virtual Machine deployment and its attached resources...
==> azure-arm.baseimage: Could not retrieve OS Image details: unable to obtain a OS disk for "pkrvm72fap2n0ot", please check that the instance has been created
==> azure-arm.baseimage: Deleted -> pkrvm72fap2n0ot : 'Microsoft.Compute/virtualMachines'
==> azure-arm.baseimage: Deleted -> pkrni72fap2n0ot : 'Microsoft.Network/networkInterfaces'
==> azure-arm.baseimage: Failed to find temporary OS disk on VM.  Please delete manually.

The fix

Originally the communicator type was checked if it was set to "ssh" and if not then it defaulted to the winrm deployment. This should be changed to a "not winrm" check. This way if the user set it to "none" it will still deploy a correct certificate without any extra configuration.

@aklakina aklakina requested a review from a team as a code owner September 14, 2024 07:20
Copy link

hashicorp-cla-app bot commented Sep 14, 2024

CLA assistant check
All committers have signed the CLA.

Copy link
Contributor

@JenGoldstrich JenGoldstrich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @aklakina thanks for this contribution, setting none results in a very ungraceful experience currently I agree, I have a few concerns with this change.

The first is that now, the default for an unset communicator (which is a valid input in Azure) on Windows is SSH. This isn't how most people use Azure on Windows and I believe will break a lot of builds. This function (GetCommunicatorSpecificKeyVaultDeployment) is only called for Windows builds.

If you're running Ansible playbooks you'd still want to connect with SSH, and your change still sets up a communicator fully due to the way the builder code is written.

When none comm type is set after your change an SSH certificate will be uploaded , and then WinRM will be used

WinRMConfig: func(multistep.StateBag) (*communicator.WinRMConfig, error) {
return &communicator.WinRMConfig{
Username: b.config.UserName,
Password: b.config.Password,
}, nil
},
},
to connect to the WIndows VM still, the plugin builder isn't written in such a way to skip connecting, and on Windows it defaults to WinRM, so with none set and after your change the behavior is a bit inconsistent, as it uploads an ssh private key in a keyvault, and then tries to connect over WinRM. This would require a bit more work to make this behavior consistent.

Can you tell me more about why setting the communicator type as ssh won't meet your use case? With this a connection will still be created for Windows builds with none set, just a WinRM username/password, with a VM configured for SSH, but probably with a default username and password that WinRM can access, this isn't a consistent experience. If you could post your full template I can understand more about your goal here, if you're using the Ansible packer provisioner as I mentioned you'll still need some sort of VM connection.

I think our support for the none option needs revisiting, maybe it should not be an option on our plugin given how the rest of the plugin is written.

@aklakina
Copy link
Author

aklakina commented Oct 30, 2024

Hi @JenGoldstrich, thank you for your time and review.

The use case where this communicator: none issue came up is about reusing already deployed images.
The whole concept was to have an OS scan after the initial workflow to check for vulnerabilities and errors during initialization. This step would be made on both linux and windows, but as packer and the hcl language is not present for long, they do not have the capability to handle control flows depending on input variables (or if they do I found it a bit complicated).

In this form I haven't tried explicitly setting the communicator to ssh yet. Maybe it could solve this cross platform issue. The root problem with the none setting is that the os: Windows sets the communicator settings to winrm by default (for Linux it sets it to ssh) and the communicator none does not override this and the azure templates are being deployed as if someone specified either connector. The templates are expecting a certificate to be present in order to deploy the disk but it is not there because key vault template got an empty winrm config and the template deployment fails. I thought setting the default for none to ssh overcomes this issue.

Currently I solved this issue with hcl not being so dynamic with a code generator. It makes sense in our environment because we mostly reuse every template with little-to-no changes.

The first is that now, the default for an unset communicator (which is a valid input in Azure) on Windows is SSH. This isn't how most people use Azure on Windows and I believe will break a lot of builds. This function (GetCommunicatorSpecificKeyVaultDeployment) is only called for Windows builds.

Would it still be inconsistent if instead of not winrm it would be ssh or none? I think that would not change the default behavior unless packer engine passes the not set communicator as none but I would doubt that.

@JenGoldstrich
Copy link
Contributor

JenGoldstrich commented Oct 31, 2024

It would be a less disruptive change if it was checking for explicitly none yes, but it would still run into the failure with my provided template, can you please provide your full Packer template so I can more quickly understand how you're making this change pass?

"The use case where this communicator: none issue came up is about reusing already deployed images."

After a packer build finishes the VM that created this image is destroyed, you can not run an OS scan after a Packer build unless you create a new VM from that image. To use a Packer build to create an image, anything you want to bake into that image should be handled using a Packer provisioner, which allows you to run shell commands, there is also the Ansible Plugin.

"...(packer and hcl) do not have the capability to handle control flows depending on input variables (or if they do I found it a bit complicated)."

I am not sure I understand what your goal here is, "control flows depending on input variables". Do you mean disabling certain aspects of a Packer build based on different input variables? I'm not sure what value that provides compared to having two different templates, one for each OS type, as they will have different needs inherently, Packer has the https://developer.hashicorp.com/packer/docs/commands/build#except-foo-bar-baz except flag, but I don't believe this works on Provisioners, only post processors and builds, but it seems like a reasonable feature request for Packer itself.

Either way if you set the communicator to none the Azure plugin should not negotiate a connection to the VM, either via WinRM or SSH, currently it does. But for most cases of these images you would need to connect to the created VM before you capture it into an image, your code still results in the WinRM communicator being initiated just without a certificate created in a KeyVault. Originally I suggested using SSH but I think this may not meet your use case, I am confused as to why WInRM fails for your use case, as if you do not provide a certificate the plugin will create one for you, so I don't understand why WinRM was failing for your build originally, providing that template will help answer some of those questions though

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants