Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inventory workflow workflow manager - provision in 2.3.5.3 #19

Conversation

syed-khadeerahmed
Copy link
Collaborator

@syed-khadeerahmed syed-khadeerahmed commented Nov 8, 2024

Description

Device Provisioning - version 2.3.5.3

Problem:
Error in Playbook: Reported failure to provision wired devices, even though the UI indicated success.
Log Details: DNAC logs showed an error due to a 'NoneType' object not having the expected get attribute.
Cause: The check_task_response_status function was incompatible with older API versions used for device provisioning, causing it to misinterpret API responses.

Solution:
Updated Logic for Provisioning:
New handling logic for older API versions was added in inventory_workflow_manager.py.
The function now retrieves an executionId after initiating provisioning, which is used to continuously check the provisioning status.
Status is checked in real-time, logging progress as:
Success: Logs the success message and records the device in the provisioned list.
Failure: Logs an error and raises an exception.
In Progress: Logs a status update without interrupting the process.

Type of Change

  • Bug fix
  • New feature
  • Breaking change
  • Documentation update

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • Any dependent changes have been merged and published in downstream modules
  • All the sanity checks have been completed and the sanity test cases have been executed

Ansible Best Practices

  • Tasks are idempotent (can be run multiple times without changing state)
  • Variables and secrets are handled securely (e.g., using ansible-vault or environment variables)
  • Playbooks are modular and reusable
  • Handlers are used for actions that need to run on change

Documentation

  • All options and parameters are documented clearly.
  • Examples are provided and tested.
  • Notes and limitations are clearly stated.

Screenshots (if applicable)

Notes to Reviewers

@@ -1799,7 +1799,7 @@ def provision_wired_device_v1(self, device_ip, site_name, device_type):
site_name (str): The name of the site where the device will be provisioned.
device_type (str): The type of device being provisioned.
Returns:
bool: True if provisioning is successful, False otherwise.
self (object): An instance of the class after the provision operation is performed.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

provision_wired_device_v2 is not returning anything. Can you please check that API as well?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, I have checked and we can safely remove the return. and testing also done

@@ -1799,7 +1799,7 @@ def provision_wired_device_v1(self, device_ip, site_name, device_type):
site_name (str): The name of the site where the device will be provisioned.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we change the site_name to site_name_hierarchy? You can also change in provision_wired_device_v2...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated the code

self.check_task_response_status(response, validation_string, 'provision_wired_device')
self.deleted_devices.append(device_ip)
exec_id = response.get("executionId")
response = self.get_execution_details(exec_id)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we are calling 'get_execution_details' outside of the while loop and only doing it once. I believe in your test case it was successful and worked fine. However, if it fails, you will be stuck in an infinite loop..

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When writing a while loop, especially for operations like status polling or waiting for an external event, it’s critical to plan an exit strategy before implementing the loop. An improperly structured while loop can lead to an infinite loop, which may cause the code to hang indefinitely, consume resources, or fail to produce results.

Always ensure there is a clear and reachable exit condition. The loop should have an end state or a known outcome to prevent infinite execution.

For processes that may not complete within expected time frames (e.g., polling a status), include a timeout or retry counter to avoid endless looping. For example, setting a maximum number of retries (e.g., max_retries) helps ensure the loop will exit even if the expected result isn’t achieved.

If the loop depends on an external response (such as polling a task status), remember to refresh the conditions being checked. This prevents the loop from becoming stale, especially in network or API calls.

Include logs that indicate the loop’s progress and any potential issues. This makes it easier to understand how the loop is operating and to troubleshoot if something goes wrong.

Sample Code:—

# Retry limit to prevent infinite looping
max_retries = 10
retries = 0

while retries < max_retries:
    response = self.get_execution_details(exec_id)  # Refresh condition inside loop
    status = response.get("status")  # Store the status in a variable to avoid redundant calls
    
    if status == "SUCCESS":
        # Successfully exit the loop
        self.log("Device provisioned successfully.", "INFO")
        break
    elif status == "FAILURE":
        # Exit the loop with failure handling
        raise Exception("Provisioning failed.")
    else:
        # Continue polling and log progress
        self.log("Provisioning in progress... retrying.", "DEBUG")  <<< Do we need to sleep before doing retry??
        
    retries += 1

# Check if the loop ended without success or failure
if retries == max_retries:
    self.log("Provisioning timed out after {0} retries.".format(max_retries), "WARNING")

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

have used a existing function from the DNAC.py

exec_id = response.get("executionId")
response = self.get_execution_details(exec_id)
while True:
if response.get("status") == "SUCCESS":
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can avoid calling response.get("status") multiple times by storing it in a variable.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

have used a existing function from the DNAC.py

@madhansansel madhansansel self-requested a review November 9, 2024 18:45
@madhansansel madhansansel merged commit ae455b6 into cisco-en-programmability:main Nov 12, 2024
2 of 4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants