The goal of this project is to create, in a few simple steps, a Kubernetes work environment where you can test the performance and resilience of your own web application through load tests and chaos engineering experiments.
The following resources will be installed:
- Azure Kubernetes Service
- Azure Load Testing
- Chaos Experiment
This project contains a pipeline that deploys all the components listed above to an Azure subscription. The pipeline creates these resources, then deploys the following to the AKS cluster:
- the Nginx Ingress Controller
- the web app to be tested
- Prometheus and Grafana to monitor the web app metrics
- the Chaos Mesh to simulate random faults
Finally, the pipeline creates and runs the JMeter load test and, while the load test is running, the Chaos Experiment, which causes very high CPU usage in your app pods.
Follow the instructions to prepare the environment before starting the pipeline.
The first step, before setting up the environment, is to fork this repository.
- To be able to follow this tutorial you must be a subscription owner. To check if your user is a subscription owner, open the Azure portal, then open the subscription, and check if your user is listed as Owner in the Access Control (IAM) blade.
- A Container Registry with an image. If you don't have a container registry with an image, check the file README-createImageForTest.md.
- Docker Desktop installed https://www.docker.com/products/docker-desktop/
- Azure CLI installed https://docs.microsoft.com/en-us/cli/azure/install-azure-cli?view=azure-cli-latest
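A quick way to confirm that the two tools above are on your PATH is a small shell check. This is a minimal sketch and not part of the project; the function name `check_tool` is made up for illustration.

```shell
# Report whether a command-line tool is available on the PATH.
# check_tool is a hypothetical helper, not part of this repository.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: found"
  else
    echo "$1: missing"
    return 1
  fi
}
```

For example, `check_tool az && check_tool docker` succeeds only when both tools are installed.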
Create a resource group that will contain all the resources generated.
Before creating the resource group, you should decide the target region. To see the list of current available regions, you can execute this command
az account list-locations -o table
Choose a region, and use the "Name" value
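If you want to double check that the value you picked is a valid region name, the list can be queried for it directly. The sketch below assumes a POSIX shell; `region_exists` is a hypothetical helper name.

```shell
# Succeed if the given value matches one of the region names reported by
# the Azure CLI. region_exists is a hypothetical helper, not part of this
# repository.
region_exists() {
  az account list-locations --query "[].name" -o tsv | grep -Fxq -- "$1"
}
```

For example, `region_exists westeurope && echo ok` prints ok when westeurope is available to your subscription. Display names such as "West Europe" do not match, because only the short name is compared.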
To create the resource group, the command is:
az group create -l <REGION-NAME> -n <RESOURCE-GROUP-NAME>
Substitute <REGION-NAME> with the Name value of the chosen region, then choose a resource group name that is unique inside your subscription and use it in place of <RESOURCE-GROUP-NAME>.
Sample command:
az group create -l westeurope -n unique-resource-group-aks-demo
The command output is a JSON response like this one
{
"id": "/subscriptions/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/resourceGroups/xxxxxxxxxxxxxxxxxxxxxxx",
"location": "xxxxxxxxxxxx",
"managedBy": null,
"name": "xxxxxxxxxxxxxxxxxxxxxxx",
"properties": {
"provisioningState": "Succeeded"
},
"tags": null,
"type": "Microsoft.Resources/resourceGroups"
}
Reference: https://docs.microsoft.com/en-us/cli/azure/group?view=azure-cli-latest
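Because the resource group name should not already be in use, it may help to check for it before creating the group. This is a minimal sketch; `create_group_if_absent` is a hypothetical helper, and it relies on `az group exists` printing true or false.

```shell
# Create the resource group only if its name is not already in use.
# create_group_if_absent is a hypothetical helper; its arguments are the
# region name and the resource group name.
create_group_if_absent() {
  if [ "$(az group exists -n "$2")" = "true" ]; then
    echo "Resource group '$2' already exists; choose another name." >&2
    return 1
  fi
  az group create -l "$1" -n "$2"
}
```

For example: `create_group_if_absent westeurope unique-resource-group-aks-demo`.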
Create a service principal identity and assign it the owner role on the subscription, as specified by the --scopes value in the command below.
VERY IMPORTANT: Save the command output in Notepad; you will need it in the next step. If you lose this output, you won't be able to launch the GitHub Action.
The command that creates the Service Principal is:
az ad sp create-for-rbac --name <SERVICE-PRINCIPAL-UNIQUE-NAME> --role owner --scopes /subscriptions/<SUBSCRIPTION-ID> --sdk-auth
Choose a Service Principal name that is unique inside your Azure Active Directory, and use it in place of <SERVICE-PRINCIPAL-UNIQUE-NAME>.
To get the <SUBSCRIPTION-ID> value, use this command:
az account show --query id --output tsv
The output of this command is something like:
xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
Copy the output and use it in place of <SUBSCRIPTION-ID> in the string /subscriptions/<SUBSCRIPTION-ID>. The result, /subscriptions/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx, is the value passed to --scopes.
An example of the command that creates the Service Principal is:
az ad sp create-for-rbac --name "unique-sp-name-for-aks-demo" --role owner --scopes /subscriptions/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx --sdk-auth
Reference: https://docs.microsoft.com/en-us/azure/developer/github/connect-from-azure?tabs=azure-cli%2Clinux
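The two commands above can also be chained so that the subscription ID never has to be copied by hand. The following is a sketch under stated assumptions: `subscription_scope` and `create_sp` are hypothetical helper names, and writing the output to a file instead of Notepad is just one option.

```shell
# Build the value passed to --scopes from a bare subscription ID.
# subscription_scope is a hypothetical helper.
subscription_scope() {
  printf '/subscriptions/%s\n' "$1"
}

# Create the service principal for the current subscription and write the
# JSON output to a file. Arguments: service principal name, output file.
# The file contains a secret, so keep it private and delete it when done.
create_sp() {
  sub_id=$(az account show --query id --output tsv) || return 1
  az ad sp create-for-rbac --name "$1" --role owner \
    --scopes "$(subscription_scope "$sub_id")" --sdk-auth > "$2"
}
```

For example: `create_sp unique-sp-name-for-aks-demo sp-output.json`.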
Copy the full output from the previous step into the GitHub secret named AZURE_CREDENTIALS. You can find AZURE_CREDENTIALS in GitHub under Settings-->Secrets-->Actions.
Sample
{
"clientId": "651ca1e0-XXXX-XXXX-XXXX-aa7c11e10a57",
"clientSecret": "QOFEWIJFQewfEWFewqFewFewfEWf34_h.pj",
"subscriptionId": "74LVd6eb-XXXX-XXXX-XXXX-ecec2fm3c22e",
"tenantId": "72f988bf-XXXX-XXXX-XXXX-2d7cd011db47",
"activeDirectoryEndpointUrl": "https://login.microsoftonline.com",
"resourceManagerEndpointUrl": "https://management.azure.com/",
"activeDirectoryGraphResourceId": "https://graph.windows.net/",
"sqlManagementEndpointUrl": "https://management.core.windows.net:8443/",
"galleryEndpointUrl": "https://gallery.azure.com/",
"managementEndpointUrl": "https://management.core.windows.net/"
}
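Before pasting the JSON into the secret, it can be worth checking that nothing was lost when copying it. The sketch below only looks for the first four fields of the sample (an assumption about which ones matter most) in a file where you saved the output; `has_credential_fields` is a hypothetical helper.

```shell
# Check that a saved credentials file mentions the first four fields of the
# sample above. has_credential_fields is a hypothetical helper; it does a
# plain text search and does not validate the JSON syntax.
has_credential_fields() {
  for key in clientId clientSecret subscriptionId tenantId; do
    grep -q "\"$key\"" "$1" || { echo "missing field: $key" >&2; return 1; }
  done
}
```

For example: `has_credential_fields sp-output.json && echo "looks complete"`.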
Create the following GitHub secrets
- AZURE_RG containing the name of the Resource Group that was created in the first step of this tutorial. All the Azure resources will be created in this resource group. To remove all those resources, you can delete this Resource Group.
- AZURE_SUBSCRIPTION containing the Azure Subscription ID, where the Resource Group was created in the first step of this tutorial.
- GRAFANA_ADMIN_PASSWORD containing the password to access Grafana. This can be a random guid.
Also save the value you used for GRAFANA_ADMIN_PASSWORD in Notepad, as you won't be able to read it from GitHub after saving it.
After that, you will have the following secrets: AZURE_CREDENTIALS, AZURE_RG, AZURE_SUBSCRIPTION, and GRAFANA_ADMIN_PASSWORD.
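As an alternative to the web page, the same secrets can be stored with the GitHub CLI. This is only a sketch: it assumes `gh` is installed and authenticated and that you run it from inside a clone of your fork, none of which the tutorial requires. `set_pipeline_secrets` is a hypothetical helper.

```shell
# Store the four secrets with the GitHub CLI.
# set_pipeline_secrets is a hypothetical helper; its arguments are the file
# holding the service principal JSON, the resource group name, and the
# Grafana admin password.
set_pipeline_secrets() {
  gh secret set AZURE_CREDENTIALS < "$1" || return 1
  gh secret set AZURE_RG --body "$2" || return 1
  gh secret set AZURE_SUBSCRIPTION \
    --body "$(az account show --query id --output tsv)" || return 1
  gh secret set GRAFANA_ADMIN_PASSWORD --body "$3"
}
```

For example: `set_pipeline_secrets sp-output.json unique-resource-group-aks-demo "<your-grafana-password>"`.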
Then you need to choose:
- If you want to try the base scenario, with a single stateless image contained in an Azure Container Registry, go to README-baseScenario.md
- If you want to try the advanced scenario (which uses public images by default), go to README-advancedScenario.md
When the pipeline has completed, you can retrieve the IP address of your web app to test it. If you do not know how to retrieve it, follow one of the two methods described in README-getExternalIP.md.
Once you have the public IP address of the web app, you can load its home page at http://<PUBLIC_IP_ADDRESS> and log in to Grafana at http://<PUBLIC_IP_ADDRESS>/grafana.
The credentials to login are:
- username: admin
- password: the one you entered as a secret in step number 4.
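A simple reachability check from the command line can confirm that both endpoints answer before you open a browser. The sketch below assumes `curl` is installed; `grafana_url` and `http_status` are hypothetical helper names.

```shell
# Build the Grafana URL from the public IP address of the web app.
# grafana_url is a hypothetical helper.
grafana_url() {
  printf 'http://%s/grafana\n' "$1"
}

# Print the HTTP status code returned by a URL, without the response body.
# http_status is a hypothetical helper.
http_status() {
  curl -s -o /dev/null -w '%{http_code}\n' "$1"
}
```

For example, `http_status "$(grafana_url <PUBLIC_IP_ADDRESS>)"` prints the status code of the Grafana endpoint; a 2xx code or a redirect to the login page suggests that the service is reachable.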
Here are some issues that can happen during the installation phases:
- Resource group creation failure. You could be using an existing resource group name. Retry the phase with another name.
- Service Principal creation failure.
  - You could be using an existing name. Retry the phase with another name.
  - Your account isn't a subscription owner. Retry this phase using a subscription owner.
- Pipeline start failure.
  - Check the GitHub secrets created in phase #3 and #4, then repeat phase #7.
  - The names used in phase #5 aren't unique, or contain forbidden characters. Change them, then repeat phase #7.
  - The Subscription resource providers aren't registered. Check phase #6, then repeat phase #7.
  - The inputs used in phase #7 aren't valid. Check input validity, then repeat phase #7.
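For the resource provider case, the registration state can be read from the Azure CLI. This is a minimal sketch; `provider_state` and `providers_registered` are hypothetical helpers, and the namespaces in the usage example are guesses based on the services this project installs, so treat the list in the scenario instructions as authoritative.

```shell
# Print the registration state of a resource provider namespace.
# provider_state is a hypothetical helper.
provider_state() {
  az provider show --namespace "$1" --query registrationState --output tsv
}

# Succeed only if every namespace passed as an argument is registered.
# providers_registered is a hypothetical helper.
providers_registered() {
  for ns in "$@"; do
    state=$(provider_state "$ns")
    [ "$state" = "Registered" ] || { echo "$ns: ${state:-unknown}" >&2; return 1; }
  done
}
```

For example: `providers_registered Microsoft.ContainerService Microsoft.LoadTestService Microsoft.Chaos`.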
This project allows several customizations. Keep in mind that some values cannot be changed after the first pipeline run.
If you want more choice in VM families selection, you can modify the GitHub Workflow YAML file, adding more VM Families to the choice.
Here's how to do it:
- Get all available sizes for the specified location
  Every region has a different list of available VM families. To list all the available families, run the following command, then copy the name of a VM size with no restrictions:
  az vm list-skus --location <replace with location> -r virtualMachines --output table
- Edit the GitHub workflow
  Modify the .github/workflows/base-scenario.yml file by adding the new VM size to the AGENTVMSIZE input. Write the name in lower case. You can modify the file directly in the GitHub web page, or you can clone the repo locally, edit the file, then push the modified files to the repo as you can read here.
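The table printed by the first step can be long, so filtering it may save some scrolling. The sketch below assumes the table layout has the size name in the third column and the restrictions in the last column, with None meaning no restrictions; `unrestricted_sizes` is a hypothetical helper. It also lower-cases the names, as the workflow input expects.

```shell
# Read the table printed by `az vm list-skus ... --output table` on stdin and
# print, in lower case, the names of the sizes whose last column is "None".
# Assumes the name is the third column and the restrictions the last one.
# unrestricted_sizes is a hypothetical helper.
unrestricted_sizes() {
  awk '$NF == "None" { print tolower($3) }'
}
```

For example: `az vm list-skus --location westeurope -r virtualMachines --output table | unrestricted_sizes`.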
You can change your web app configuration in the Helm Chart. Edit the src\helloworld-service\user-service-chart\templates\infrastructure.yaml file, starting from line 114, for example by changing resource requests and limits, adding environment variables to configure your app, or adding a persistent volume claim. You can modify the file directly in the GitHub web page, or you can clone the repo locally, edit the file, then push the modified files to the repo as you can read here.
After the editing, you should re-run the pipeline.
Before the first pipeline run, you can change the default load test file, which you can find in Bicep\ALT\base-scenario\Test1.jmx. This load test file can be edited with Apache JMeter. You can modify the file directly in the GitHub web page, or you can clone the repo locally, edit the file, then push the modified files to the repo as you can read here.
After the first pipeline run, you can perform load testing using the Azure Load Testing resource that was created; you can find the documentation at this link.
The experiment used in the pipeline causes very high CPU usage in your app pods for some minutes.
Before the first pipeline run, you can change this behaviour by configuring the JSON file located in "./Bicep/ACS/parameters.json". The configurable value is:
- duration = Duration in seconds of the experiment
You can modify the file directly in the GitHub web page, or you can clone the repo locally, edit the file, then push the modified files to the repo as you can read here.
After the first pipeline run, you can perform Chaos Engineering by running Chaos Experiments as you can see in this link.
For other experiments you will need to enable some capabilities:
- Open the Azure portal.
- Search for Chaos Studio in the search bar.
- Click on Targets and navigate to your AKS cluster.
- Click on Manage Actions.
- Select the desired capabilities and click Save.
- Search for Chaos Experiments in the search bar.
- Click "Create".
- Select the subscription and resource group.
- In the "Experiment designer" click "Add action".
- Choose the experiment type.
Reference:
https://docs.microsoft.com/en-us/azure/chaos-studio/chaos-studio-tutorial-aks-portal
To remove all the objects created, you must:
- Delete the resource group created in step #1:
  - From Azure portal
    Open the resource groups view in Azure Portal, then select the resource group and delete it with the delete button.
  - From Azure CLI
    az group delete --name <resource-group-name>
    Reference https://docs.microsoft.com/en-us/azure/azure-resource-manager/management/delete-resource-group?tabs=azure-cli
- Delete the service principal created:
  - From Azure portal
    Go to Azure Active Directory in Azure Portal, then select the App Registration blade, and then the Owned Applications tab. Click the View All the Applications button, search for the service principal, click on it, then delete it with the Delete button.
  - From Azure CLI
    Retrieve the service principal ID, replacing the name of the service principal with the one chosen in the second step. The assignment below uses PowerShell syntax; in Azure CLI 2.37 and later, which use Microsoft Graph, the property is named id instead of objectId.
    $ID = az ad sp list --display-name <replace_with_service_principal_name> --query [].objectId -o tsv
    Delete the service principal:
    az ad sp delete --id $ID
    Reference https://docs.microsoft.com/it-it/cli/azure/ad/sp?view=azure-cli-latest#az-ad-sp-list
    Reference https://docs.microsoft.com/it-it/cli/azure/ad/sp?view=azure-cli-latest#az-ad-sp-delete
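The service principal commands above assign the variable in PowerShell style. For a POSIX shell, the same two steps could be sketched as follows. `delete_sp_by_name` is a hypothetical helper; the name of the ID property is an optional second argument because it depends on the Azure CLI version, and it defaults to the objectId used above.

```shell
# Look up a service principal by display name and delete it.
# delete_sp_by_name is a hypothetical helper. Arguments: the display name
# and, optionally, the name of the ID property (defaults to objectId).
delete_sp_by_name() {
  sp_id=$(az ad sp list --display-name "$1" --query "[].${2:-objectId}" -o tsv)
  if [ -z "$sp_id" ]; then
    echo "No service principal named '$1' found." >&2
    return 1
  fi
  az ad sp delete --id "$sp_id"
}
```

For example: `delete_sp_by_name unique-sp-name-for-aks-demo`, or `delete_sp_by_name unique-sp-name-for-aks-demo id` if your CLI version reports the ID under that name.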
https://github.com/Azure/bicep/blob/main/docs/examples/101/aks/main.bicep
https://github.com/Azure/bicep/blob/main/docs/examples/101/container-registry/main.bicep
https://docs.microsoft.com/en-us/azure/developer/github/connect-from-azure?tabs=azure-cli%2Clinux
https://docs.microsoft.com/en-us/cli/azure/install-azure-cli-windows?tabs=azure-cli
https://docs.microsoft.com/en-us/azure/templates/microsoft.loadtestservice/loadtests?tabs=bicep
https://github.com/marketplace/actions/azure-container-registry-build
https://docs.microsoft.com/en-us/azure/aks/internal-lb#create-an-internal-load-balancer
https://docs.microsoft.com/en-us/azure/azure-resource-manager/templates/deploy-cli
https://docs.microsoft.com/en-us/azure/load-testing/how-to-parameterize-load-tests