Containers are quickly becoming an industry standard for deploying software applications. The business and technological advantages of containerizing workloads are driving many teams to move their applications to containers. This demo provides a basic walkthrough of migrating a stateless application from running on a VM to running on Kubernetes Engine (GKE). It demonstrates the lifecycle of an application transitioning from a typical VM/OS-based deployment, to a specialized OS for containers, and finally to a full container platform: GKE.
## Table of Contents

- Introduction
- Architecture
- Initial Setup
- Deployment
- Exploring Prime Flask Environments
- Validation
- Load Testing
- Tear Down
- More Info
- Troubleshooting
## Introduction

There are numerous advantages to using containers to deploy applications. Among these are:

- **Isolated** - Applications have their own libraries; no conflicts will arise from different libraries in other applications.
- **Limited** (limits on CPU/memory) - Applications may not hog resources from other applications (see the sketch below).
- **Portable** - The container contains everything it needs and is not tied to an OS or cloud provider.
- **Lightweight** - The kernel is shared, making it much smaller and faster than a full OS image.
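As a concrete illustration of the "limited" property, container runtimes let you cap resources per container; a minimal sketch with Docker (the image name is a placeholder):

```
# Cap the container at one CPU core and 256 MiB of memory;
# the kernel enforces these limits via cgroups.
docker run --cpus=1 --memory=256m my-app:latest
```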
### What you'll learn

This project demonstrates migrating a simple Python application named `Prime-flask` to:

- A virtual machine (Debian VM), where `Prime-flask` is deployed as the only application, much like a traditional application running in an on-premises datacenter
- A containerized version of `Prime-flask` deployed on Container-Optimized OS (COS)
- A Kubernetes deployment, where `Prime-flask` is exposed via a load balancer and deployed in Kubernetes Engine
After deploying, you'll run a load test against the final configuration and scale it to accommodate the load.
The Python app `Prime-flask` includes instructions for building its container in this folder.
## Architecture

- **Configuration 1:** Virtual machine running Debian, app deployed directly to the host OS, no containers
- **Configuration 2:** Virtual machine running Container-Optimized OS, app deployed into a container
- **Configuration 3:** Kubernetes Engine (GKE) platform, many machines running many containers
A simple Python Flask web application (`Prime-flask`) was created for this demonstration. It contains two endpoints: `http://<ip>:8080/factorial/` and `http://<ip>:8080/prime/`.

Examples of use look like:

```
curl http://35.227.149.80:8080/prime/10
The sum of all primes less than 10 is 17

curl http://35.227.149.80:8080/factorial/10
The factorial of 10 is 3628800
```
Also included is a utility to validate a successful deployment.
## Initial Setup

When using Cloud Shell, execute the following command to set up the gcloud CLI:

```
gcloud init
```
## Deployment

The infrastructure required by this project can be deployed by executing:

```
make create
```
This will call the script `create.sh`, which performs the following tasks:

- Package the deployable `Prime-flask` application, making it ready to be copied to Google Cloud Storage.
- Create the container image via Google Cloud Build and push it to your project's private Container Registry (GCR) (see the note below).
- Generate an appropriate configuration for Terraform.
- Execute Terraform, which creates the scenarios outlined above.
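For reference, the image-build step above is roughly what you would get by submitting the build yourself with Cloud Build (the tag shown is an assumption based on the image name that appears later in this walkthrough):

```
# Build the container with Cloud Build and push it to your project's GCR.
gcloud builds submit --tag gcr.io/$(gcloud config get-value project)/prime-flask:1.0.2 .
```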
*Terraform creating single VM, COS VM, and GKE cluster.*

*Terraform outputs showing prime and factorial endpoints for the Debian VM and COS system.*

*Kubernetes cluster and Prime-flask service are up.*
## Exploring Prime Flask Environments

We have now set up three different environments that our `Prime-flask` app traverses on its way from a traditional application running directly on a single virtual machine to a pod running on a container orchestration platform like Kubernetes.

At this point it is worth exploring each of the systems.
Jump onto the Debian virtual machine, `vm-webserver`, where the application runs directly on the host OS. In this environment there is no isolation, and portability is less efficient. The app has access to the entire system and, depending on other factors, may not recover automatically if it fails. Scaling up this application may require spinning up more virtual machines, which is most likely not the best use of resources.

```
gcloud compute ssh vm-webserver --zone us-west1-c
```
List all processes:

```
ps aux
root 882 0.0 1.1 92824 6716 ? Ss 18:41 0:00 sshd: user [priv]
user 888 0.0 0.6 92824 4052 ? S 18:41 0:00 sshd: user@pts/0
user 889 0.0 0.6 19916 3880 pts/0 Ss 18:41 0:00 -bash
user 895 0.0 0.5 38304 3176 pts/0 R+ 18:41 0:00 ps aux
apprunn+ 7938 0.0 3.3 48840 20328 ? Ss Mar19 1:06 /usr/bin/python /usr/local/bin/gunicorn --bind 0.0.0.0:8080 prime-flask-server
apprunn+ 21662 0.0 3.9 69868 24032 ? S Mar20 0:05 /usr/bin/python /usr/local/bin/gunicorn --bind 0.0.0.0:8080 prime-flask-server
```
Jump onto the Container-Optimized OS (COS) machine, `cos-vm`, where Docker is running the container. COS is an operating system optimized for containers, with a small OS footprint that is part of what makes it secure for running container workloads. It comes with cloud-init and the Docker runtime preinstalled. On its own, this system can be great for running several containers that do not need a platform providing higher levels of reliability.

```
gcloud compute ssh cos-vm --zone us-west1-c
```
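While you are on `cos-vm`, you can peek at the cloud-init configuration that launches the container. On Compute Engine this is typically passed as the `user-data` instance attribute, readable from the metadata server (this assumes the Terraform config follows that standard COS pattern):

```
# Print the cloud-init user-data for this instance.
curl -H "Metadata-Flavor: Google" \
  "http://metadata.google.internal/computeMetadata/v1/instance/attributes/user-data"
```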
We can also run `ps aux` on the host and see `prime-flask` running, but notice the docker and container references:

```
root 626 0.0 5.7 496812 34824 ? Ssl Mar19 0:14 /usr/bin/docker run --rm --name=flaskservice -p 8080:8080 gcr.io/migration-to-containers/prime-flask:1.0.2
root 719 0.0 0.5 305016 3276 ? Sl Mar19 0:00 /usr/bin/docker-proxy -proto tcp -host-ip 0.0.0.0 -host-port 8080 -container-ip 172.17.0.2 -container-port 8080
root 724 0.0 0.8 614804 5104 ? Sl Mar19 0:09 docker-containerd-shim -namespace moby -workdir /var/lib/docker/containerd/daemon/io.containerd.runtime.v1.linux/mo
chronos 741 0.0 0.0 204 4 ? Ss Mar19 0:00 /usr/bin/dumb-init /usr/local/bin/gunicorn --bind 0.0.0.0:8080 prime-flask-server
chronos 774 0.0 3.2 21324 19480 ? Ss Mar19 1:25 /usr/local/bin/python /usr/local/bin/gunicorn --bind 0.0.0.0:8080 prime-flask-server
chronos 14376 0.0 4.0 29700 24452 ? S Mar20 0:05 /usr/local/bin/python /usr/local/bin/gunicorn --bind 0.0.0.0:8080 prime-flask-server
```
Also notice that if we try to list the Python binary's path on the host, it does not exist:

```
ls /usr/local/bin/python
ls: cannot access '/usr/local/bin/python': No such file or directory
```
List the Docker containers:

```
docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
d147963ec3ca gcr.io/migration-to-containers/prime-flask:1.0.2 "/usr/bin/dumb-init …" 39 hours ago Up 39 hours 0.0.0.0:8080->8080/tcp flaskservice
```
Now we can exec a command inside the container to see its running processes:

```
docker exec -it $(docker ps | awk '/prime-flask/ {print $1}') ps aux
PID USER TIME COMMAND
1 apprunne 0:00 /usr/bin/dumb-init /usr/local/bin/gunicorn --bind 0.0.0.0:
6 apprunne 1:25 {gunicorn} /usr/local/bin/python /usr/local/bin/gunicorn -
17 apprunne 0:05 {gunicorn} /usr/local/bin/python /usr/local/bin/gunicorn -
29 apprunne 0:00 ps aux
```
Jump onto Kubernetes. In this environment we can run hundreds or thousands of pods, which are groupings of containers. Kubernetes is the de facto standard for deploying containers these days. It offers high productivity, reliability, and scalability. Kubernetes makes sure your containers have a home, and if a container happens to fail, Kubernetes respawns it. You can have many machines making up the cluster, spread across different zones, ensuring availability and resilience to potential issues.
Get the cluster configuration:

```
gcloud container clusters get-credentials prime-server-cluster
```

Get the pods running in the default namespace:

```
kubectl get pods
NAME READY STATUS RESTARTS AGE
prime-server-6b94cdfc8b-dfckf 1/1 Running 0 2d5h
```
See what is running in the pod:

```
kubectl exec $(kubectl get pods -lapp=prime-server -ojsonpath='{.items[].metadata.name}') -- ps aux
PID USER TIME COMMAND
1 apprunne 0:00 /usr/bin/dumb-init /usr/local/bin/gunicorn --bind 0.0.0.0:8080 prime-flask-server
6 apprunne 1:16 {gunicorn} /usr/local/bin/python /usr/local/bin/gunicorn --bind 0.0.0.0:8080 prime-flask-server
8 apprunne 2:52 {gunicorn} /usr/local/bin/python /usr/local/bin/gunicorn --bind 0.0.0.0:8080 prime-flask-server
19 apprunne 0:00 ps aux
```
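As a quick illustration of the self-healing behavior mentioned above, you can delete the app's pod and watch Kubernetes schedule a replacement almost immediately (the label matches the `-lapp=prime-server` selector used in the command above):

```
# Delete the pod, then watch a new one come up in its place.
kubectl delete pod -l app=prime-server
kubectl get pods -w
```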
As you can see from the last example, the Python application is now running in a container. The application can't access anything on the host; the container is isolated. It runs in its own Linux namespaces and (by default) can't access files, the network, or other resources running on the VM, whether in containers or otherwise.
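One simple way to observe that isolation is to compare hostnames: inside the container, the hostname is the pod's name rather than the underlying VM's (your pod name will differ):

```
kubectl exec $(kubectl get pods -lapp=prime-server -ojsonpath='{.items[].metadata.name}') -- hostname
prime-server-6b94cdfc8b-dfckf
```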
## Validation

Now that the application is deployed, we can validate these three deployments by executing:

```
make validate
```
A successful output will look like this:

```
Validating Debian VM Webapp...
Testing endpoint http://35.227.149.80:8080
Endpoint http://35.227.149.80:8080 is responding.
**** http://35.227.149.80:8080/prime/10
The sum of all primes less than 10 is 17
The factorial of 10 is 3628800

Validating Container OS Webapp...
Testing endpoint http://35.230.123.231:8080
Endpoint http://35.230.123.231:8080 is responding.
**** http://35.230.123.231:8080/prime/10
The sum of all primes less than 10 is 17
The factorial of 10 is 3628800

Validating Kubernetes Webapp...
Testing endpoint http://35.190.89.136
Endpoint http://35.190.89.136 is responding.
**** http://35.190.89.136/prime/10
The sum of all primes less than 10 is 17
The factorial of 10 is 3628800
```
Of course, the IP addresses will likely differ for your deployment.
## Load Testing

In a new console window, execute the following command, replacing `[IP_ADDRESS]` with the IP address and port from your validation output in the previous step. Note that the Kubernetes deployment runs on port `80`, while the other two deployments run on port `8080`:

```
ab -c 120 -t 60 http://[IP_ADDRESS]/prime/10000
```
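If the `ab` tool is not available in your environment, on Debian-based systems it is provided by the `apache2-utils` package:

```
sudo apt-get install apache2-utils
```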
ApacheBench (`ab`) will execute 120 concurrent requests against the provided endpoint for up to 60 seconds. The demo application's single replica is insufficiently sized to handle this volume of requests.

This can be confirmed by reviewing the output of the `ab` command: a `Failed requests` value greater than 0 means that the server couldn't respond successfully to this load.
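The relevant lines of the `ab` report will look something like this (the figures below are illustrative, not from a real run):

```
Concurrency Level:      120
Complete requests:      5000
Failed requests:        3032
```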
One way to ensure that your system has the capacity to handle this type of traffic is by scaling up. In this case, we would want to scale our service horizontally.
In our Debian and COS architectures, horizontal scaling would include:
- Creating a load balancer.
- Spinning up additional instances.
- Registering them with the load balancer.
This is an involved process and is out of scope for this demonstration.
For the third (Kubernetes) deployment, the process is far easier:

```
kubectl scale --replicas 3 deployment/prime-server
```
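As an aside, rather than scaling by hand you could let Kubernetes adjust the replica count automatically with a Horizontal Pod Autoscaler; a minimal sketch (the thresholds are arbitrary, and CPU-based autoscaling assumes the deployment's containers declare CPU requests):

```
# Keep between 3 and 10 replicas, targeting 80% average CPU utilization.
kubectl autoscale deployment prime-server --min=3 --max=10 --cpu-percent=80
```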
After allowing 30 seconds for the replicas to initialize, re-run the load test:

```
ab -c 120 -t 60 http://[IP_ADDRESS]/prime/10000
```
Notice that `Failed requests` is now 0, meaning all of the 10,000+ requests were successfully answered by the server.
## Tear Down

When you are finished with this example, you will want to clean up the resources that were created so that you avoid accruing charges:

```
make teardown
```

This runs `terraform destroy`, which destroys all of the resources created for this demonstration.
## More Info

For additional information, see: Embarking on a Journey Towards Containerizing Your Workloads
## Troubleshooting

Occasionally the APIs take a few moments to complete. Running `make validate` immediately afterward may appear to fail, when in fact the instances simply haven't finished initializing. Waiting a minute or two and retrying should resolve the issue.

The setup of this demo takes up to 15 minutes. If there is no error, the best thing to do is keep waiting; the execution of `make create` should not be interrupted.

If you do get an error, it probably makes sense to re-execute the failing script. Occasionally there are network connectivity issues, and retrying will likely work the subsequent time.
This is not an officially supported Google product.