Kubernetes 101
This page is written mainly for Skyscrapers customers who are currently being onboarded to the platform.
It assumes no prior knowledge of Kubernetes. It is meant to be a much lighter and more digestible version of the Kubernetes documentation, with our Skyscrapers flavor.
The target audience is software developers, or generally anyone who does not need to go in depth on infrastructure-related topics.
Most of the samples are YAML Kubernetes manifests. They define Kubernetes objects, and they can be created via kubectl apply -f <nameofyamlfile>.yaml.
Bear in mind that some example manifests here assume that their dependencies (such as the namespace they refer to) already exist.
"Kubernetes is a portable, extensible, open-source platform for managing containerized workloads and services, that facilitates both declarative configuration and automation. It has a large, rapidly growing ecosystem. Kubernetes services, support, and tools are widely available."
This is how we like to think about Kubernetes:
Imagine Kubernetes as unified software that, with a single API, allows us to orchestrate and manage containers across multiple servers seamlessly. This single API is the gateway through which developers and operations teams can deploy, scale, and monitor their applications regardless of the complexity or quantity of the services involved. It abstracts the underlying infrastructure, making it easier to manage resources, balance loads, and ensure resilience and scalability. Essentially, Kubernetes acts as a powerful, centralized control plane for containerized environments, simplifying many of the challenges associated with running applications at scale in a cloud-native ecosystem.
For more in-depth reading, check the official documentation for Kubernetes
Nodes are the underlying servers that run the Kubernetes software. When running on AWS, they are the EC2 instances that carry our pods.
A typical Kubernetes cluster is composed of one or more control plane (master) nodes and multiple worker nodes. On AWS with EKS, the control plane is managed by AWS, which takes the considerable work of scaling it and keeping it highly available off our hands.
The most important things that we need to know about the nodes themselves:
- They can be either a physical or a virtual machine.
- They have the following components:
  - Kubelet - This agent runs on each node and ensures that the containers described in each pod are running.
  - Kube-proxy - This proxy maintains the network rules on each node that allow traffic to reach pods from inside and outside the cluster.
  - Container runtime - This software is responsible for running the containers themselves. containerd is a common example, and Kubernetes supports several other runtimes.
- Just remember that the nodes of a Kubernetes cluster can be scaled out "virtually" indefinitely. We usually add logical limits to our autoscaling groups (when using Cluster Autoscaler) or to our Karpenter NodePool definitions to protect ourselves from extreme surges.
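As an illustration of such a limit, here is a minimal sketch of a Karpenter NodePool that caps the total capacity it may provision. It assumes the Karpenter v1 API on AWS (field names differ in older Karpenter versions); the EC2NodeClass called default and the limit values are hypothetical and not taken from our actual setup.

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default                  # hypothetical name
spec:
  template:
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default            # assumes an EC2NodeClass with this name exists
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]
  # Karpenter stops launching nodes for this pool once these totals are reached
  limits:
    cpu: "100"                   # illustrative value
    memory: 400Gi                # illustrative value
```

Once the combined CPU or memory of the nodes in this pool reaches the limit, new pods stay pending instead of triggering more nodes.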
For more in-depth reading, check the official documentation for Nodes
Just like mitochondria are the powerhouses of cells, pods are the powerhouses of Kubernetes.
Pods are the smallest deployable units created and managed by Kubernetes. Technically put, a pod is a group of one or more containers, with shared storage and network resources, and a specification for how to run the containers.
Here is a YAML representation of a pod (pod.yaml):
apiVersion: v1
kind: Pod
metadata:
  name: nginx
  namespace: hello-world
spec:
  containers:
    - name: nginx
      image: nginx:1.14.2
      ports:
        - containerPort: 80
      env:
        - name: HELLO
          value: "World"
If I run the command kubectl apply -f pod.yaml, I will get a pod that runs nginx, specifically nginx:1.14.2, with an environment variable HELLO=World inside.
As the pod definition above says, a pod is not limited to a single container: we can run multiple containers within one pod.
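To make the multi-container case concrete, here is a minimal sketch of a pod with two containers that share a volume. The pod name, the log-tailer sidecar and the shared-logs volume are hypothetical and only there for illustration.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-with-sidecar       # hypothetical name
  namespace: hello-world
spec:
  volumes:
    - name: shared-logs          # scratch volume that lives as long as the pod
      emptyDir: {}
  containers:
    - name: nginx
      image: nginx:1.14.2
      ports:
        - containerPort: 80
      volumeMounts:
        - name: shared-logs
          mountPath: /var/log/nginx
    - name: log-tailer           # hypothetical sidecar that prints the nginx access log
      image: busybox:latest
      command: ["/bin/sh", "-c", "touch /logs/access.log && tail -f /logs/access.log"]
      volumeMounts:
        - name: shared-logs
          mountPath: /logs
```

Both containers are scheduled on the same node, share the pod's network, and see the same files through the shared volume.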
For more in-depth reading, check the official documentation for Pods
A pod manifest, like most Kubernetes manifests, sets these top-level fields:
- apiVersion - Which version of the Kubernetes API you're using to create this object
- kind - What kind of object you want to create
- metadata - Data that helps uniquely identify the object, including a name string, UID, and optional namespace
- spec - What state you desire for the object
Alright, great. Now we have our application running in a pod somewhere in the cluster. Requests are coming in somehow and reaching the pod, and the pod is responding.
What happens if we receive more traffic? We can always give the pod more resources, of course, but it is still a single pod. Here comes the Deployment to save the day.
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: nginx
  name: nginx
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - image: nginx:1.14.2
          name: nginx
The Deployment is a Kubernetes object that ensures we always have the number of pods we ask for. For example, if we tell it that we want 3 replicas, its controller will make sure that we have 3 replicas at all times. If for any reason we drop to 2 replicas, whoosh, a new replica is created. That is the magic of the Deployment.
In short, the Deployment "Manages the deployment and scaling of a set of Pods, and provides declarative updates for Pods. You describe a desired state in a Deployment, and the Deployment Controller changes the actual state to the desired state at a controlled rate."
This will most likely be your workload type of choice.
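The "controlled rate" in the quote can be tuned through the Deployment's update strategy. The sketch below extends the nginx Deployment with a rolling update configuration. The maxSurge and maxUnavailable values are illustrative assumptions, not a recommendation.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1          # at most 1 extra pod above the desired 3 during an update
      maxUnavailable: 0    # never go below 3 available pods during an update
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: nginx:1.14.2
```

With these settings, changing the image tag replaces the pods one at a time: a new pod is started and must become ready before an old one is removed.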
For more in-depth reading, check the official documentation for deployments
A StatefulSet is much like a Deployment. The main difference is that Deployments are generally used for stateless applications. If you need to run a stateful application (which we do not recommend unless you are forced to), then you have to choose a StatefulSet.
In short, a StatefulSet manages the deployment and scaling of a set of Pods, and provides guarantees about the ordering and uniqueness of these Pods.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
spec:
  selector:
    matchLabels:
      app: nginx # has to match .spec.template.metadata.labels
  serviceName: "nginx"
  replicas: 3
  template:
    metadata:
      labels:
        app: nginx # has to match .spec.selector.matchLabels
    spec:
      containers:
        - name: nginx
          image: registry.k8s.io/nginx-slim:0.8
          ports:
            - containerPort: 80
              name: web
          volumeMounts:
            - name: www
              mountPath: /usr/share/nginx/html
  volumeClaimTemplates:
    - metadata:
        name: www
      spec:
        accessModes: [ "ReadWriteOnce" ]
        storageClassName: "my-storage-class"
        resources:
          requests:
            storage: 1Gi
For more in-depth reading, check the official documentation for StatefulSets
DaemonSets are not commonly used by end-user applications, but mostly by us. They are very similar to a Deployment, but they run one pod on each node. Use cases are mostly related to operations, such as the log collector in the example below.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd-elasticsearch
  namespace: kube-system
  labels:
    k8s-app: fluentd-logging
spec:
  selector:
    matchLabels:
      name: fluentd-elasticsearch
  template:
    metadata:
      labels:
        name: fluentd-elasticsearch
    spec:
      containers:
        - name: fluentd-elasticsearch
          image: quay.io/fluentd_elasticsearch/fluentd:v2.5.2
          volumeMounts:
            - name: varlog
              mountPath: /var/log
      terminationGracePeriodSeconds: 30
      volumes:
        - name: varlog
          hostPath:
            path: /var/log
For more in-depth reading, check the official documentation for DaemonSets
A Job creates one or more pods under the hood, which run to completion and then terminate.
apiVersion: batch/v1
kind: Job
metadata:
  name: sample
spec:
  template:
    spec:
      containers:
        - image: busybox:latest
          name: sample
          command: ["/bin/sh", "-c", "--"]
          args: ["echo Hello > hello.txt"] # Write Hello into a new file, then exit
      restartPolicy: Never
For more in-depth reading, check the official documentation for Jobs
A CronJob is an object that allows us to define a Job that runs on a schedule.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: sample
spec:
  jobTemplate:
    metadata:
      name: sample
    spec:
      template:
        spec:
          containers:
            - image: nginx:latest
              name: sample
              command: ["/bin/sh", "-c", "--"]
              args: ["echo Hello > hello.txt"] # Just echo hello into a new file
          restartPolicy: OnFailure
  schedule: "0 4 * * *" # Typical cron syntax: every day at 04:00
For more in-depth reading, check the official documentation for CronJobs
A Service is an abstract way to expose an application running on a set of Pods as a network service. It defines a logical selection of Pods and a policy by which to access them.
Services allow you to define how the different parts of your application communicate with each other and with the outside world.
There are several types of Service, each with a different purpose. Let's explore them.
- ClusterIP: This is the default Service type. It gives the Service a cluster-internal IP address, which other applications in the cluster can use to reach it, but which cannot be reached from outside the cluster.
- NodePort: A NodePort Service exposes the Service on each node's IP at a static port (the NodePort). This exposes your Service to the outside world (if your nodes themselves are public) on <NodeIP>:<NodePort>.
- LoadBalancer: This Service type automatically creates an external load balancer (an ELB in our case) that forwards external traffic to the Kubernetes Service.
- ExternalName: This Service type maps a Service to a DNS name.
Simply put, most of the time you will be following this pattern:
- Create a Deployment
- Create a Service (of type ClusterIP) that binds to your Deployment
- Create an Ingress that points to your Service (more on this at the bottom)
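For the second step of this pattern, here is a minimal sketch of a ClusterIP Service for the nginx Deployment shown earlier. It assumes the pods carry the label app: nginx and that nginx listens on port 80. Note that a Service selects pods by label, so it does not reference the Deployment by name.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  type: ClusterIP        # the default, written out here for clarity
  selector:
    app: nginx           # must match the labels in the Deployment's pod template
  ports:
    - name: http
      port: 80           # port the Service listens on inside the cluster
      targetPort: 80     # port the container listens on
```

Other workloads in the same namespace can then reach the pods through the DNS name nginx on port 80.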
For more in-depth reading, check the official documentation for Services
An Ingress exposes HTTP and HTTPS routes from outside the cluster to Services within the cluster. Traffic routing is controlled by rules defined on the Ingress object.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  labels:
    app: acme
  name: acme
  namespace: acme
spec:
  ingressClassName: nginx
  rules:
    - host: mywebsite.example.com
      http:
        paths:
          - backend:
              service:
                name: acme # Needs to reference the service that points to our deployment
                port:
                  number: 80
            path: /
            pathType: Prefix
For more in-depth reading, check the official documentation for Ingress
An Ingress controller is an application that processes and satisfies the routing rules defined by Ingress objects, usually with a load balancer.
In our case, we use the Kubernetes nginx ingress controller. By default it creates a Service of type LoadBalancer.
When creating an Ingress, we then define which Ingress class to use, since we can have multiple Ingress controllers. Some can be internal (using a private load balancer) and some can be external.
For more in-depth reading, check the official documentation for Ingress Controllers
A Namespace provides a mechanism for isolating groups of resources within a single cluster.
It is a logical grouping for your Kubernetes objects. Not all objects are namespaced, but most of the ones you will come across are.
The default namespace is called default. kubectl commands run against it unless we specify another namespace with the -n or --namespace flag.
Example: kubectl get pods -n staging gets me all pods in the staging namespace.
If I want to set a specific default namespace instead of typing -n staging every time, I can run kubectl config set-context --current --namespace=staging
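Namespaces are themselves Kubernetes objects and can be created from a manifest. As a small sketch, this is the hello-world namespace that the pod example on this page expects to exist:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: hello-world
```

Applying it with kubectl apply -f before the pod manifest satisfies that dependency.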
For more in-depth reading, check the official documentation for Namespaces
Secrets are used to store confidential data in key-value pairs.
They are technically only base64 encoded, and anyone who has read access to a Secret can read its contents, but they are still a convenient way to inject secret data into your workloads.
You can mount them as volumes, or inject them as environment variables.
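As an illustration, the sketch below defines a Secret and injects one of its keys into a pod as an environment variable. The names app-credentials, DB_PASSWORD and nginx-with-secret, as well as the value, are hypothetical.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: app-credentials          # hypothetical name
  namespace: hello-world
type: Opaque
stringData:                      # plain text here, stored base64 encoded by the API server
  DB_PASSWORD: "change-me"       # placeholder value
---
apiVersion: v1
kind: Pod
metadata:
  name: nginx-with-secret        # hypothetical name
  namespace: hello-world
spec:
  containers:
    - name: nginx
      image: nginx:1.14.2
      env:
        - name: DB_PASSWORD
          valueFrom:
            secretKeyRef:
              name: app-credentials
              key: DB_PASSWORD
```

In practice you would avoid committing a manifest with real secret values to a repository.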
For more in-depth reading, check the official documentation for Secrets
ConfigMaps are used to store non-confidential data in key-value pairs.
They follow the same concept as Secrets. The only difference is that their values are not base64 encoded.
They are useful for configuration data, even simple ini or conf files.
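Here is a minimal sketch of a ConfigMap that holds both a simple key and a small ini file, and a pod that consumes them as an environment variable and as a mounted file. All names, keys and values (app-config, LOG_LEVEL, app.ini, /etc/app) are hypothetical.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config               # hypothetical name
  namespace: hello-world
data:
  LOG_LEVEL: "info"
  app.ini: |
    [server]
    port = 8080
---
apiVersion: v1
kind: Pod
metadata:
  name: nginx-with-config        # hypothetical name
  namespace: hello-world
spec:
  volumes:
    - name: config
      configMap:
        name: app-config
        items:                   # only project the ini file into the volume
          - key: app.ini
            path: app.ini
  containers:
    - name: nginx
      image: nginx:1.14.2
      env:
        - name: LOG_LEVEL
          valueFrom:
            configMapKeyRef:
              name: app-config
              key: LOG_LEVEL
      volumeMounts:
        - name: config
          mountPath: /etc/app    # the file appears as /etc/app/app.ini
          readOnly: true
```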
For more in-depth reading, check the official documentation for ConfigMaps
RBAC stands for Role-Based Access Control. It works as follows:
- When a user makes a request to the Kubernetes API, the Kubernetes API server determines what the user is trying to do (the verb), what resources they're trying to affect, and the namespace they're making the request in.
- The API server then checks the RoleBindings/ClusterRoleBindings to see if the user has a Role/ClusterRole that allows them to perform that action.
- If the user has the necessary permissions, the action is allowed; if not, the request is denied.
Some concepts explained in depth:
- Roles and ClusterRoles:
  - Role: A Role in Kubernetes is used to grant permissions within a specific namespace. It contains rules that represent a set of permissions. These permissions are purely additive (there are no "deny" rules).
  - ClusterRole: A ClusterRole is like a Role, but it is cluster-wide. It allows you to grant the same permissions across all namespaces.
- RoleBindings and ClusterRoleBindings:
  - RoleBinding: A RoleBinding grants the permissions defined in a Role to a set of subjects. It only applies in the namespace where it is created.
  - ClusterRoleBinding: A ClusterRoleBinding grants the permissions defined in a ClusterRole to a set of subjects across the entire cluster.
- Subjects:
  - Subjects can be individual users, groups, or service accounts that are granted permissions by a RoleBinding or ClusterRoleBinding.
- Rules:
  - Both Roles and ClusterRoles contain rules that define the operations and resources that are allowed. A rule specifies verbs (like get, list, create, delete), resources (like pods, services), and optionally resource names.
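To tie these concepts together, here is a minimal sketch of a Role that allows reading pods in one namespace, and a RoleBinding that grants it to a user. The names pod-reader and read-pods, the staging namespace and the user jane are hypothetical.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-reader               # hypothetical name
  namespace: staging
rules:
  - apiGroups: [""]              # "" is the core API group, where pods live
    resources: ["pods"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-pods                # hypothetical name
  namespace: staging
subjects:
  - kind: User
    name: jane                   # hypothetical user
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io
```

With this in place, jane can run kubectl get pods -n staging, but she cannot delete pods or read pods in other namespaces.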
For more in-depth reading, check the official documentation for RBAC
A request typically travels through the cluster as follows:
- A request comes in from the public internet to our load balancer (created through the nginx ingress controller in this scenario)
- All requests go to the nginx ingress controller
- It checks which Ingresses are defined, and for which hosts and paths
- The nginx ingress controller matches the request to the right Ingress
- The Ingress rule points the request to the right Service
- The Service forwards the request to one of the pods of our Deployment
To get access to your cluster:
- Ensure that you already have your VPN credentials, and that they work. (Ask our customer success team on Slack via @help)
- Ensure that you have your AWS access set up via SSO (check your infrastructure documentation for that)
- Run your EKS command to ensure you have your ~/.kube/config populated properly
- Now you can run your kubectl commands.
Try running kubectl get pods. If you get no errors, but a generic No resources found in default namespace. message, then congrats, your setup works.
Some of the prominent open source platform components that we use:
- Nginx ingress controller: ingress controller for exposing applications
- External DNS: automatic DNS record creation
- Cert-manager: automatic certificate management
- Prometheus & AlertManager: monitoring, alerting and metrics collection
- Grafana: dashboarding and visualization of monitoring
- Grafana Loki: logging
- Keda: scaling
- Karpenter: cluster node scaling
- Other components not mentioned in this document
- Your infrastructure GitHub repository at https://github.com/skyscrapers/<customer-name>
- Skyscrapers documentation repository
- Skyscrapers Changelog
- Official Kubernetes documentation
- Official Helm documentation