This repository hosts the technical documentation and source code of our SGX-aware container orchestrator project. Its architecture and implementation are described in our paper "SGX-Aware Container Orchestration for Heterogeneous Clusters", published in the proceedings of the 38th IEEE International Conference on Distributed Computing Systems (ICDCS 2018). The paper also contains a broad evaluation of the scheduler itself, as well as micro-benchmarks related to running SGX-enabled containers in a multi-tenant cloud.
Our SGX-aware orchestrator is based on Kubernetes. It efficiently schedules SGX-enabled containers in a heterogeneous cluster of SGX- and non-SGX-enabled machines.
This effort is part of the SecureCloud project. SecureCloud has received funding from the European Union's Horizon 2020 research and innovation programme and was supported by the Swiss State Secretariat for Education, Research and Innovation (SERI) under grant agreement No 690111.
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/.
If you use this work in your research, please cite our paper "SGX-Aware Container Orchestration for Heterogeneous Clusters".
```bibtex
@inproceedings{vaucher2018sgxaware,
  author={S. Vaucher and R. Pires and P. Felber and M. Pasin and V. Schiavoni and C. Fetzer},
  booktitle={2018 IEEE 38th International Conference on Distributed Computing Systems (ICDCS)},
  title={{SGX}-Aware Container Orchestration for Heterogeneous Clusters},
  month={July},
  year={2018},
  pages={730-741},
  doi={10.1109/ICDCS.2018.00076},
  issn={2575-8411},
}
```
If you use Stress-SGX as a workload, please also cite the corresponding paper "Stress-SGX: Load and Stress Your Enclaves for Fun and Profit".
```bibtex
@inproceedings{vaucher2019stresssgx,
  author={Vaucher, S{\'e}bastien and Schiavoni, Valerio and Felber, Pascal},
  title={{Stress-SGX}: Load and Stress Your Enclaves for Fun and Profit},
  booktitle={Networked Systems},
  year={2019},
  publisher={Springer International Publishing},
  pages={358--363},
  doi={10.1007/978-3-030-05529-5_24},
  isbn={978-3-030-05529-5},
}
```
This repository is organized as follows:
```
sgx-scheduler
|_ demo              Demonstration utility script (for public presentation)
|_ deviceplugin      Kubernetes-compatible device plugin
|_ docker-sgx        Root image for SGX-enabled Docker containers
|_ docs              Miscellaneous documentation files
|_ heapster          Heapster monitoring framework (known-working commit)
|_ kubernetes        Modified version of Kubernetes
|  |_ pkg
|     |_ kubelet
|        |_ kuberuntime   Communication of EPC limits
|_ metrics-probe     Metrics probe running on each node
|_ results-parser    Fetches and parses results right after an experiment ends
|_ runner            Automated running of benchmarks/demonstrator
|_ scheduler         Actual scheduler
|_ sgx-app-mem       Sample SGX-enabled workload (legacy)
|_ sgx-driver        Modified SGX driver that enforces EPC usage limits
|_ standard-app-mem  Sample non-SGX workload based on stress-ng
|_ stress-sgx        SGX-enabled stressing application, to be used as workload
|_ README.md         The document that you are currently reading
```
In the following sections, we detail how to compile and deploy each component in order to end up with an SGX-compatible Kubernetes cluster.
We assume that the following components are ready to use:
- A Docker registry. In the following explanations, we refer to its address as `$docker_registry`.
- A cluster of SGX- and non-SGX-enabled machines that meet the following criteria:
  - All machines are connected to the same network
  - All machines have access to the Docker registry at `$docker_registry`
  - One machine is dedicated as Kubernetes master
  - SGX-compatible machines are used as SGX-enabled nodes
    - The Intel SGX Platform Software (PSW) must be disabled
  - The remaining machines are used as standard nodes
In order to leverage our SGX-aware monitoring framework, we need to modify the SGX driver provided by Intel. Additionally, our modifications enforce EPC consumption limits on a per-pod basis. The modified driver has to be compiled and then installed on each SGX-enabled node of the cluster.
The following commands will compile, install and insert our custom module:
```sh
cd sgx-driver
make
sudo make install
sudo depmod
sudo modprobe isgx
```
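To check that the installation succeeded, you can verify that the module is loaded and that its device node was created (assuming our modified driver keeps Intel's usual `isgx` module and device names):

```sh
# The module should appear in the list of loaded kernel modules...
lsmod | grep isgx
# ...and the driver should have created its device node
ls -l /dev/isgx
```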
We ship a custom version of Kubernetes that implements EPC utilisation limits. We refer to the official documentation from the Kubernetes developers for building instructions. We use `kubeadm` to deploy our test cluster, and similarly refer to its official documentation for deployment guidance.
Some documentation of ours is also available in the `docs/` directory.

In the following sections, we assume that the `kubectl` command is available and configured to operate on the newly-deployed Kubernetes cluster.
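A quick way to verify this assumption is to list the nodes of the cluster:

```sh
# All machines of the cluster should be listed and marked as Ready
kubectl get nodes -o wide
```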
The goal of the device plugin is to enable the execution of SGX applications on Kubernetes nodes. It fetches the number of trusted EPC pages available on each node and communicates it to the kubelet.
In order to deploy the device plugin to your Kubernetes cluster, it suffices to create a DaemonSet using the following command:
```sh
kubectl apply -f sgx_deviceplugin.yml
```
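Once the DaemonSet is running, each SGX-enabled node should advertise its EPC pages under the `intel.com/sgx` resource (the same resource name used in the pod specification further below). This can be checked as follows, where `sgx-node-1` is a hypothetical node name:

```sh
# Print the number of trusted EPC pages advertised by an SGX node
kubectl get node sgx-node-1 -o jsonpath='{.status.capacity.intel\.com/sgx}'
```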
Heapster is the standard monitoring layer of Kubernetes. We use it to collect regular metrics about the state of the cluster, which are then stored in InfluxDB.
The steps to deploy Heapster to the cluster are as follows:
- Modify the manifest of Grafana to use `NodePort`. In `grafana.yaml` inside directory `heapster/deploy/kube-config/influxdb`, the following line has to be uncommented:

  ```yaml
  spec:
    # ...
    type: NodePort  # <-- This line
    ports:
    - port: 80
    # ...
  ```

- Start Heapster and deploy its RBAC rules:

  ```sh
  ./heapster/deploy/kube.sh start
  kubectl create -f heapster/deploy/kube-config/rbac/heapster-rbac.yaml
  ```
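Once Grafana uses a `NodePort`, the port that was allocated can be read from its service. The service name below follows Heapster's stock InfluxDB deployment manifests; adjust it if your deployment differs:

```sh
# The PORT(S) column shows the externally reachable NodePort of Grafana
kubectl -n kube-system get svc monitoring-grafana
```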
The metrics probe component is built as a Docker image. The following commands build the image and push it to the Docker registry specified as `$docker_registry`:

```sh
cd metrics-probe
docker build -t ${docker_registry}/metrics-probe:1.5 .
docker push ${docker_registry}/metrics-probe:1.5
```
Our metrics probe has to be deployed on all SGX nodes in the cluster. We leverage Kubernetes to automatically deploy this component:

```sh
cd metrics-probe
kubectl apply -f metrics-probe.yaml
```
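To verify that one probe instance runs on every SGX node, listing the pods together with their node assignments can help:

```sh
# -o wide adds the node on which each pod has been scheduled
kubectl get pods -o wide | grep metrics-probe
```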
Like the metrics probe, the scheduler is deployed as a Kubernetes pod. To build it as a Docker image, the following commands need to be executed:

```sh
cd scheduler
docker build -t ${docker_registry}/efficient-scheduler:2.9 .
docker push ${docker_registry}/efficient-scheduler:2.9
```
Both scheduling strategies (spread and binpack) can be deployed at the same time using the following commands:

```sh
cd scheduler
kubectl apply -f spread-scheduler.yaml
kubectl apply -f binpack-scheduler.yaml
```
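You can then confirm that both scheduler variants are up (assuming the manifests name the pods after their respective strategy):

```sh
kubectl get pods --all-namespaces | grep -E 'spread|binpack'
```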
After all the steps mentioned above are completed, it becomes possible to submit jobs to the Kubernetes cluster. Standard Kubernetes APIs are used to interact with it.
To activate the scheduler for a specific pod, the user needs to set the `schedulerName` key in its pod specification. The following YAML description is therefore sufficient to deploy an SGX-enabled job in the cluster, scheduled using our SGX-aware scheduler:
```yaml
spec:
  template:
    metadata:
      name: my-workload
    spec:
      schedulerName: spread # or binpack
      containers:
      - name: my-container
        image: my-image
        resources:
          requests:
            intel.com/sgx: 5000
          limits:
            intel.com/sgx: 5000
```
It can be deployed using Kubernetes' standard tool, `kubectl`:

```sh
kubectl apply -f workload.yml
```
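The placement decision made by our scheduler can then be observed by watching the pods of the job; the exact pod names carry a suffix generated by Kubernetes:

```sh
# -o wide shows the node that the scheduler assigned to each pod
kubectl get pods -o wide --watch
```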
As part of this demonstrator, we provide sample jobs that can be executed in the cluster. Their behaviour consists of allocating a given amount of memory (standard or EPC, depending on the nature of the job) and repeatedly iterating over it to keep it marked as active. As they are based on Docker images, they can be built and deployed in the standard Docker-specific way:
```sh
docker build -t ${docker_registry}/sgx-app-mem:1.2 ./sgx-app-mem
docker build -t ${docker_registry}/standard-app-mem:1.2 ./standard-app-mem
docker push ${docker_registry}/sgx-app-mem:1.2
docker push ${docker_registry}/standard-app-mem:1.2
```
We consider the sample workloads described above as "legacy". Instead, we recommend the use of Stress-SGX as a stress workload. Further instructions are given in its own repository.
Stress-SGX is described in more detail in the associated paper "Stress-SGX: Load and Stress Your Enclaves for Fun and Profit", published in the proceedings of the 6th International Conference on Networked Systems (NETYS 2018).
For evaluation and demonstration purposes, we developed a collection of scripts that can deploy multiple jobs according to a trace. Our demonstrator uses the Google Borg Trace. The trace was recorded in 2011 on a Google cluster of about 12,500 machines. The nature of the jobs in the trace is undisclosed. We are not aware of any publicly available trace that contains SGX-enabled jobs. Therefore, we arbitrarily designate a subset of trace jobs as SGX-enabled. Our scripts can insert various percentages of SGX jobs into the system.
The trace reports several metrics measured for the Google jobs. We extract the following metrics out of it: submission time, duration, assigned memory, and maximal memory usage. The submission time is crucial to reproduce the arrival pattern of the jobs in our cluster. The run time of each job matches exactly the one reported in the trace. We use the assigned memory as the value advertised to Kubernetes when submitting the job to the system; however, the job actually allocates the amount given in the maximal memory usage field. We believe this creates real-world-like behaviour with respect to the memory consumption advertised on creation compared to the memory that is actually used.
The trace specifies the memory usage of each job as a percentage of the total memory available on Google's servers (without actually reporting the absolute values). In our experiments, we set the memory usage of SGX-enabled jobs by multiplying the memory usage factor obtained from the trace by the total usable size of the EPC (93.5 MiB in our case). As for standard jobs, we compute their memory usage by multiplying the same factor by 32 GiB. We believe this yields amounts that match real-world values. These numbers are defined in `runner/runner.py` as `epc_size_pages` and `memory_size_bytes`.
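As a worked example (the 0.05 factor below is hypothetical, not a value from the trace): a job whose trace entry reports a memory usage factor of 0.05 would be dimensioned as follows:

```sh
# SGX-enabled job: factor * usable EPC size
echo "$(echo "0.05 * 93.5" | bc) MiB of EPC"        # ~4.7 MiB
# Standard job: factor * 32 GiB (expressed in MiB)
echo "$(echo "0.05 * 32 * 1024" | bc) MiB of RAM"   # 1638.4 MiB
```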
Given the size of Google's cluster, we have to scale the trace down before replaying it on our own cluster setup. To bring it down to a reasonable number of jobs, our scripts can skip jobs, keeping only every nth job of the trace.
In the `runner` directory, there are two scripts that can be used to launch a batch of processes: `runner.py` and `superrunner.py`. `runner.py` is the base script; it launches multiple jobs fetched from the trace. `superrunner.py` uses `runner.py` to launch multiple experiments back-to-back, and makes use of our `results-parser` to produce ready-to-use data for the production of plots.
We provide the subset of the Google Borg trace (from 6480s to 10080s) that we used to perform the experiments presented in our paper. You can find the data in the compressed file `runner/trace/task_1h.csv.xz`. The file must be uncompressed before it can be given to the experiments runner.
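For instance, using `xz` while keeping the original archive around:

```sh
# Produces runner/trace/task_1h.csv next to the archive
xz --decompress --keep runner/trace/task_1h.csv.xz
```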
The scripts can be run on any machine that is configured with the cluster's API endpoint in its `kubectl` configuration; therefore, no build step is needed. The dependencies needed by the scripts can be installed using a Python virtual environment as follows:
```sh
cd runner
virtualenv -p /usr/bin/python3.6 --distribute .venv
. ./.venv/bin/activate
pip install -r requirements.txt
```
`runner.py` accepts the following options:
```
usage: runner.py [-h] [-s [SCHEDULER]] [-k [SKIP]] [-x [SGX]] [-o [OUTPUT]] trace

Experiments runner

positional arguments:
  trace                 Trace file to use

optional arguments:
  -h, --help            show this help message and exit
  -s [SCHEDULER], --scheduler [SCHEDULER]
                        Name of the custom scheduler to use
  -k [SKIP], --skip [SKIP]
                        Skip every nth job
  -x [SGX], --sgx [SGX]
                        Proportion of SGX jobs between 0 (no SGX) and 1 (all SGX)
  -o [OUTPUT], --output [OUTPUT]
                        Also output prints to file
```
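For instance, the following invocation (with illustrative values) replays every 2000th job of the trace with the spread strategy, marks a quarter of the jobs as SGX-enabled, and duplicates the output to a file:

```sh
./runner.py --scheduler spread --skip 2000 --sgx 0.25 \
    --output run-spread.log trace/task_1h.csv
```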
`superrunner.py` accepts the following options:
```
usage: superrunner.py [-h] -s SCHEDULER [SCHEDULER ...] -x SGX [SGX ...]
                      -a ATTACKER [ATTACKER ...] [-k [SKIP]] trace

Super Runner

positional arguments:
  trace                 Trace file

optional arguments:
  -h, --help            show this help message and exit
  -s SCHEDULER [SCHEDULER ...], --scheduler SCHEDULER [SCHEDULER ...]
                        Scheduler(s) to use
  -x SGX [SGX ...], --sgx SGX [SGX ...]
                        Fraction(s) of SGX jobs
  -a ATTACKER [ATTACKER ...], --attacker ATTACKER [ATTACKER ...]
                        Fraction(s) of memory allocated by attacker
  -k [SKIP], --skip [SKIP]
                        Skip every nth job
```
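For instance, the following invocation (again with illustrative values) runs every combination of the two schedulers and two SGX fractions, without an attacker, and saves the results to a file:

```sh
./superrunner.py -s spread binpack -x 0.25 0.5 -a 0 \
    -k 2000 trace/task_1h.csv > results.csv
```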
After an execution of `superrunner.py`, results are echoed to the console. They can be saved by redirecting its output into a file. Here is an excerpt from an output file; each line represents the metrics gathered from a given job that ran in the cluster:
```
job_id,duration,requested_pages,actual_pages,is_sgx,runner_start_time,k8s_created_time,k8s_actual_start_time,k8s_actual_end_time
0,120,1849,1589,True,2017-12-11 15:38:06.553203,2017-12-11 15:38:06+00:00,2017-12-11 15:38:08+00:00,2017-12-11 15:40:08+00:00
2000,300,692,521,True,2017-12-11 15:40:06.653301,2017-12-11 15:40:06+00:00,2017-12-11 15:40:18+00:00,2017-12-11 15:45:19+00:00
4000,300,414,306,True,2017-12-11 15:40:06.668214,2017-12-11 15:40:06+00:00,2017-12-11 15:40:15+00:00,2017-12-11 15:45:16+00:00
10000,300,139,9,True,2017-12-11 15:40:06.694733,2017-12-11 15:40:06+00:00,2017-12-11 15:40:09+00:00,2017-12-11 15:45:09+00:00
12000,300,777,782,True,2017-12-11 15:40:06.705448,2017-12-11 15:40:06+00:00,2017-12-11 15:40:11+00:00,2017-12-11 15:45:12+00:00
14000,300,791,645,True,2017-12-11 15:40:06.715891,2017-12-11 15:40:06+00:00,2017-12-11 15:40:14+00:00,2017-12-11 15:45:14+00:00
```
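Such a file lends itself to simple post-processing. For instance, assuming the output above was saved as `results.csv`, the number of SGX-enabled jobs can be counted directly:

```sh
# Column 5 (is_sgx) tells SGX jobs apart; NR > 1 skips the header
awk -F, 'NR > 1 && $5 == "True"' results.csv | wc -l
```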