-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathstep2.tex
183 lines (167 loc) · 9.73 KB
/
step2.tex
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
In step 2, we will develop new algorithms and techniques that address
energy consumption for mixed critical real-time systems such as
automotive computing. In both cases, we plan to leverage two key
insights about heterogeneous architectures like the ARM big.LITTLE
presented in Figure~\ref{fig:x264-motivation-tradeoffs}:
\begin{itemize}\itemsep 0pt \parskip 0pt
\item Their energy efficiency increases as they slow down.
\item Dark silicon provides reserve compute capacity that can only be
used for short amounts of time.
\end{itemize}
From these insights we will develop two techniques, each providing a
different capability:
\begin{itemize}\itemsep 0pt \parskip 0pt
\item \textbf{Pace-to-sprint}: The goal of this technique is to reduce
energy consumption for hard real-time workloads. Where traditional
techniques allocate for worst case and then idle if work is
completed early (i.e., race-to-idle
\cite{Carroll13,LeSueur11,Imes2014,HotPower}), pace-to-sprint will
develop scheduling techniques to spend as much time as possible in
slower, more energy-efficient states.
\item \textbf{Energy Guarantees}: The goal of this technique will be
to guarantee energy consumption for mixed critical workloads; i.e.,
workloads that contain jobs with both hard timing constraints and
best effort requirements. Specifically, we will use the
unsustainable, but high performance configurations (to the right of
the dashed line in Figure~\ref{fig:pwr-perf-odroid}) to schedule hard
real-time tasks and then use low-power sustainable configurations
(to the left of the dashed line in Figure~\ref{fig:pwr-perf-odroid}) to
schedule other tasks while the processor cools down.
\end{itemize}
\subsubsection{Pace-to-Sprint}
We will develop pace-to-sprint techniques to minimize energy
consumption for workloads with hard real-time requirements. The key
insight here is that allocating for worst case execution time wastes
energy, yet many application's have average-case behavior
significantly less demanding than their worst case behavior. On our
target systems, allocating for worst case is not practical, as such an
allocation would violate thermal constraints (see Figure
\ref{fig:pwr-perf-odroid}). Where traditional approaches would start with
worst case allocation and transition to an idle state when they
finish, in this approach we will begin execution in an
energy-efficient state. If the current job is not a worst-case job,
then it will finish in this energy-efficient mode. As it is
executing, if its time in the energy efficient state exceeds some
predetermined threshold, then we will transition to a thermally
unsustainable, yet fast resource allocation to ensure timing deadlines
are met.
More formally, assume a set of jobs starts at time $0$ and must
complete $W$ units of work by time $t$. The system supports $C$
configurations $c \in \{0,\dots,C-1\}$, each of which has a
computation rate $s_c$ and a power consumption of $p_c$. These
configurations represent settings for all configurable components in
the system (e.g., DRAM speed, processor speed, and number of active
cores). Here, we assume these values are known based on the results
of the schedulability analysis work we will perform in Step 1. The
system has a unique \emph{idle} configuration $c = 0$ with $s_0 = 0$
and $p_0 = p_{idle}$, where $p_{idle}$ is a system dependent (and
workload independent) value representing the system's power
consumption when not executing a computation. We assume configuration
$C-1$ represents making all resources available to a workload. The
goal is to assign a time $0 \le t_c \le t$ to each machine
configuration. Note that configuration $c$ will contribute $t_c \cdot
s_c$ work at a cost of $p_c \cdot t_c$ energy. We have used similar
formulations in our prior work \cite{LEO,POET,kim-cpsna}. Thus, the
problem can be
expressed as:
\begin{wrapfigure}{l}{0.45\textwidth}
%\begin{figure}
\begin{align*}
minimize~& \sum_c t_c \cdot p_c \tag{1} \label{eqn:1}\\
subject~to~\\
& \sum_c t_c \cdot s_c = W \tag{2} \label{eqn:2}\\
& \sum_c t_c = t \tag{3} \label{eqn:3}\\
& 0 \le t_c \le t,~for~c = 0,\dots,C-1 \tag{4} \label{eqn:4}
\end{align*}
%\end{figure}
\end{wrapfigure}
Equations \ref{eqn:1}--\ref{eqn:4} assign values for all $t_c$ to
minimize total energy consumption (Equation \ref{eqn:1}) subject to the
constraints that the total work accomplished is equal to the required
work $W$ (Equation \ref{eqn:2}) and the deadline is met (Equation
\ref{eqn:3}). The final constraint ensures that the time spent in any
configuration is non-negative.
The traditional, race-to-idle approach represents a heuristic solution
to this optimization which uses only two configurations: $C-1$, where
all resources are available and $0$, which idles the machine. Our
proposed approach is to develop new heuristics that beat race-to-idle.
In the past we have developed heuristics that are appropriate to soft
real-time schedules \cite{POET,PTRADE}. In the proposed work, we will
extend these heuristics and develop new algorithms to ensure hard
real-time deadlines.
Specifically, we will look for solutions that use the configuration
$e$ as much as possible, where $e$ represents the most energy
efficient configuration; i.e., $\forall c \,\,\, r_e /p_e \ge
r_c/p_c$. We will determine how long the system can stay in that
configuration before it risks violating timing and we will then
transition to a faster (and less energy efficient configuration) to
ensure that timing constraints are not violated. We will examine
algorithms for determining when to switch both statically and
dynamically. Two insights drive this work: 1) when jobs finish faster
than worst case, we will not have wasted energy, because we will have
been operating in the most energy efficient configuration and 2) when
jobs are in danger of violating timing we can switch to a
computationally intensive configuration that will ensure safety.
\subsubsection{Energy Guarantees for Mixed Critical Workloads}
We will develop techniques to provide energy consumption guarantees
for systems with both hard and soft real-time requirements. The basic
intuition in this step will be to divide the tasks into two groups.
For hard real-time tasks, we will guarantee deadlines are met. For
all other tasks we consider two groups: soft real-time where latency
is bounded and best effort where we simply complete as much work as
possible, but offer no guarantees.
We will realize this approach by dividing scheduling periods into two
fixed-sized quanta. In the first, we will schedule hard real-time
tasks requiring high power consumption. In the second quanta,
we will determine the remaining average power consumption that will
meet the energy goal given the remaining time. During this second
time quantum, we will set the system to a resource configuration that
is guaranteed not to exceed the required power and schedule as many
additional tasks as possible. For soft real-time tasks, we will
develop new, \emph{energy-aware} admission algorithms that will only
allow new soft real-time tasks to enter the system if their latency
bound can be met. For best-effort tasks, we will simply admit as many
as possible -- after all hard and soft real-time tasks have been
scheduled -- and suspend them when needed.
More formally, we will meet an energy budget $E$ for a scheduling
period by dividing this period into two quanta: $t_h$ for scheduling
hard real-time tasks and $t_s$ for scheduling soft real-time tasks.
Using results from step 1, we will configure the system to use
combinations of big and LITTLE cores that are guaranteed to schedule
the hard real-time tasks and measure system energy consumption $E_h$.
We then know the remaining energy available for the remaining tasks is
$E_s = E - E_h$. Since the time allocated for these remaining tasks
is fixed, we simply determine the power consumption that respects the
energy goal $P_s = E_s/t_s$. We will then configure the system to a
resource usage that is guaranteed not to exceed this power
consumption. At that point we will admit soft real-time tasks that
can be scheduled with these resources and finally admit any best
effort tasks once all real-time tasks have been scheduled.
\emph{We believe this technique will provide a new capability for
embedded real-time systems: guaranteeing both hard real-time and
energy consumption. Providing both is essential for cyber physical
systems that rely on battery power.}
We will measure the effectiveness of this new capability using two
metrics:
\begin{itemize}\itemsep 0pt \parskip 0pt
\item \textbf{Energy Consumption:} We will instrument boards with
commercially available energy sensors. We expect that
pace-to-sprint should exhibit lower energy consumption than existing
techniques. While specific savings will vary with workload, our
preliminary results (Figure \ref{fig:x264-motivation-tradeoffs})
show that up to $2 \times$ energy reduction is possible compared to
the commonly used race-to-idle heuristic. When evaluating our
ability to provide energy guarantees, we will measure the number of
violations per workload, where a violation means that a scheduling
period exceeded its budgeted energy.
\item \textbf{Schedulability:} We will evaluate pace-to-sprint by
ensuring that all hard deadlines are indeed met. In addition, we
will evaluate our ability to provide energy guarantees by
determining how many of the soft real-time tasks we can successfully
execute. Clearly, we could execute none and simply idle the
processor, but our goal is to execute as many as possible without
violating the energy constraints. We will compare our approach to a
static scheme that does not dynamically monitor energy and adjust
resource usage. We hope to show that our approach greatly increases
schedulability, allowing more work to be done on the same platform.
\end{itemize}