diff --git a/README.md b/README.md
index 5d3da47..95ef587 100644
--- a/README.md
+++ b/README.md
@@ -50,7 +50,7 @@
 vHost ratio is assumed to be 1 if not given.

 You are free to supply only the utilization or as many additional parameters that the model supports. The model will then be retrained on the new configuration on the spot.
-Typically the model gets more accurate the more parameters you can supply.
+Typically the model gets more accurate the more parameters you can supply. See the *Assumptions & Limitations* section at the end for an idea of how accurate the model will be in different circumstances.

 # Background

@@ -384,6 +384,13 @@
 selection against the one from Interact DC.

 Without Hyperparameter Tuning when comparing the available variables in the cloud they are about the same.
+# Assumptions & Limitations
+- The model was trained on the SPECpower dataset, which almost exclusively includes compute-focused machines. This means it will not be accurate for memory-heavy machines like database servers, or for ML machines that tend to use GPUs/TPUs or even ASICs.
+- The main input variable for the model is CPU utilization. This metric is only reliable if the system frequencies do not change much. See our in-depth article about the [usefulness of CPU Utilization as a metric](https://www.green-coding.berlin/case-studies/cpu-utilization-usefulness/).
+- SPECpower machines tend to be rather tuned and do not necessarily represent the reality of current datacenter configurations. So you are more likely to get a value that is too low than one that is too high. This was also detailed in the analysis earlier in the README, where we talk about the turned-off features.
+- If you are in a shared-resource system like a virtual machine, the model will assume a linear fraction of the load. This is debatable and might need improvement. See the discussion here: https://github.com/green-coding-berlin/spec-power-model/issues/4
+
+
 # TODO

 - vhost operating point