Design of CIL Algorithms and the base class #1064
Replies: 2 comments
---
This is an example from PyTorch for [Stochastic Gradient Descent](https://pytorch.org/docs/stable/_modules/torch/optim/sgd.html#SGD).
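The relevant part of that pattern can be sketched in plain Python (a sketch of the validation style, not PyTorch's actual source): required values such as the learning rate have no default and are validated eagerly in `__init__`.

```python
# Sketch of the validation pattern used by optimisers such as
# torch.optim.SGD (illustrative, not PyTorch's actual code):
# the learning rate is a required named parameter, checked eagerly.
class SGDLike:
    def __init__(self, params, lr, momentum=0.0, weight_decay=0.0):
        if lr < 0.0:
            raise ValueError(f"Invalid learning rate: {lr}")
        if momentum < 0.0:
            raise ValueError(f"Invalid momentum value: {momentum}")
        self.params = list(params)
        self.lr = lr
        self.momentum = momentum
        self.weight_decay = weight_decay
```

Because `lr` has no default, forgetting it fails immediately instead of silently running with a wrong value.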
---
I believe @gfardell @jakobsj @paskino discussed a similar topic. Thanks for bringing this up, it is a great topic for the developer guidelines. The agreement is that for each class there are essential and non-essential parameters. The non-essential parameters can be further divided into often-configured and advanced parameters:
To create an instance of a class, the creator of the class should require the essential and often-configured parameters as named parameters. It should not accept positional arguments. I'd argue the same holds for all iterative algorithms. Looking at the `Algorithm` base class, the parameters are defined in CIL/Wrappers/Python/cil/optimisation/algorithms/Algorithm.py, lines 39 to 66 in 843b899. Trying to answer all your questions:
In the case of FDK the only essential parameter is the …
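The essential / often-configured / advanced split could look like this (a minimal sketch with hypothetical names, not CIL's actual API):

```python
# Hypothetical sketch: essential and often-configured parameters are
# keyword-only named parameters; advanced ones keep sensible defaults.
class ExampleAlgorithm:
    def __init__(self, *,                       # '*' forbids positional args
                 initial,                        # essential
                 objective_function,             # essential
                 step_size=1.0,                  # often configured
                 max_iteration=100,              # advanced (base class)
                 update_objective_interval=1):   # advanced (base class)
        self.initial = initial
        self.objective_function = objective_function
        self.step_size = step_size
        self.max_iteration = max_iteration
        self.update_objective_interval = update_objective_interval
```

The bare `*` enforces "should not accept positional arguments": `ExampleAlgorithm(x0, f)` raises a `TypeError`.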
---
I think it is time to decide the design style for our algorithms and the base class `Algorithm`. At the moment, we have the following algorithms:

Very soon, more algorithms will be added to CIL, so it is urgent to decide the style that we want for our users.
In order to define our algorithms, we use 2 methods: `__init__` and `set_up`. The `__init__` calls the `set_up` method with the same signature. In most or all of the algorithms, the `__init__` method does not do anything else. In practice, using the `kwargs` in the signature of `__init__`, we have access to two important `kwargs` from the base `Algorithm` class: `max_iteration` and `update_objective_interval`.
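The `__init__`/`set_up` pattern described above can be sketched as follows (illustrative stand-ins, not CIL's actual classes):

```python
# Illustrative stand-ins for the CIL pattern: __init__ forwards the
# base-class kwargs and then calls set_up with the same signature.
class Algorithm:
    def __init__(self, **kwargs):
        self.max_iteration = kwargs.get('max_iteration', 0)
        self.update_objective_interval = kwargs.get('update_objective_interval', 1)

class GD(Algorithm):
    def __init__(self, initial=None, objective_function=None,
                 step_size=None, **kwargs):
        super().__init__(**kwargs)   # picks up the base-class kwargs
        self.set_up(initial=initial, objective_function=objective_function,
                    step_size=step_size)

    def set_up(self, initial, objective_function, step_size):
        self.x = initial
        self.objective_function = objective_function
        self.step_size = step_size
```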
Let's focus on a specific example, e.g., the `GD` algorithm (`GradientDescent`).
In the `GD` class, we have:

To configure the Gradient Descent algorithm, we need 3 things:

- an initial point $x^{0}$,
- a step size $\gamma^{n}$,
- a differentiable function $f$ from the `Function` class,

used in the update $x^{n+1} = x^{n} - \gamma^{n}\nabla f(x^{n})$.
See for example GradientDescent.
In general (non-convex/strongly convex objective), the initial point is very important. Also, the step size is important for the convergence speed, and we certainly need a function that is differentiable. Finally, for an algorithm the number of iterations is also an important parameter.
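Written out, the update rule amounts to the following plain-Python sketch (using $f(x)=x^2$, so $\nabla f(x)=2x$, purely as an example):

```python
# The update x^{n+1} = x^n - gamma * grad f(x^n), iterated n_iter times.
def gradient_descent(x0, grad, step_size, n_iter):
    x = x0
    for _ in range(n_iter):
        x = x - step_size * grad(x)
    return x

# Minimising f(x) = x^2 (gradient 2x) from x0 = 5.0:
x_min = gradient_descent(x0=5.0, grad=lambda x: 2.0 * x,
                         step_size=0.1, n_iter=100)
```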
At the moment, these arguments are by default `None`, which in my opinion is wrong. These should be required parameters.

Let's continue with the `kwargs`. In practice, we have 2 types of `kwargs`:

1. `kwargs` that are used by the corresponding algorithm. For example in `GD`, we have `kwargs` that are used in the `armijo_rule` and also `kwargs` that are used in the `should_stop` method of `GD`, which basically overrides the `should_stop` of the `Algorithm` base class.
2. `kwargs` from the `Algorithm` base class, e.g., `max_iteration`, `update_objective_interval`.
At the moment the UI for `GD` is:

or
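A hypothetical sketch of that current UI (the signature is illustrative, not CIL's actual `GD`): an unchecked `**kwargs` silently swallows a misspelled keyword such as `rate`.

```python
# Illustrative only: 'rate' is accepted but never used, because the
# real parameter is called step_size and **kwargs is not validated.
class GD:
    def __init__(self, initial=None, objective_function=None,
                 step_size=None, **kwargs):
        self.initial = initial
        self.objective_function = objective_function
        self.step_size = step_size
        self.kwargs = kwargs         # 'rate' ends up here, unused

g = GD(initial=0.0, objective_function=lambda x: x * x, rate=0.01)
# g.step_size is still None: the typo goes unnoticed
```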
Note: In the above example, `rate` is passed as a kwarg, but there is no actual `rate` parameter in the `GD` class. Therefore, it is not used; the correct name is `step_size`. We need to be careful about what is passed in kwargs and whether it is used. For example, we need to check for the allowed kwargs, e.g., `atol`, `rtol`, `alpha`, `beta`.
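One possible way to do that check (a sketch, not CIL's implementation): compare the received kwargs against an allow-list and fail loudly on anything unknown.

```python
# Allow-list check for kwargs; unknown names raise instead of being
# silently ignored. The set below is illustrative.
ALLOWED_KWARGS = {'atol', 'rtol', 'alpha', 'beta',
                  'max_iteration', 'update_objective_interval'}

def check_kwargs(**kwargs):
    unknown = set(kwargs) - ALLOWED_KWARGS
    if unknown:
        raise TypeError(f"Unexpected keyword arguments: {sorted(unknown)}")
    return kwargs
```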
Below are some questions:
**Question 1:** What do we consider as required parameters for an algorithm?
**Question 2:** How do we configure required parameters for an algorithm?
**Question 3:** What do we consider as optional parameters (kwargs) for an algorithm?
**Question 4:** What do we consider as optional parameters (kwargs) for the algorithm base class?
**Question 5:** How do we configure kwargs parameters? What do we want for the UI of an algorithm?
Personally, provided that we check for the allowed kwargs (of the algorithm and its base class), I like the following UI:
Another option:
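The two UIs could be contrasted roughly like this (the `GD` and `Quadratic` classes here are hypothetical stand-ins, not CIL code):

```python
# Hypothetical stand-ins to contrast two construction styles.
class GD:
    def __init__(self, *, initial, objective_function, step_size,
                 max_iteration=0, update_objective_interval=1):
        self.x = initial
        self.objective_function = objective_function
        self.step_size = step_size
        self.max_iteration = max_iteration
        self.update_objective_interval = update_objective_interval

    def run(self, iterations=None):
        n = self.max_iteration if iterations is None else iterations
        for _ in range(n):
            self.x -= self.step_size * self.objective_function.gradient(self.x)

class Quadratic:                     # plays the role of a CIL Function
    def __call__(self, x):
        return x * x
    def gradient(self, x):
        return 2.0 * x

# Option 1: everything as named arguments to the constructor
gd = GD(initial=5.0, objective_function=Quadratic(), step_size=0.1,
        max_iteration=100, update_objective_interval=10)
gd.run()

# Option 2: essential parameters at construction, iterations at run time
gd2 = GD(initial=5.0, objective_function=Quadratic(), step_size=0.1)
gd2.run(iterations=100)
```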
I will continue with the discussion adding more examples.