0.22.0
Substra 0.22.0 - 2022-10-20
Main changes
- BREAKING CHANGE: the backend type is now set in the `Client`, the env variable `DEBUG_SPAWNER` is not used anymore. Default value is deployed.

  before:

  ```
  export DEBUG_SPAWNER=subprocess
  client = substra.Client(debug=True)
  ```

  after:

  ```python
  client = substra.Client(backend_type=substra.BackendType.LOCAL_SUBPROCESS)
  ```
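  For the Docker-based local mode previously selected through `DEBUG_SPAWNER`, the same parameter is used; a hedged sketch, assuming the sibling `LOCAL_DOCKER` member of the same enum:

  ```python
  import substra

  # Assumed sibling of LOCAL_SUBPROCESS on substra.BackendType: runs local
  # tasks inside Docker containers instead of plain subprocesses.
  client = substra.Client(backend_type=substra.BackendType.LOCAL_DOCKER)
  ```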
- BREAKING CHANGE: the `schemas.ComputePlanSpec.clean_models` property is now removed, the `transient` property on task outputs should be used instead.
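  A minimal sketch of marking an output as transient, assuming the `ComputeTaskOutputSpec` and `Permissions` schemas exposed by the SDK (import path and field names may differ slightly):

  ```python
  from substra.sdk import schemas

  # Assumed schema names: a task output flagged as transient so the backend
  # can delete the produced model once all downstream tasks have consumed it,
  # replacing the former ComputePlanSpec.clean_models flag.
  model_output = schemas.ComputeTaskOutputSpec(
      permissions=schemas.Permissions(public=False, authorized_ids=[]),
      transient=True,
  )
  ```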
- BREAKING CHANGE: the `Model.category` field has been removed.
- BREAKING CHANGE: the `train` and `predict` methods of all substrafl algos now take `datasamples` as argument instead of `X` and `y`. This only impacts user code that overrides those methods instead of using the `_local_train` and `_local_predict` methods.
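  A hedged sketch of what this means for code that overrides these methods directly (other parameters are omitted and may differ):

  ```python
  # Before: the opener output arrived as two separate arguments.
  def train(self, x, y):  # other parameters omitted
      ...

  # After: the opener output arrives as a single ``datasamples`` argument.
  def train(self, datasamples):  # other parameters omitted
      ...
  ```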
- BREAKING CHANGE: the result of the opener's `get_data` method is now automatically provided to the given dataset as a single `__init__` argument, instead of `x` and `y`, within the `train` and `predict` methods of all `TorchAlgo` classes. The user dataset should be adapted accordingly:
  ```python
  from torch.utils.data import Dataset

  class MyDataset(Dataset):
      def __init__(self, x, y, is_inference=False) -> None:
          ...

  class MyAlgo(TorchFedAvgAlgo):
      def __init__(
          self,
      ):
          torch.manual_seed(seed)
          super().__init__(
              model=my_model,
              criterion=criterion,
              optimizer=optimizer,
              index_generator=index_generator,
              dataset=MyDataset,
          )
  ```
  should be replaced with:
  ```python
  from torch.utils.data import Dataset

  class MyDataset(Dataset):
      def __init__(self, datasamples, is_inference=False) -> None:
          ...

  class MyAlgo(TorchFedAvgAlgo):
      def __init__(
          self,
      ):
          torch.manual_seed(seed)
          super().__init__(
              model=my_model,
              criterion=criterion,
              optimizer=optimizer,
              index_generator=index_generator,
              dataset=MyDataset,
          )
  ```
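  A hedged sketch of how the new dataset might unpack the opener output; the `datasamples` key names used here ("images", "labels") are illustrative only:

  ```python
  import torch
  from torch.utils.data import Dataset

  class MyDataset(Dataset):
      def __init__(self, datasamples, is_inference=False):
          # "images"/"labels" are hypothetical keys: use whatever structure
          # your opener's get_data method returns.
          self.x = datasamples["images"]
          self.y = datasamples["labels"]
          self.is_inference = is_inference

      def __getitem__(self, idx):
          x = torch.as_tensor(self.x[idx], dtype=torch.float32)
          if self.is_inference:
              return x
          return x, torch.as_tensor(self.y[idx], dtype=torch.float32)

      def __len__(self):
          return len(self.x)
  ```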
- BREAKING CHANGE: `Algo.category`: do not rely on categories anymore, all algo categories will be returned as `UNKNOWN`.
- BREAKING CHANGE: replaced `algo` with `algo_key` in `ComputeTask`.
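  A hedged sketch of resolving the algo from a task after this change, where `task` is a previously retrieved compute task and `client.get_algo` is assumed to be available to look the key up:

  ```python
  # The task no longer embeds the full algo object, only its key.
  algo = client.get_algo(task.algo_key)  # instead of reading task.algo
  print(algo.name)
  ```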
GUI
- Improved user management: the last admin cannot be deleted anymore.
Substra
- Algo categories are not checked anymore in local mode. Validations based on inputs and outputs are sufficient.
- Pass substra-tools arguments via a file instead of the command line. This fixes an issue where a compute plan would not run if there were too many data samples.
Substrafl
- NOTABLE CHANGES due to breaking changes in substra-tools:
  - The opener only exposes `get_data` and `fake_data` methods.
  - The result of the above methods is passed under the `datasamples` key within the `inputs` dict arg of all tools methods (`train`, `predict`, `aggregate`, `score`).
  - All methods (`train`, `predict`, `aggregate`, `score`) now take a `task_properties` argument (`dict`) in addition to `inputs` and `outputs`.
  - The rank of a task, previously passed under the `rank` key within the inputs, is now given in the `task_properties` dict under the `rank` key.
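A hedged sketch of the resulting method signature (shown for `train`; `predict`, `aggregate` and `score` follow the same pattern, class and registration details are omitted):

```python
def train(self, inputs, outputs, task_properties):
    datasamples = inputs["datasamples"]  # opener output, new location
    rank = task_properties["rank"]       # rank moved out of ``inputs``
    ...
```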
This means that all `opener.py` files should be changed from:
```python
import substratools as tools

class TestOpener(tools.Opener):
    def get_X(self, folders):
        ...

    def get_y(self, folders):
        ...

    def fake_X(self, n_samples=None):
        ...

    def fake_y(self, n_samples=None):
        ...
```
to:
```python
import substratools as tools

class TestOpener(tools.Opener):
    def get_data(self, folders):
        ...

    def fake_data(self, n_samples=None):
        ...
```
This also implies that metrics now have access to the result of `get_data` and not only `get_y` as previously. Users should adapt all of their metrics files accordingly, e.g.:
```python
import numpy as np
import substratools as tools

class AUC(tools.Metrics):
    def score(self, inputs, outputs):
        """AUC"""
        y_true = inputs["y"]
        ...

    def get_predictions(self, path):
        return np.load(path)

if __name__ == "__main__":
    tools.metrics.execute(AUC())
```
could be replaced with:
```python
import numpy as np
import substratools as tools

class AUC(tools.Metrics):
    def score(self, inputs, outputs, task_properties):
        """AUC"""
        datasamples = inputs["datasamples"]
        y_true = ...  # getting the target from the whole datasamples

    def get_predictions(self, path):
        return np.load(path)

if __name__ == "__main__":
    tools.metrics.execute(AUC())
```