Hetero-NN Tutorial

In a hetero-federated learning (vertically partitioned data) setting, multiple parties hold different feature sets for the same common user samples. Federated learning enables these parties to collaboratively train a model without sharing their actual data. In FATE-2.0 we introduce our brand-new Hetero-NN framework, which allows you to quickly set up a hetero federated NN learning task. Since our framework is built on PyTorch and transformers, you can seamlessly integrate your existing datasets and models into it.

In this tutorial, we will show you how to run a Hetero-NN task under FATE-2.0 locally, without using a FATE-Pipeline. You can refer to this example for local model experimentation, algorithm modification, and testing. Besides, FATE-2.0 provides two protection strategies, SSHE and FedPass, and we will show you how to use both in this tutorial.

Setup Hetero-NN Step by Step

To run a Hetero-NN task, several steps are needed:

  1. Import required classes in a new python script
  2. Prepare data, datasets, models, loss and optimizers for guest side and host side
  3. Configure training parameters; initialize a hetero-nn model; set protection strategy
  4. Prepare the trainer
  5. Run the training script

Import Required Classes

In FATE-2.0, our neural network (NN) framework is built on the PyTorch and transformers libraries, which makes it straightforward to incorporate existing models and datasets into federated training. In the HeteroNN module, we use HeteroNNTrainerGuest and HeteroNNTrainerHost to train the model on the guest and host sides, respectively. They are developed based on the HuggingFace trainer, so you can specify training parameters in the same way, via the TrainingArguments class.

We also provide HeteroNNModelGuest and HeteroNNModelHost, which wrap the top/bottom models and the aggregate layer and provide a unified interface for the trainers. You can define your own bottom/top model structures and pass them to HeteroNNModelGuest and HeteroNNModelHost. We offer two protection strategies, SSHE and FedPass, which you specify on HeteroNNModelGuest and HeteroNNModelHost via SSHEArgument and FedPassArgument.

import torch as t
from fate.arch import Context
from fate.ml.nn.hetero.hetero_nn import HeteroNNTrainerGuest, HeteroNNTrainerHost, TrainingArguments
from fate.ml.nn.model_zoo.hetero_nn_model import HeteroNNModelGuest, HeteroNNModelHost
from fate.ml.nn.model_zoo.hetero_nn_model import SSHEArgument, FedPassArgument, TopModelStrategyArguments

Tabular Data Example with SSHE

Here we show an example of using our NN framework on a binary classification task with tabular features. You can download our example data (breast_hetero_guest.csv and breast_hetero_host.csv) from:

and place them in the same directory as your Python script.
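As a quick, optional sanity check (a minimal sketch, assuming pandas is installed), you can verify that the guest file contains the label column and 10 features while the host file contains 20 features:

import pandas as pd

guest_df = pd.read_csv("./breast_hetero_guest.csv")
host_df = pd.read_csv("./breast_hetero_host.csv")
print(guest_df.shape)   # guest side: sample id, label column "y" and 10 features
print(host_df.shape)    # host side: sample id and 20 features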

In this example we will use the SSHE strategy to protect the data: an SSHE aggregate layer is responsible for aggregating the forward outputs of the guest and host sides and propagating the gradients back to both sides.
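To give an intuition for what the aggregate layer protects, below is a toy illustration of additive secret sharing, one ingredient of SSHE. The real layer also uses homomorphic encryption and fixed-point arithmetic and is built for you by SSHEArgument, so this sketch is purely illustrative:

import torch

host_hidden = torch.randn(4, 8)          # host bottom-model output for a mini-batch
mask = torch.randn_like(host_hidden)     # random mask generated by the host
host_share_kept = host_hidden - mask     # share kept locally by the host
host_share_sent = mask                   # share sent out; statistically looks like noise

# no single share reveals host_hidden, yet the sum reconstructs it exactly
assert torch.allclose(host_share_kept + host_share_sent, host_hidden)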

FATE Context

FATE-2.0 uses a context object to configure the running environment, including the party settings (guest, host and their party IDs). We can create a context object by calling the create_context function.

# create_context is expected to come from fate.arch.context in FATE-2.0
from fate.arch.context import create_context

def create_ctx(party, session_id='test_fate'):
    parties = [("guest", "9999"), ("host", "10000")]
    if party == "guest":
        local_party = ("guest", "9999")
    else:
        local_party = ("host", "10000")
    context = create_context(local_party, parties=parties, federation_session_id=session_id)
    return context

If we run our task with launch() (explained later), context creation is handled automatically; this section introduces the concept of a context and shows how to create one manually.
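For reference, a minimal sketch of a manual entry point might look like the snippet below. The command-line handling here is hypothetical, run() is defined later in this tutorial, and whether two manually started processes can reach each other depends on the computing/federation engines configured in create_context, so launch() remains the recommended way to run this tutorial:

import sys

if __name__ == '__main__':
    party = sys.argv[1]        # "guest" or "host", one process per party
    ctx = create_ctx(party)    # the helper defined above
    run(ctx)                   # run() is defined in the full script below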

Prepare

Before starting training, as in PyTorch, we first define the model structures, prepare the data, choose a loss function, and instantiate an optimizer. The following code demonstrates the preparation of data, datasets, models, loss, and optimizers. In a hetero-NN setting, which differs from a homogeneous (homo) federated learning scenario, features and models are split, with each party managing its own segment. The code uses 'ctx' to differentiate the guest and host code paths: the guest holds the labels and 10 features, so it creates both top and bottom models, while the host, with 20 features and no label, only creates a bottom model. During the initialization of HeteroNNModelGuest and HeteroNNModelHost, an SSHEArgument is passed to build a secret sharing and homomorphic encryption (SSHE) aggregate layer during training, safeguarding the forward and backward processes.

Similar to using a HuggingFace trainer, TrainingArguments is used for setting training parameters. Note that Hetero-NN currently does not support multi-GPU training, and the SSHE layer is incompatible with GPU training.

Once the models and datasets are prepared, we can start the training process.

def get_setting(ctx):

    from fate.ml.nn.dataset.table import TableDataset
    # prepare data
    if ctx.is_on_guest:
        ds = TableDataset(to_tensor=True)
        ds.load("./breast_hetero_guest.csv")

        bottom_model = t.nn.Sequential(
            t.nn.Linear(10, 8),
            t.nn.ReLU(),
        )
        top_model = t.nn.Sequential(
            t.nn.Linear(8, 1),
            t.nn.Sigmoid()
        )
        model = HeteroNNModelGuest(
            top_model=top_model,
            bottom_model=bottom_model,
            agglayer_arg=SSHEArgument(
                guest_in_features=8,
                host_in_features=8,
                out_features=8,
                layer_lr=0.01
            )
        )

        optimizer = t.optim.Adam(model.parameters(), lr=0.01)
        loss = t.nn.BCELoss()

    else:
        ds = TableDataset(to_tensor=True)
        ds.load("./breast_hetero_host.csv")
        bottom_model = t.nn.Sequential(
            t.nn.Linear(20, 8),
            t.nn.ReLU(),
        )

        model = HeteroNNModelHost(
            bottom_model=bottom_model,
            agglayer_arg=SSHEArgument(
                guest_in_features=8,
                host_in_features=8,
                out_features=8,
                layer_lr=0.01
            )
        )
        optimizer = t.optim.Adam(model.parameters(), lr=0.01)
        loss = None

    args = TrainingArguments(
        num_train_epochs=3,
        per_device_train_batch_size=256
    )

    return ds, model, optimizer, loss, args

Run in Hetero-Federated Mode

We add a train() function to initialize the trainer for the guest and host separately, and a run() function as the entry point for launching the task. The run() function is called by the launch() function at the end of the script. Below is the full code.

import torch as t
from fate.arch import Context
from fate.ml.nn.hetero.hetero_nn import HeteroNNTrainerGuest, HeteroNNTrainerHost, TrainingArguments
from fate.ml.nn.model_zoo.hetero_nn_model import HeteroNNModelGuest, HeteroNNModelHost
from fate.ml.nn.model_zoo.hetero_nn_model import SSHEArgument, FedPassArgument, TopModelStrategyArguments



def train(ctx: Context, 
          dataset = None, 
          model = None, 
          optimizer = None, 
          loss_func = None, 
          args: TrainingArguments = None, 
          ):
    
    if ctx.is_on_guest:
        trainer = HeteroNNTrainerGuest(ctx=ctx,
                                       model=model,
                                       train_set=dataset,
                                       optimizer=optimizer,
                                       loss_fn=loss_func,
                                       training_args=args
                                       )
    else:
        trainer = HeteroNNTrainerHost(ctx=ctx,
                                      model=model,
                                      train_set=dataset,
                                      optimizer=optimizer,
                                      training_args=args
                                    )

    trainer.train()
    return trainer


def predict(trainer, dataset):
    return trainer.predict(dataset)

def get_setting(ctx):

    from fate.ml.nn.dataset.table import TableDataset
    # prepare data
    if ctx.is_on_guest:
        ds = TableDataset(to_tensor=True)
        ds.load("./breast_hetero_guest.csv")

        bottom_model = t.nn.Sequential(
            t.nn.Linear(10, 8),
            t.nn.ReLU(),
        )
        top_model = t.nn.Sequential(
            t.nn.Linear(8, 1),
            t.nn.Sigmoid()
        )
        model = HeteroNNModelGuest(
            top_model=top_model,
            bottom_model=bottom_model,
            agglayer_arg=SSHEArgument(
                guest_in_features=8,
                host_in_features=8,
                out_features=8,
                layer_lr=0.01
            )
        )

        optimizer = t.optim.Adam(model.parameters(), lr=0.01)
        loss = t.nn.BCELoss()

    else:
        ds = TableDataset(to_tensor=True)
        ds.load("./breast_hetero_host.csv")
        bottom_model = t.nn.Sequential(
            t.nn.Linear(20, 8),
            t.nn.ReLU(),
        )

        model = HeteroNNModelHost(
            bottom_model=bottom_model,
            agglayer_arg=SSHEArgument(
                guest_in_features=8,
                host_in_features=8,
                out_features=8,
                layer_lr=0.01
            )
        )
        optimizer = t.optim.Adam(model.parameters(), lr=0.01)
        loss = None

    args = TrainingArguments(
        num_train_epochs=3,
        per_device_train_batch_size=256
    )

    return ds, model, optimizer, loss, args


def run(ctx):
    ds, model, optimizer, loss, args = get_setting(ctx)
    trainer = train(ctx, ds, model, optimizer, loss, args)
    pred = predict(trainer, ds)
    if ctx.is_on_guest:
        # print("pred:", pred)
        # compute auc here
        from sklearn.metrics import roc_auc_score
        print('auc is')
        print(roc_auc_score(pred.label_ids, pred.predictions))
    

if __name__ == '__main__':
    from fate.arch.launchers.multiprocess_launcher import launch
    launch(run)

Save the code as a Python script named 'hetero_nn.py' and run it with the following command:

python hetero_nn.py --parties guest:9999 host:10000 --log_level INFO

Here is a partial output from the console:

[15:16:49] INFO     [Rank:0] disabled tracing                                                                                                                                                                                                                                                _trace.py:31
           INFO     [Rank:0] sample id column not found, generate sample id from 0 to 569                                                                                                                                                                                                    table.py:139
label is None
           INFO     [Rank:0] use "y" as label column                                                                                                                                                                                                                                         table.py:150
[15:16:49] INFO     [Rank:1] disabled tracing                                                                                                                                                                                                                                                _trace.py:31
           INFO     [Rank:1] sample id column not found, generate sample id from 0 to 569                                                                                                                                                                                                    table.py:139
label is None
           INFO     [Rank:1] found no "y"/"label"/"target" in input table, no label will be set                                                                                                                                                                                              table.py:153
           INFO     [Rank:0] ***** Running training *****                                                                                                                                                                                                                                 trainer.py:1706
           INFO     [Rank:0]   Num examples = 569                                                                                                                                                                                                                                         trainer.py:1707
           INFO     [Rank:0]   Num Epochs = 3                                                                                                                                                                                                                                             trainer.py:1708
           INFO     [Rank:0]   Instantaneous batch size per device = 256                                                                                                                                                                                                                  trainer.py:1709
           INFO     [Rank:0]   Total train batch size (w. parallel, distributed & accumulation) = 256                                                                                                                                                                                     trainer.py:1712
           INFO     [Rank:0]   Gradient Accumulation steps = 1                                                                                                                                                                                                                            trainer.py:1713
           INFO     [Rank:0]   Total optimization steps = 9                                                                                                                                                                                                                               trainer.py:1714
           INFO     [Rank:0]   Number of trainable parameters = 97                                                                                                                                                                                                                        trainer.py:1715
           INFO     [Rank:1] ***** Running training *****                                                                                                                                                                                                                                 trainer.py:1706
           INFO     [Rank:1]   Num examples = 569                                                                                                                                                                                                                                         trainer.py:1707
           INFO     [Rank:1]   Num Epochs = 3                                                                                                                                                                                                                                             trainer.py:1708
           INFO     [Rank:1]   Instantaneous batch size per device = 256                                                                                                                                                                                                                  trainer.py:1709
           INFO     [Rank:1]   Total train batch size (w. parallel, distributed & accumulation) = 256                                                                                                                                                                                     trainer.py:1712
           INFO     [Rank:1]   Gradient Accumulation steps = 1                                                                                                                                                                                                                            trainer.py:1713
           INFO     [Rank:1]   Total optimization steps = 9                                                                                                                                                                                                                               trainer.py:1714
           INFO     [Rank:1]   Number of trainable parameters = 168                                                                                                                                                                                                                       trainer.py:1715
{'loss': 0.7817, 'learning_rate': 0.01, 'epoch': 1.0}
[15:17:13] INFO     [Rank:0] {'loss': 0.7817, 'learning_rate': 0.01, 'epoch': 1.0, 'step': 3}                                                                                                                                                                                         trainer_base.py:429
{'loss': 0.0, 'learning_rate': 0.01, 'epoch': 1.0}
[15:17:13] INFO     [Rank:1] {'loss': 0.0, 'learning_rate': 0.01, 'epoch': 1.0, 'step': 3}                                                                                                                                                                                            trainer_base.py:429
{'loss': 0.5714, 'learning_rate': 0.01, 'epoch': 2.0}
[15:17:30] INFO     [Rank:0] {'loss': 0.5714, 'learning_rate': 0.01, 'epoch': 2.0, 'step': 6}                                                                                                                                                                                         trainer_base.py:429
{'loss': 0.0, 'learning_rate': 0.01, 'epoch': 2.0}
[15:17:30] INFO     [Rank:1] {'loss': 0.0, 'learning_rate': 0.01, 'epoch': 2.0, 'step': 6}                                                                                                                                                                                            trainer_base.py:429
{'loss': 0.4975, 'learning_rate': 0.01, 'epoch': 3.0}
[15:17:48] INFO     [Rank:0] {'loss': 0.4975, 'learning_rate': 0.01, 'epoch': 3.0, 'step': 9}                                                                                                                                                                                         trainer_base.py:429
{'train_runtime': 58.4774, 'train_samples_per_second': 29.191, 'train_steps_per_second': 0.154, 'train_loss': 0.616881701681349, 'epoch': 3.0}
           INFO     [Rank:0] {'train_runtime': 58.4774, 'train_samples_per_second': 29.191, 'train_steps_per_second': 0.154, 'total_flos': 0.0, 'train_loss': 0.616881701681349, 'epoch': 3.0, 'step': 9}                                                                             trainer_base.py:429
           INFO     [Rank:0] ***** Running Prediction *****                                                                                                                                                                                                                               trainer.py:3154
           INFO     [Rank:0]   Num examples = 569                                                                                                                                                                                                                                         trainer.py:3156
           INFO     [Rank:0]   Batch size = 8                                                                                                                                                                                                                                             trainer.py:3159
{'loss': 0.0, 'learning_rate': 0.01, 'epoch': 3.0}
[15:17:48] INFO     [Rank:1] {'loss': 0.0, 'learning_rate': 0.01, 'epoch': 3.0, 'step': 9}                                                                                                                                                                                            trainer_base.py:429
{'train_runtime': 58.5118, 'train_samples_per_second': 29.174, 'train_steps_per_second': 0.154, 'train_loss': 0.0, 'epoch': 3.0}
           INFO     [Rank:1] {'train_runtime': 58.5118, 'train_samples_per_second': 29.174, 'train_steps_per_second': 0.154, 'total_flos': 0.0, 'train_loss': 0.0, 'epoch': 3.0, 'step': 9}                                                                                           trainer_base.py:429
           INFO     [Rank:1] ***** Running Prediction *****                                                                                                                                                                                                                               trainer.py:3154
           INFO     [Rank:1]   Num examples = 569                                                                                                                                                                                                                                         trainer.py:3156
           INFO     [Rank:1]   Batch size = 8                                                                                                                                                                                                                                             trainer.py:3159
[15:18:07] INFO     [Rank:1] Total: 76.9601s, Driver: 18.8432s(24.48%), Federation: 57.9809s(75.34%), Computing: 0.1361s(0.18%)                                                                                                                                                           _profile.py:279
auc is
0.9712488769092542

Image Data Example with FedPass and Single Bottom Model

To execute an image classification task with the FedPass protection strategy, a few modifications to the settings are required. In our example, the guest possesses only the labels, while the host holds the image data. Consequently, the guest configures a top model (without a bottom model), and the host sets up a bottom model.

We employ the FedPass strategy, detailed in 'FedPass: Privacy-Preserving Vertical Federated Deep Learning with Adaptive Obfuscation'. This approach enhances privacy in neural networks by integrating private passports for adaptive obfuscation. It incorporates a 'passport layer' that alters scale and bias in response to these private passports, thus offering robust privacy protection without compromising on model performance.
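As a rough intuition, here is a simplified, illustrative sketch of the passport-layer idea (not FATE's implementation; FedPassArgument builds the real layer for you): a linear passport layer derives an adaptive scale and bias from private random passports and applies them to the layer output.

import torch
from torch import nn

class ToyPassportLinear(nn.Module):
    def __init__(self, in_features, out_features, num_passport=64):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features, bias=False)
        # private random passports; only their owner knows them
        self.register_buffer("scale_passport", torch.randn(num_passport, out_features))
        self.register_buffer("bias_passport", torch.randn(num_passport, out_features))
        self.encode = nn.Linear(out_features, out_features)  # maps a passport to a scale/bias

    def forward(self, x):
        # sample a passport and derive an adaptive scale and bias from it
        idx = torch.randint(0, self.scale_passport.size(0), (1,))
        scale = self.encode(self.scale_passport[idx]).squeeze(0)
        bias = self.encode(self.bias_passport[idx]).squeeze(0)
        return scale * self.linear(x) + bias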

Let us replace the get_setting() function in the previous example with the following code:

def get_setting(ctx):

    from fate.ml.nn.dataset.table import TableDataset
    import torchvision

    # define model
    from torch import nn
    from torch.nn import init

    class ConvBlock(nn.Module):
        def __init__(self, in_channels, out_channels, kernel_size, stride=1, padding=0, bias=True, norm_type=None,
                    relu=False):
            super().__init__()

            self.conv = nn.Conv2d(in_channels, out_channels, kernel_size, stride, padding, bias=bias)
            self.norm_type = norm_type

            if self.norm_type:
                if self.norm_type == 'bn':
                    self.bn = nn.BatchNorm2d(out_channels)
                elif self.norm_type == 'gn':
                    self.bn = nn.GroupNorm(out_channels // 16, out_channels)
                elif self.norm_type == 'in':
                    self.bn = nn.InstanceNorm2d(out_channels)
                else:
                    raise ValueError("Wrong norm_type")
            else:
                self.bn = None

            if relu:
                self.relu = nn.ReLU(inplace=True)
            else:
                self.relu = None

            self.reset_parameters()

        def reset_parameters(self):
            init.kaiming_normal_(self.conv.weight, mode='fan_out', nonlinearity='relu')

        def forward(self, x, scales=None, biases=None):
            x = self.conv(x)
            if self.norm_type is not None:
                x = self.bn(x)
            if scales is not None and biases is not None:
                x = scales[-1] * x + biases[-1]

            if self.relu is not None:
                x = self.relu(x)
            return x
    
    # host bottom model
    class LeNetBottom(nn.Module):
        def __init__(self):
            super(LeNetBottom, self).__init__()
            self.layer0 = nn.Sequential(
                ConvBlock(1, 8, kernel_size=5),
                nn.ReLU(inplace=True),
                nn.MaxPool2d(2, 2)
            )

        def forward(self, x):
            x = self.layer0(x)
            return x

    # guest top model
    class LeNetTop(nn.Module):

        def __init__(self, out_feat=84):
            super(LeNetTop, self).__init__()
            self.pool = nn.MaxPool2d(2, 2)
            self.fc1 = nn.Linear(16 * 4 * 4, 120)
            self.fc1act = nn.ReLU(inplace=True)
            self.fc2 = nn.Linear(120, 84)
            self.fc2act = nn.ReLU(inplace=True)
            self.fc3 = nn.Linear(84, out_feat)

        def forward(self, x_a):
            x = x_a
            x = self.pool(x)
            x = x.view(x.size(0), -1)
            x = self.fc1(x)
            x = self.fc1act(x)
            x = self.fc2(x)
            x = self.fc2act(x)
            x = self.fc3(x)
            return x
        
    # dataset wrappers to simulate the federated data split (guest: labels only, host: features only)
    from torch.utils.data import Dataset

    class NoFeatureDataset(Dataset):
        def __init__(self, ds):
            self.ds = ds
        def __len__(self):
            return len(self.ds)
        def __getitem__(self, item):
            return [self.ds[item][1]]
        
    class NoLabelDataset(Dataset):
        def __init__(self, ds):
            self.ds = ds
        def __len__(self):
            return len(self.ds)
        def __getitem__(self, item):
            return [self.ds[item][0]]


    # prepare mnist data
    train_data = torchvision.datasets.MNIST(root='./',
                                            train=True, download=True, transform=torchvision.transforms.ToTensor())
    
    if ctx.is_on_guest:
        
        model = HeteroNNModelGuest(
            top_model=LeNetTop(),
            top_arg=TopModelStrategyArguments(
                protect_strategy='fedpass',
                fed_pass_arg=FedPassArgument(
                    layer_type='linear',
                    in_channels_or_features=84,
                    hidden_features=64,
                    out_channels_or_features=10,
                    passport_mode='multi',
                    activation='relu',
                    num_passport=1000,
                    low=-10
                )
            )
        )
        optimizer = t.optim.Adam(model.parameters(), lr=0.01)
        loss = t.nn.CrossEntropyLoss()
        ds = NoFeatureDataset(train_data)

    else:

        model = HeteroNNModelHost(
            bottom_model=LeNetBottom(),
            agglayer_arg=FedPassArgument(
                layer_type='conv',
                in_channels_or_features=8,
                out_channels_or_features=16,
                kernel_size=(5, 5),
                stride=(1, 1),
                passport_mode='multi',
                activation='relu',
                num_passport=1000
            )
        )
        optimizer = t.optim.Adam(model.parameters(), lr=0.01)
        loss = None
        ds = NoLabelDataset(train_data)

    args = TrainingArguments(
        num_train_epochs=3,
        per_device_train_batch_size=256,
        disable_tqdm=False
    )

    return ds, model, optimizer, loss, args


def run(ctx):
    ds, model, optimizer, loss, args = get_setting(ctx)
    trainer = train(ctx, ds, model, optimizer, loss, args)
    pred = predict(trainer, ds)

In this configuration, we use LeNet-style modules as the bottom and top models. The dataset is sourced from torchvision.datasets.MNIST. We use FedPassArgument to set up the FedPass aggregate layer. It is important to note that the FedPass argument for the bottom model is set via agglayer_arg, and for the top model via top_arg (wrapped in TopModelStrategyArguments). Both models are equipped with FedPass protection: during training, random passports are generated, which obfuscate the forward hidden features and backward gradients.

Another key aspect is the use of NoFeatureDataset and NoLabelDataset to encapsulate the dataset. This approach reflects the scenario where the guest holds only labels and the host possesses only features. This simplification aids in effectively simulating the federated learning environment.
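For example (an illustration based on the wrapper classes defined above), an MNIST item is an (image, label) pair, and the wrappers expose only one half of it to each party:

import torchvision

mnist = torchvision.datasets.MNIST(root='./', train=True, download=True,
                                    transform=torchvision.transforms.ToTensor())
print(NoLabelDataset(mnist)[0][0].shape)   # host view: image tensor only, torch.Size([1, 28, 28])
print(NoFeatureDataset(mnist)[0])          # guest view: label only, e.g. [5]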

The task can be submitted using the same command as in the previous example:

python hetero_nn.py --parties guest:9999 host:10000 --log_level INFO