Information flow control #561

gbryant-arm · 2022-11-09T11:59:44Z

gbryant-arm
Nov 9, 2022
Collaborator

Veracruz strives to defend against attacks where some participants collude to uncover sensitive information. For example, a data provider and a result fetcher could collaborate by feeding various inputs to a program and observing the output in a black-box style (fuzzing) to extract information about the program (e.g. private firewall rules, or training data from an ML model). A program could also collude with a result fetcher and leak some input from a data provider.

Though there is no concrete solution against steganography (cf. #103), having some control over how much data flows into and out of each program (think of a pipeline of programs) would be valuable in mitigating data leaks.

Example

A use case for information flow control is an ML workload (image classification). Let's assume each task is processed by a separate program, which admittedly simplifies the information flow:

Program 1: The input is decompressed and formatted. It is tricky to set a limit on the output here, so we can assume that all of the information in the input leaks into the output.
Program 2: The decompressed input is fed to an ML model. Neural networks can be represented as a graph, so the size of the output is typically known in advance and a limit on the output can be set.
Program 3: The model's output is post-processed (e.g. sorting, ordering, labelling). This requires access to a list of mappings between classes, but doesn't require access to the original or decompressed input. This task is executed inside Veracruz to protect the confidentiality of the classes. The output can be bounded based on the length of the previous output and the longest class name.

As you can see, the information flow is throttled at various points, which limits the amount of information about the initial input that can leak. Note that it would be much harder to apply this approach to a single monolithic program. Something like taint tracking could be used in that case.
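
To make the bounds above concrete, here is a minimal Rust sketch of how a per-stage output limit might be computed for the three programs. Everything in it is an assumption for illustration: the function names, the idea that program 2 emits one fixed-width score per class, and the idea that program 3 emits one (label, score) pair per score. None of this is an existing Veracruz API.

```rust
/// Program 1 (decompress/format): no useful bound is available, so assume
/// the whole decoded input can reach the output.
fn decode_output_bound(decoded_input_len: usize) -> usize {
    decoded_input_len
}

/// Program 2 (inference): assumed to emit one fixed-width score per class,
/// so the bound follows from the model's output layer.
fn inference_output_bound(num_classes: usize, bytes_per_score: usize) -> usize {
    num_classes.saturating_mul(bytes_per_score)
}

/// Program 3 (post-processing): assumed to emit at most one (label, score)
/// pair per score produced by program 2, so the bound depends only on the
/// previous output's length and the longest class name.
fn labelling_output_bound(
    prev_output_len: usize,
    bytes_per_score: usize,
    longest_class_name: usize,
) -> usize {
    let entries = prev_output_len / bytes_per_score;
    entries.saturating_mul(longest_class_name + bytes_per_score)
}
```

For a hypothetical 224x224 RGB image (150,528 bytes decoded), 1000 classes, 4-byte scores and class names of at most 32 bytes, the limits would be 150,528 bytes for program 1, 4,000 bytes for program 2 and 36,000 bytes for program 3.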

Practical solutions

A practical and relatively simple solution to this problem is to enforce a limit on the amount of data a participant can read or write.
This could be enforced by a WASI hook on fd_pread()/fd_pwrite() on the WASM side, and a Linux feature (cgroups?) on the runtime (native modules) side. (@ShaleXIONG ?). Note: sandbox2 can limit the size of created files via set_rlimit_fsize().
In addition to that, we could further restrict data extraction via client requests (read_file()), which a client typically issues at the end of a computation.
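
As a rough illustration of the limit described above, the following Rust sketch wraps a writer with a fixed byte budget. It is a stand-in for the kind of check a hook on fd_pwrite() could perform, not Veracruz's actual WASI implementation; `BudgetedWriter` and its methods are hypothetical names.

```rust
use std::io::{self, Write};

/// Wraps any writer and refuses writes once a fixed byte budget is spent.
/// Hypothetical type, for illustration only.
struct BudgetedWriter<W: Write> {
    inner: W,
    remaining: usize,
}

impl<W: Write> BudgetedWriter<W> {
    fn new(inner: W, limit: usize) -> Self {
        Self { inner, remaining: limit }
    }

    /// Bytes that may still be written before the budget is exhausted.
    fn remaining(&self) -> usize {
        self.remaining
    }

    /// Returns the wrapped writer.
    fn into_inner(self) -> W {
        self.inner
    }
}

impl<W: Write> Write for BudgetedWriter<W> {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        // Reject the whole request instead of truncating it, so an
        // over-budget write never emits a partial payload.
        if buf.len() > self.remaining {
            return Err(io::Error::new(
                io::ErrorKind::PermissionDenied,
                "output budget exceeded",
            ));
        }
        let written = self.inner.write(buf)?;
        // `written` never exceeds `buf.len()`, so this cannot underflow.
        self.remaining -= written;
        Ok(written)
    }

    fn flush(&mut self) -> io::Result<()> {
        self.inner.flush()
    }
}
```

The same pattern could be mirrored on the read side (fd_pread(), or a per-client quota on read_file()) by wrapping a reader and decrementing a budget on each successful read.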

ShaleXIONG · 2022-11-09T13:16:49Z

ShaleXIONG
Nov 9, 2022
Maintainer

It is a good use case, and it is possible to implement this in Veracruz. That said, I think we should first check whether any existing filesystem already implements such features.
