Information flow control #561
gbryant-arm started this conversation in Ideas
Replies: 1 comment
-
It is a good use case, and it is possible to implement this in Veracruz. That said, I think we should first check whether any existing FS already implements such features.
-
Veracruz strives to defend against attacks where some participants collude to uncover sensitive information. For example, a data provider and a result fetcher could collaborate by feeding various inputs to a program and observing the output in a black-box style (fuzzing) to extract information about the program (e.g. private firewall rules, training data from an ML model). A program could also collude with a result fetcher and leak some input from a data provider.
Though there is no concrete solution against steganography (cf. #103), having some control over how much data egresses/ingresses each program (think a pipeline of programs) would be valuable in mitigating data leaks.
Example
A use case for information flow control is an ML workload (image classification). Let's assume each task is processed by a separate program, which admittedly simplifies the information flow:
As you can see, the information flow is throttled at various points, which limits the amount of information about the initial input that can leak. Note that it would be much harder to apply that to a single monolithic program. Something like taint tracking could be used in that case.
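To make the throttling argument a bit more concrete, here is a minimal sketch in Rust. It assumes a linear pipeline where each program has a cap on the bytes it may emit; the `Stage` type, the `bottleneck()` helper and any stage names or sizes passed to it are hypothetical and not part of Veracruz. The idea is that whatever reaches the end of the chain has to pass through the stage with the tightest cap, so that cap is a rough upper bound on how much of the initial input can leak through the data channel (side channels such as timing are ignored here).

```rust
/// One program in a hypothetical linear pipeline, with a cap on the number
/// of bytes it is allowed to emit to the next stage.
struct Stage {
    name: &'static str,
    max_output_bytes: u64,
}

/// Returns the stage with the tightest output cap, or `None` if the pipeline
/// is empty. Data derived from the initial input cannot reach the final
/// output in larger quantity than this stage lets through.
fn bottleneck(stages: &[Stage]) -> Option<&Stage> {
    stages.iter().min_by_key(|stage| stage.max_output_bytes)
}
```

For instance, with made-up stages capped at 150528, 4096 and 4 bytes, `bottleneck()` returns the 4-byte stage: even if an earlier program misbehaves, only a few bytes per run can reach the result fetcher.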
Practical solutions
A practical and relatively simple solution to this problem is to enforce a limit on the amount of data a participant can read or write.
This could be enforced by a WASI hook on fd_pread()/fd_pwrite() on the WASM side, and a Linux feature (cgroups?) on the runtime (native modules) side (@ShaleXIONG ?). Note: sandbox2 can limit the size of created files via set_rlimit_fsize().
In addition to that, we could further restrict data extraction via client requests (read_file()), which is typically requested by a client at the end of a computation.
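As a rough illustration of the byte-limit idea, the sketch below shows one way a per-participant budget could be checked before a write is forwarded. Everything here is an assumption made for illustration: `ByteBudget`, `BudgetExceeded` and `checked_write()` are hypothetical names, the `Vec<u8>` merely stands in for the real file, and this is not how Veracruz's WASI layer is actually structured.

```rust
/// Tracks how many bytes a participant (or program) may still transfer.
#[derive(Debug)]
struct ByteBudget {
    remaining: u64,
}

/// Error returned when a request would exceed the remaining budget.
#[derive(Debug, PartialEq)]
struct BudgetExceeded {
    requested: u64,
    remaining: u64,
}

impl ByteBudget {
    fn new(limit: u64) -> Self {
        Self { remaining: limit }
    }

    /// Charges `len` bytes against the budget. A request that does not fit
    /// is refused as a whole and leaves the budget untouched.
    fn charge(&mut self, len: u64) -> Result<(), BudgetExceeded> {
        if len > self.remaining {
            return Err(BudgetExceeded { requested: len, remaining: self.remaining });
        }
        self.remaining -= len;
        Ok(())
    }
}

/// Hypothetical hook: what a wrapper around an fd_pwrite()-style call could
/// do before forwarding the data. `sink` stands in for the underlying file.
fn checked_write(
    budget: &mut ByteBudget,
    sink: &mut Vec<u8>,
    buf: &[u8],
) -> Result<usize, BudgetExceeded> {
    budget.charge(buf.len() as u64)?;
    sink.extend_from_slice(buf);
    Ok(buf.len())
}
```

Refusing the whole request rather than truncating it is a design choice in this sketch; a partial write would also be a valid policy, and the same budget could be charged on the read path and on client read_file() requests.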