Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add data masking and filtering to the OPA plugin #16

Conversation

vagaerg
Copy link
Member

@vagaerg vagaerg commented Dec 21, 2023

trinodb#9787 Implement column masking and row level filtering to the proposed plugin in trinodb#19532
This PR adds row level filtering and column masking. It creates a TrinoColumn type, which is only used for these operations.

TrinoColumn is (perhaps counterintuitively) not used for FilterColumns for three easons:

  • API stability between the batch & non-batch modes: we don't want to send a full TrinoColumn object (which includes the catalog, schema and table names) per column when performing a FilterColumns operation as this would make the requests rather large
  • TrinoColumn may be changed to include additional information about column-specific operations in the future, which are not relevant to FilterColumns
  • Backwards compatibility

This PR also adds system tests for row level filtering & column masking.

The idea is to take this PR upstream once trinodb#19532 is merged

CC @mosiac1 as co-developer

- Simplify tests and remove some parameterization
- Make more fields final
- Remove unnecessary annotations
- Make tests more parallelizable
- Remove permissioning operations and replace them by a blanket
  allow/deny setting

And rebase - bump pom version
@vagaerg vagaerg changed the title Add open policy agent with masking and filtering Add data masking and filtering to the OPA plugin Dec 21, 2023
Copy link

@sbernauer sbernauer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for the awesome work and continuing on the implementation!
This is really helpful!

Left one relevant comment regarding batched API


import static java.util.Objects.requireNonNull;

public record TrinoColumn(

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Im not sure whats the convention in Trino on javadoc, but it might sense to include your helpful comments on TrinoColumn here:

TrinoColumn is (perhaps counterintuitively) not used for FilterColumns for three easons:

  • API stability between the batch & non-batch modes: we don't want to send a full TrinoColumn object (which includes the catalog, schema and table names) per column when performing a FilterColumns operation as this would make the requests rather large
  • TrinoColumn may be changed to include additional information about column-specific operations in the future, which are not relevant to FilterColumns
  • Backwards compatibility

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes a lot of sense, will do!

public OpaContainer()
{
this.container = new GenericContainer<>(DockerImageName.parse("openpolicyagent/opa:latest-rootless"))
.withCommand("run", "--server", "--addr", ":%d".formatted(OPA_PORT))

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Sorry to bother you with the same again, but we could test we don't fail on the decision_id field being present:

Suggested change
.withCommand("run", "--server", "--addr", ":%d".formatted(OPA_PORT))
.withCommand("run", "--server", "--addr", ":%d".formatted(OPA_PORT)
"--set", "decision_logs.console=true")

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, sorry if I missed this on the other PR. Will add it!

@@ -709,6 +714,23 @@ public void checkCanDropFunction(SystemSecurityContext systemSecurityContext, Ca
OpaQueryInputResource.builder().function(TrinoFunction.fromTrinoFunction(functionName)).build());
}

@Override
public List<ViewExpression> getRowFilters(SystemSecurityContext context, CatalogSchemaTableName tableName)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding getRowFilters and getColumnMask: Is my understanding correct that this is currently only implemented by having an OPA call per table (for row filter) / column (for column mask)?
Do you have any plans of implementing a batch variant as well?

Would you even go one step further to assume users using these two functionalities are advanced enough to justify only offering a batched API for code complexity reasons?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pinged you on Slack to discuss a bit more, will update this PR comment once I understand the options available a bit better

@vagaerg
Copy link
Member Author

vagaerg commented Mar 4, 2024

This is now being pushed upstream

@vagaerg vagaerg closed this Mar 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants