Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dependency verification #23

Open
wants to merge 55 commits into
base: main
Choose a base branch
from
Open

Dependency verification #23

wants to merge 55 commits into from

Conversation

libretto
Copy link
Collaborator

Added support for dependency verification of external dependencies, (including known dependencuies and built-in types).

Know dependencies are organized as an index of known data types (built-in, google, confluent).

@sujayinstaclustr
Copy link

sujayinstaclustr commented Aug 5, 2022

@libretto As discussed yesterday due to conflict in PR #1 from Aiven upstream main, you have all the changes from PR #1 in PR # 2 . Is that correct?

@libretto
Copy link
Collaborator Author

libretto commented Aug 8, 2022

@libretto As discussed yesterday due to conflict in PR #1 from Aiven upstream main, you have all the changes from PR #1 in PR # 2 . Is that correct?

Yes

Copy link

@jjaakola-aiven jjaakola-aiven left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Compatibility API endpoint is missing support for references.

What is the need for the proto files of known types? Are all of those used?

I would like to see more comprehensive unit testing and integration testing. E.g. dependency verifier does not have tests. As a technical debt the protobuf parser does not have tests, the changes here are minor but cannot verify for regression.

Commit history needs a cleanup.

I also would appreciate that Instaclustr does a review of this PR.

@@ -159,3 +300,21 @@ def to_schema(self) -> str:

def compare(self, other: "ProtobufSchema", result: CompareResult) -> CompareResult:
self.proto_file_element.compare(other.proto_file_element, result)

def reslove_dependencies(self):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def reslove_dependencies(self):
def resolve_dependencies(self):

if type(schema).__name__ != "str":
raise IllegalArgumentException("Non str type of schema string")
self.dirty = schema
self.cache_string = ""
self.schema_reader = schema_reader

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keep the schema as a model of single schema, i.e. separate schema and schema reference validation.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

@@ -344,6 +346,7 @@ def _handle_msg_schema(self, key: dict, value: Optional[dict]) -> None:
schema_id = value["id"]
schema_version = value["version"]
schema_deleted = value.get("deleted", False)
schema_references = value.get("references", None)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would use empty array here.
And add it to schema dictionary at lines 385-390. Then it can be expected to behave as a list always.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in SchemaRegistry if the schema has no references they do not store empty references list in kafka so None there is more logical then empty list.

Comment on lines +394 to +397
if schema_version in subjects_schemas:
LOG.info("Updating entry subject: %r version: %r id: %r", schema_subject, schema_version, schema_id)
else:
LOG.info("Adding entry subject: %r version: %r id: %r", schema_subject, schema_version, schema_id)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unnecessary move.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

which move?

ref_str = reference_key(ref["subject"], ref["version"])
referents = self.referenced_by.get(ref_str, None)
if referents:
LOG.info("Adding entry subject referenced_by : %r", ref_str)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This log is identical to one in the else branch.

@@ -97,15 +105,148 @@ def option_element_string(option: OptionElement) -> str:
return f"option {result};\n"


class Dependency:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To own module.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

Comment on lines 170 to 172
# result: DependencyVerifierResult = self.verify_ciclyc_dependencies()
# if not result.result:
# return result

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commented, not used?


# def verify_ciclyc_dependencies(self) -> DependencyVerifierResult:

# TODO: add recursion detection

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be really nice to have.


import json

if TYPE_CHECKING:
from karapace.schema_reader import KafkaSchemaReader

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Schema models module should not be dependent on storage logic.

Copy link
Collaborator Author

@libretto libretto Sep 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

references = schema_data.get("references")
parsed_schema = ProtobufSchema(schema, references)
self.dependencies[name] = Dependency(name, subject, version, parsed_schema)
except Exception as e:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very broad exception, what are the conditions and what exceptions are expected?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

@libretto
Copy link
Collaborator Author

libretto commented Aug 31, 2022

Compatibility API endpoint is missing support for references.

Compatibility is not supported yet but it is under development and must be released on the next PR.

What is the need for the proto files of known types? Are all of those used?

So actually we used them once in the stage of generation of known dependencies. So current KnownDependency class is based on these proto files. If in the future version of these files will be changed we can track it and apply changes to the KnownDependency class.

I would like to see more comprehensive unit testing and integration testing. E.g. dependency verifier does not have tests. As a technical debt the protobuf parser does not have tests, the changes here are minor but cannot verify for regression.

Yes, the dependency verifier has no unit test, I will add it. But the protobuf parser actually has unit tests and it is already in aiven/karapace master branch.

Commit history needs a cleanup.

I would cleanup history before merge with aiven git repository. I suppose we can rebase all commits to root there before merge.

I also would appreciate that Instaclustr does a review of this PR.

I will be reviewing from Instaclustr side as well.

Copy link

@jjaakola-aiven jjaakola-aiven left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did a bit more of reviewing on this. Some comments.

When an InvalidReference exception is raised the error message returned could be more helpful. I get back Invalid PROTOBUF schema. Error: Provided schema is not valid, e.g. in case where the reference is not found.

This needs more testing of different layouts of schema, chaining of schemas, evolution of schemas (adding new field that references another schema) etc.

subject_data = self.schema_reader.subjects.get(subject)
schema_data = subject_data["schemas"][version]
schema = schema_data["schema"].schema_str
references = schema_data.get("references")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When chaining schemas I think this fails.
Consider:

S1, root no references
S2, ref S1
S3, ref S2

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it will not fails I just added the test

assert not any(x != y for x, y in zip(myjson, referents))

res = await registry_async_client.delete("subjects/customer/versions/1")
assert res.status_code == 404

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gut feeling is that 409 Conflict would be better status code, but I am not sure of CSR returned value. This needs to be looked that it would match CSR.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems that CSR returns status_code 422, error_code 42206 and message One or more references exist to the schema {magic=1,keytype=SCHEMA,subject=<SUBJECT>,version=<VER>}.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We also return this error code but with another message. l will change the message to the SR message

@jjaakola-aiven
Copy link

Please make a PR to https://github.com/aiven/karapace

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants