-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Upload verify file #Issue16 #58
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please make sure to run black and isort before pushing to the repo. You can set up the pre-commit hook if that helps.
librarian_server/api/admin.py
Outdated
if instance is None: | ||
raise HTTPException(status_code=404, detail="File instance not found in the specified store.") | ||
|
||
return {"verified": True} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of just saying 'yes', please return the newly computed checksums for all of the instances!
librarian_server/api/admin.py
Outdated
# Check if the file exists in the specified store and matches the given properties | ||
instance = session.query(Instance).filter_by(file_id=file.id, store_id=store.id).one_or_none() | ||
if instance is None: | ||
raise HTTPException(status_code=404, detail="File instance not found in the specified store.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not great practice to return a 404 here, that is confusing as it is the same thing as the endpoint not being there. You should probably return a 400.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, please don't raise a HTTPException; change the response code like in the other endpoints, and return our custom failure model.
librarian_server/api/admin.py
Outdated
|
||
store = session.query(StoreMetadata).filter_by(name=request.store_name).one_or_none() | ||
if store is None: | ||
raise HTTPException(status_code=404, detail="Store not found.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment as below here.
tests/server_unit_test/test_admin.py
Outdated
request = { | ||
"name": "test_file_to_verify.txt", | ||
"size": get_size_from_path(full_path), | ||
"checksum": get_md5_from_path(full_path), | ||
"store_name": "local_store", | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's good to exercise the pydantic request model here, instead of writing the JSON yourself.
In general, I think we also want to make the checksum-checking happen on the client side. We should ask the remote server for fresh checksums (checked against its own internals), that we can check ourselves. We do not want to send 'our' checksum to the remote server. |
tests/server_unit_test/test_admin.py
Outdated
@@ -134,3 +136,94 @@ def test_add_file_no_store_exists(test_client): | |||
response = AdminRequestFailedResponse.model_validate_json(response.content) | |||
|
|||
assert response.reason == "Store not_a_store does not exist." | |||
|
|||
|
|||
def test_verify_file_success(test_client, test_server, garbage_file, test_orm): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use the appropriate fixtures as discussed for these tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@JBorrow you mean change the verification request without passing size and checksum because remote librarian check against its own database without requiring these parameters?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's correct. The remote librarian has all the information needed to perform this check.
librarian_server/api/admin.py
Outdated
checksums_and_sizes = [] | ||
for instance in instances: | ||
path = instance.path | ||
checksum = get_md5_from_path(path) | ||
size = get_size_from_path(path) | ||
checksums_and_sizes.append( | ||
{ | ||
"store_id": str(instance.store_id), | ||
"checksum": checksum, | ||
"size": str(size), | ||
} | ||
) | ||
return {"verified": True, "checksums_and_sizes": checksums_and_sizes} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We never return unvalidated JSON. Please write a pydantic model for your return.
hera_librarian/client.py
Outdated
name: str, | ||
size: int, | ||
checksum: str, | ||
store_name: str, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I noted in the main comment thread, we do not necessarily have access to anything other than name; please have the remote librarian check aganist its own database instead of passing a size and checksum.
Store names are entirely transparent to users (they do not and should not care which store their files have been uploaded to), please remove that option. The remote server should check all instances.
language_version: python3.11 | ||
language_version: python3.12 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't change this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please revert this change before we merge.
hera_librarian/client.py
Outdated
size : int | ||
Size in bytes of the file to verify. | ||
checksum : str | ||
MD5 checksum of the file to verify. | ||
store_name : str | ||
The name of the store where the file resides. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Docs need to be changed.
librarian_server/api/admin.py
Outdated
|
||
file = session.query(File).filter_by(name=request.name).one_or_none() | ||
if file is None: | ||
raise HTTPException(status_code=400, detail="File not found.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please refactor to use the failure response models like the other endpoints.
librarian_server/api/admin.py
Outdated
instances = ( | ||
session.query(Instance) | ||
.join(File, File.name == Instance.file_name) | ||
.filter(File.name == file.name) | ||
.all() | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would simply do:
file = session.get(File, file.name)
if file is None:
return 400
instances = file.instances
tests/server_unit_test/test_admin.py
Outdated
Tests that a file's properties match the database record. | ||
""" | ||
setup = test_server[2] | ||
store = setup.store_directory | ||
full_path = store / "test_file_to_verify.txt" | ||
# Create the file in the store | ||
shutil.copy2(garbage_file, full_path) | ||
|
||
# Get the session and ORM models from the test_server fixture | ||
session = test_server[1]() | ||
|
||
# Create or find a store in the database | ||
db_store = ( | ||
session.query(test_orm.StoreMetadata).filter_by(name="local_store").first() | ||
) | ||
if not db_store: | ||
# Create a new store if not found (adjust attributes as necessary) | ||
db_store = test_orm.StoreMetadata( | ||
name="local_store", description="A local store for testing" | ||
) | ||
session.add(db_store) | ||
session.commit() | ||
|
||
# Create file and instance records in the database | ||
file_data = open(full_path, "rb").read() | ||
db_file = test_orm.File.new_file( | ||
filename="test_file_to_verify.txt", | ||
size=len(file_data), | ||
checksum=hashlib.md5(file_data).hexdigest(), | ||
uploader="test_uploader", # Adjust as necessary | ||
source="test_source", # Adjust as necessary | ||
) | ||
instance = test_orm.Instance.new_instance( | ||
path=str(full_path), | ||
file=db_file, | ||
store=db_store, | ||
deletion_policy="ALLOWED", # Adjust as necessary | ||
) | ||
session.add_all([db_file, instance]) | ||
session.commit() | ||
|
||
# Assume the file has been added to the database already; here we simulate the verification request | ||
verify_request = AdminVerifyFileRequest( | ||
name="test_file_to_verify.txt", | ||
) | ||
|
||
response = test_client.post_with_auth( | ||
"/api/v2/admin/verify_file", json=verify_request.dict() | ||
) | ||
|
||
assert response.status_code == 200 | ||
|
||
# Clean up: Delete the added records and file | ||
session.delete(instance) | ||
session.delete(db_file) | ||
session.commit() | ||
full_path.unlink() # Remove the file from the filesystem | ||
|
||
session.close() | ||
response_data = response.json() | ||
assert response_data["verified"] == True | ||
assert isinstance(response_data["checksums_and_sizes"], list) | ||
assert len(response_data["checksums_and_sizes"]) > 0 | ||
assert "checksum" in response_data["checksums_and_sizes"][0] | ||
assert "size" in response_data["checksums_and_sizes"][0] | ||
assert "store_id" in response_data["checksums_and_sizes"][0] | ||
|
||
|
||
def test_verify_file_failure(test_client, test_server, test_orm): | ||
""" | ||
Tests that verification fails when file properties do not match. | ||
""" | ||
# Assume a file "mismatched_file.txt" exists in the database but with different properties | ||
request = { | ||
"name": "mismatched_file.txt", | ||
} | ||
|
||
response = test_client.post_with_auth("/api/v2/admin/verify_file", json=request) | ||
|
||
assert response.status_code == 400 | ||
assert response.json() == {"detail": "File not found."} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use the provided fixtures, this contains too much unnecessary replication.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@JBorrow you mean "which provided fixtures"? when you say replication, you mean for the whole file or just this specific function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The ones that we discussed in slack; test_server_with_valid_file
and test_server_with_invalid_file
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are still a number of changes that are required before merging. Thanks for your work on this so far.
language_version: python3.11 | ||
language_version: python3.12 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please revert this change before we merge.
sp.add_argument("size", type=int, help="Size in bytes of the file to verify.") | ||
sp.add_argument("checksum", help="MD5 checksum of the file to verify.") | ||
sp.add_argument("store_name", help="The name of the store where the file resides.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These items are no longer needed.
response = self.post( | ||
endpoint="admin/verify_file", | ||
json={ | ||
"name": name, | ||
}, | ||
) | ||
|
||
if response.status_code != 200: | ||
raise LibrarianError( | ||
"Failed to verify the file due to an unexpected error." | ||
) | ||
return response.json() | ||
|
||
except LibrarianHTTPError as e: | ||
if e.status_code == 404: | ||
raise LibrarianError("File or store not found for verification.") | ||
elif e.status_code == 400: | ||
raise LibrarianError( | ||
"File verification failed due to mismatched properties." | ||
) | ||
else: | ||
raise LibrarianError(f"Unknown error during file verification. {e}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I said before, please make sure you serialize/deserialize everything to pydantic models. In fact, I don't think json
is even an accepted parameter in our post function! Please can you also expand your test coverage to make sure that these are checked?
for instance in instances: | ||
path = instance.path | ||
checksum = get_md5_from_path(path) | ||
size = get_size_from_path(path) | ||
checksums_and_sizes.append( | ||
{ | ||
"store_id": str(instance.store_id), | ||
"checksum": checksum, | ||
"size": str(size), | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You will want to check here if the instance is available, or it will not be there and we will get a shock! We should probably check that we have at least one instance on the librarian?
We frequently 'remove' instances (for example during SneakerNet transfers)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you mean by Instance? We check the file instance in the above code!!
def test_verify_file_success(capsys): | ||
# Mock the client and its verify_file_row method | ||
client_mock = MagicMock() | ||
client_mock.verify_file_row.return_value = {"verified": True} | ||
|
||
# Mock the get_client function to return the mock client | ||
with patch("hera_librarian.cli.get_client", return_value=client_mock): | ||
# Create a mock args object | ||
args = MagicMock() | ||
args.conn_name = "test_conn" | ||
args.name = "test_name" | ||
|
||
# Call the verify_file function | ||
cli.verify_file(args) | ||
|
||
# Assert that the verify_file_row method was called with the correct arguments | ||
client_mock.verify_file_row.assert_called_once_with(name=args.name) | ||
|
||
# Check the printed output | ||
captured = capsys.readouterr() | ||
assert "File verification successful." in captured.out |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please look at the other unit tests in the library. You are not correctly using the mocked client objects.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not see any issue with this!!, can you be more specific?
def setup_test_file(session, test_orm, full_path): | ||
# Create or find a store in the database | ||
db_store = ( | ||
session.query(test_orm.StoreMetadata).filter_by(name="local_store").first() | ||
) | ||
if not db_store: | ||
db_store = test_orm.StoreMetadata( | ||
name="local_store", description="A local store for testing" | ||
) | ||
session.add(db_store) | ||
session.commit() | ||
|
||
# Create file and instance records in the database | ||
file_data = open(full_path, "rb").read() | ||
db_file = test_orm.File.new_file( | ||
filename="test_file_to_verify.txt", | ||
size=len(file_data), | ||
checksum=hashlib.md5(file_data).hexdigest(), | ||
uploader="test_uploader", | ||
source="test_source", | ||
) | ||
instance = test_orm.Instance.new_instance( | ||
path=str(full_path), | ||
file=db_file, | ||
store=db_store, | ||
deletion_policy="ALLOWED", | ||
) | ||
session.add_all([db_file, instance]) | ||
session.commit() | ||
|
||
return db_file, instance |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code already exists; please don't duplicate it. You should look at the fixtures that we have already defined in the various conftest.py
files and their usage throughout the test suite.
Right now you are not using the client to interact with a server, you are just mocking its methods. So you are not actually testing the server at all.
We already have a very extensive test suite that has functionality designed to do exactly this. You can see tests/integration_test/test_upload_file.py for example. That way the code in your client function is actually executed, and you would see that it does not work.
… On Apr 3, 2024, at 3:57 PM, Mohammad Hasib Sheikh ***@***.***> wrote:
@shaikhhasib commented on this pull request.
In tests/client_unit_test/test_cli.py <#58 (comment)>:
> +def test_verify_file_success(capsys):
+ # Mock the client and its verify_file_row method
+ client_mock = MagicMock()
+ client_mock.verify_file_row.return_value = {"verified": True}
+
+ # Mock the get_client function to return the mock client
+ with patch("hera_librarian.cli.get_client", return_value=client_mock):
+ # Create a mock args object
+ args = MagicMock()
+ args.conn_name = "test_conn"
+ args.name = "test_name"
+
+ # Call the verify_file function
+ cli.verify_file(args)
+
+ # Assert that the verify_file_row method was called with the correct arguments
+ client_mock.verify_file_row.assert_called_once_with(name=args.name)
+
+ # Check the printed output
+ captured = capsys.readouterr()
+ assert "File verification successful." in captured.out
I do not see any issue with this!!, can you be more specific?
—
Reply to this email directly, view it on GitHub <#58 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AB3Z6G7F7PWOLOD4KHQNLR3Y3QKEHAVCNFSM6AAAAABEFOZEWGVHI2DSMVQWIX3LMV43YUDVNRWFEZLROVSXG5CSMV3GSZLXHMYTSNZXGA4TMNRWHE>.
You are receiving this because you were mentioned.
|
Yes, but if you take a look at the definition of Instance (orm/instance.py), you will see that there is a field `available` that tells you whether or not an instance of a file is actually available. There may be 100 instances associated with a file, but only one marked available. It doesn’t make much sense to me to try to verify file instances that we have already marked as unavailable; those checks would of course fail as we probably have moved the file!
… On Apr 3, 2024, at 3:55 PM, Mohammad Hasib Sheikh ***@***.***> wrote:
@shaikhhasib commented on this pull request.
In librarian_server/api/admin.py <#58 (comment)>:
> + for instance in instances:
+ path = instance.path
+ checksum = get_md5_from_path(path)
+ size = get_size_from_path(path)
+ checksums_and_sizes.append(
+ {
+ "store_id": str(instance.store_id),
+ "checksum": checksum,
+ "size": str(size),
+ }
What do you mean by Instance? We check the file instance in the above code!!
—
Reply to this email directly, view it on GitHub <#58 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AB3Z6G3HRXR5LUYUTZ67KDDY3QJ43AVCNFSM6AAAAABEFOZEWGVHI2DSMVQWIX3LMV43YUDVNRWFEZLROVSXG5CSMV3GSZLXHMYTSNZXGA4TCMRVHA>.
You are receiving this because you were mentioned.
|
No description provided.