Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add error details for failed remote job #370

Merged
merged 12 commits into from
Apr 20, 2020
Merged

Conversation

pauljxtan
Copy link
Contributor

@pauljxtan pauljxtan commented Apr 17, 2020

Context:

Currently, a failed remote job produces a generic error message that is difficult to diagnose. The job info response from the remote platform contains a meta field that is currently unused, but has some information that may be useful, e.g.,

{
    'error_code': 'hardware-message-error',
    'error_detail': 'The job failed because the message received from the hardware could not be understood.'
}

Description of the Change:

  • Add a meta property to the Job class. In the event of job failure, this dictionary will contain an error-code and error-detail.
  • Refresh the meta attribute (along with status) when Job.refresh() is called.
  • Add the contents of meta to the error message for a failed job.

Example:

  • Previous error message: "The remote job 928ee53e-677a-48c0-8ba5-a0566ba3e295 failed due to an internal server error. Please try again."
  • New error message: "The remote job 928ee53e-677a-48c0-8ba5-a0566ba3e295 failed due to an internal server error. Please try again. {'error_code': 'hardware-message-error', 'error_detail': 'The job failed because the message received from the hardware could not be understood.'}"

Benefits:

The added context in a copy-pastable error message is likely to make it easier to diagnose and debug job failures.

Possible Drawbacks:

The new error message is slightly verbose.

Related GitHub Issues:

N/A

@pauljxtan pauljxtan added the API label Apr 17, 2020
@pauljxtan pauljxtan marked this pull request as ready for review April 17, 2020 19:16
@pauljxtan pauljxtan marked this pull request as draft April 17, 2020 19:17
@pauljxtan pauljxtan added the WIP label Apr 17, 2020
@pauljxtan pauljxtan marked this pull request as ready for review April 17, 2020 20:53
@pauljxtan pauljxtan removed the WIP label Apr 17, 2020
Copy link
Collaborator

@lneuhaus lneuhaus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, thanks a lot for having implemented this so quickly. I did not see any issues in the code, though I am not extremely familiar with this part of the codebase.

Have you been able to run an end-to-end test of this?

@lneuhaus lneuhaus self-requested a review April 17, 2020 20:57
@codecov
Copy link

codecov bot commented Apr 17, 2020

Codecov Report

Merging #370 into master will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##           master     #370   +/-   ##
=======================================
  Coverage   97.69%   97.70%           
=======================================
  Files          52       52           
  Lines        6478     6485    +7     
=======================================
+ Hits         6329     6336    +7     
  Misses        149      149           
Impacted Files Coverage Δ
strawberryfields/api/connection.py 100.00% <ø> (ø)
strawberryfields/engine.py 94.90% <ø> (ø)
strawberryfields/api/job.py 94.11% <100.00%> (+0.93%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5917bac...4c81ff2. Read the comment docs.

Copy link
Contributor

@jswinarton jswinarton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@@ -548,7 +548,7 @@ def run(self, program: Program, *, compile_options=None, **kwargs) -> Optional[R
if job.status == "failed":
message = (
"The remote job %s failed due to an internal "
"server error. Please try again." % job.id
"server error. Please try again. %s" % (job.id, job.meta)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason not to use new-style formatting here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I think this is an artifact of Python's logger expecting printf string formatting. At one point, this line must have been inside the self.log.error() function.

Now that it's been refactored out, it's best to update this to use new-style formatting.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

Copy link
Member

@josh146 josh146 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💯

@@ -548,7 +548,7 @@ def run(self, program: Program, *, compile_options=None, **kwargs) -> Optional[R
if job.status == "failed":
message = (
"The remote job %s failed due to an internal "
"server error. Please try again." % job.id
"server error. Please try again. %s" % (job.id, job.meta)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I think this is an artifact of Python's logger expecting printf string formatting. At one point, this line must have been inside the self.log.error() function.

Now that it's been refactored out, it's best to update this to use new-style formatting.

self._status = JobStatus(self._connection.get_job_status(self.id))
job_info = self._connection.get_job(self.id)
self._status = JobStatus(job_info.status)
self._meta = job_info.meta
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be a good idea to log the retrieved metadata at the DEBUG level here? Combined with #360, this will allow retrieved job metadata to appear within the users logs (if they decide to configure it).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea, this has been added! Let me know if you think the message formatting needs to be changed.

@pauljxtan
Copy link
Contributor Author

Just FYI, @lneuhaus and I coordinated an end-to-end test for a deliberately failing job and didn't find any problems.

along with the job result if the job is newly completed.
"""
if self._status.is_final:
self.log.warning("A %s job cannot be refreshed", self._status.value)
return
self._status = JobStatus(self._connection.get_job_status(self.id))
job_info = self._connection.get_job(self.id)
self._status = JobStatus(job_info.status)
Copy link
Contributor

@antalszava antalszava Apr 20, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Logically for me, the meta data belongs to the status of the Job. At least this attribute will change the similarly to the change in status and as far as I understand it, is meant to provide more details about the status of the Job.

However, making this change in abstraction might not be so easy (would involve restructuring JobStatus, etc.), so I'm happy as is now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I agree with the general idea, and also that it would be a relatively big change, so it seems best to tackle that in the future 🙂

Comment on lines +550 to +551
"The remote job {} failed due to an internal "
"server error. Please try again. {}".format(job.id, job.meta)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These were previously made so such that they can be used for logging, but here the message is already pre-defined, so I guess format should work just fine as well?

JobStatus.COMPLETED
if self.request_count >= self.REQUESTS_BEFORE_COMPLETED
else JobStatus.QUEUED
)
return Job(id_="123", status=status, connection=None)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A small dummy meta dictionary could be added here, and then later checked similarly to the status.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea, this has been added!

@antalszava
Copy link
Contributor

Thanks so much for this @pauljxtan, looks really good! 💯 Had a couple of suggestions which might be improvements to add, but I'm also happy with the current state.

Paul Tan and others added 2 commits April 20, 2020 14:55
Copy link
Contributor

@thisac thisac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! 🚀

strawberryfields/api/job.py Outdated Show resolved Hide resolved
Co-Authored-By: Theodor <theodor@xanadu.ai>
@pauljxtan pauljxtan merged commit b497f0f into master Apr 20, 2020
@pauljxtan pauljxtan deleted the failed-job-error-details branch April 20, 2020 21:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants