Bugzilla bugs downloader #104
Could you rebase on top of master?
Codecov Report
@@            Coverage Diff             @@
##           master     #104      +/-   ##
==========================================
+ Coverage   87.86%   88.44%   +0.58%
==========================================
  Files          17       17
  Lines         783      805      +22
  Branches       92       94       +2
==========================================
+ Hits          688      712      +24
+ Misses         85       84       -1
+ Partials       10        9       -1
def _clean_signatures(signatures):
    clean_signatures = set()
    for sig in signatures.split('\r\n'):
        pos = sig.find('[@')
You could do a single substring if you do something like:
start_pos = ...
end_pos = ...
if start_pos != -1 and end_pos != -1:
...
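As a rough illustration of the suggestion above, here is a minimal sketch that takes one slice per line once both positions are known. The rest of `_clean_signatures` is not shown in this thread, so the function name `clean_signatures_sketch`, the use of `rfind(']')` for the end position, and the stripping of whitespace are all assumptions rather than the project's actual code.

```python
def clean_signatures_sketch(signatures):
    """Hypothetical helper: extract the text between '[@' and the last ']' of each line."""
    cleaned = set()
    for sig in signatures.split('\r\n'):
        start_pos = sig.find('[@')
        end_pos = sig.rfind(']')  # assumption: the signature ends at the last ']'
        # Take a single substring only when both delimiters are present and ordered.
        if start_pos != -1 and end_pos != -1 and end_pos > start_pos:
            text = sig[start_pos + 2:end_pos].strip()
            if text:
                cleaned.add(text)
    return cleaned
```

Lines without both delimiters are skipped instead of raising an error.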
Forget it, the pre-existing code was already doing this, so it's fine. We can change it later.
crashsimilarity/downloader.py
Outdated
          'o2': 'isnotempty',
          'product': ['Firefox', 'Core']}

key = ('bugzilla_bugs', json.dumps(params), utils.utc_today())
Let's remove the cache as it's premature optimization. We can add it in the future if needed.
I expect the code that is going to use this module is going to store the downloaded bugs on disk to avoid requesting the same information from Bugzilla all the time, so probably the cache won't be needed.
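To make the "store on disk" idea concrete, here is a minimal sketch of what the calling code might do. Nothing here comes from the pull request: `load_or_download_bugs`, the `path` argument, and the `download` callable are hypothetical names, and the sketch assumes the bugs are JSON-serializable.

```python
import json
import os


def load_or_download_bugs(path, download):
    """Hypothetical helper: reuse bugs saved at `path`; call `download()` only if the file is missing."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    bugs = download()  # assumption: returns JSON-serializable data, e.g. a list of dicts
    with open(path, 'w') as f:
        json.dump(bugs, f)
    return bugs
```

With this approach the downloader module itself stays cache-free, and the caller decides when to refresh the data by deleting the file.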
I agree. Should I fetch only bugs with multiple signatures, since bugs with a single signature won't help us test the model?
Let's fetch them all. In the future we might have additional ideas on how to test the model that don't require multiple signatures, so it's good to have all the data handy (hopefully it won't be a HUGE amount of data, so it should be feasible).
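If a particular test does need only multi-signature bugs, the filtering could happen after download rather than in the query. The sketch below assumes each bug is a dict whose signature field holds one signature per `\r\n`-separated line; the default field name `cf_crash_signature` and both function names are assumptions for illustration.

```python
def split_signatures(raw):
    """Hypothetical helper: split a raw signature field into non-empty lines."""
    return [line.strip() for line in raw.split('\r\n') if line.strip()]


def with_multiple_signatures(bugs, field='cf_crash_signature'):
    """Keep only the bugs whose signature field lists more than one signature."""
    return [bug for bug in bugs
            if len(split_signatures(bug.get(field, ''))) > 1]
```

Keeping the full download and filtering at use time leaves the single-signature bugs available for any later experiments.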
To be merged after (#103)
Part of issue (#39)