This repository has been archived by the owner on May 22, 2019. It is now read-only.
I'm reading this document and wondering what this command is for.
The description says "preprocess your data before passing it to any command you need", but this is too vague to be useful. What are the common use cases of the tool? Why was it created?
Finally, the last flag is dzhigurda ... is that Nikita Dzhigurda?
The description is out of date; the real one is at https://github.com/src-d/ml/blob/master/sourced/ml/__main__.py#L34 (line 34 of sourced/ml/__main__.py). In short, we cache UASTs and/or file contents so that we do not have to extract them again for downstream tasks, especially because extraction is typically the trickiest and most unreliable step.
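To illustrate the caching idea in general terms, here is a minimal sketch. It is not the src-d/ml implementation: `cached_extract`, the `extract` callback, and the one-pickle-per-file cache layout are all hypothetical names and choices made up for this example.

```python
import pickle
from pathlib import Path


def cached_extract(path, cache_dir, extract):
    """Return extract(path), reusing a result saved on disk if one exists.

    `extract` stands in for the slow, unreliable step (for example UAST
    extraction); it is a hypothetical callback, not a src-d/ml function.
    """
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Hypothetical layout: one pickle per input file name. Files with the
    # same name in different directories would collide in this sketch.
    cache_file = cache_dir / (Path(path).name + ".pkl")
    if cache_file.exists():
        with cache_file.open("rb") as f:
            return pickle.load(f)
    result = extract(path)  # only runs when nothing is cached yet
    with cache_file.open("wb") as f:
        pickle.dump(result, f)
    return result
```

With something like this, a downstream task that asks for the same file twice only pays for the extraction once; the second call reads the saved result from disk.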
Regarding Nikita, yep. He is a legendary Russian freak, and his surname sounds funny even to us. Developers at Mail.Ru Group (thousands of them) have an internal convention of calling the conditions for A/B tests "dzhigurdas". The goal of a dzhigurda is to select the proper configuration depending on the context. I thought it would be funny to continue the tradition and used the name for the dirty hack that artificially extends the dataset in src-d/ml. So dzhigurda chooses which commits to process.
Is there some way to access commits from a particular date? I am trying to convert a 440 MB repo with 6k commits. The Siva file is 1.2 GB, but I am wondering what the size of the .parquet output would be...
It takes forever on a cluster node (dzhigurda -1) and then crashes; apparently 200 GB of RAM is not enough for this task.
I think I should use gitbase for that...