
Question regarding the predicted variable #48

Open
nagsubhadeep opened this issue Sep 3, 2020 · 7 comments

@nagsubhadeep

Yifan,

Source: LogKeyModel_predict.py

In the code below, can you please explain the difference between the output and predicted variables? Is output the same as predicted, except that it is sorted? Also, shouldn't the value of the predicted variable be binary, so that we can determine whether the predicted outcome is anomalous or not?

output = model(seq)
predicted = torch.argsort(output, 1)[0][-num_candidates:]

Thanks,
Deep

@wuyifan18
Owner

Deep,
The output is a probability distribution over the log keys, giving the probability of each key appearing as the next log key given the history.
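
To make the distinction concrete, here is a minimal sketch, not the repository's exact code: the score tensor is a toy stand-in for output = model(seq), and the shape [1, num_classes] is assumed from how LogKeyModel_predict.py indexes it.

import torch

num_candidates = 3  # the "top g"
# toy stand-in for output = model(seq): one row of scores, one per log key
output = torch.tensor([[0.1, 2.5, 0.3, 1.7, 0.2, 3.0]])
# argsort sorts ascending, so the last num_candidates indices are the
# num_candidates most likely next log keys
predicted = torch.argsort(output, 1)[0][-num_candidates:]
print(predicted)  # tensor([3, 1, 5])

So predicted is not a binary flag; it is the set of candidate next keys, and the binary anomaly decision comes from checking whether the key that actually appeared is in that set (see the comments below).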

@nagsubhadeep
Author

nagsubhadeep commented Sep 3, 2020

Shouldn't the value of the predicted variable be something binary so that we can determine whether the predicted outcome is anomalous or not? I am getting a one-dimensional array instead.

@wuyifan18
Owner

Sort the possible log keys based on their probabilities and treat a key value as normal if it is among the top g candidates. Otherwise, the log key is flagged as coming from an abnormal execution.

You can read the paper for details.
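
A minimal sketch of that rule, with self-contained toy values rather than the repository's exact variables; label stands for the log key that actually appeared next:

import torch

num_candidates = 3  # the "top g"
output = torch.tensor([[0.1, 2.5, 0.3, 1.7, 0.2, 3.0]])   # toy scores, one per log key
predicted = torch.argsort(output, 1)[0][-num_candidates:]  # top-g candidates: tensor([3, 1, 5])

label = 4  # the log key that actually appeared next (toy value)
if label in predicted:
    print('normal: the observed key is among the top-g candidates')
else:
    print('anomaly: the observed key is outside the top-g candidates')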

@Rufaida94

@wuyifan18 where can I modify top g in your code?

@wuyifan18
Owner

@Rufaida94 here

parser.add_argument('-num_candidates', default=9, type=int)
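
For example, a hypothetical invocation with a different value (assuming this parser lives in LogKeyModel_predict.py and the script's other arguments keep their defaults):

python LogKeyModel_predict.py -num_candidates 15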

@Rufaida94

Thank you @wuyifan18. I know that num_candidates is a hyperparameter that is supposed to be adjusted for the dataset. But my question is: if my data has 24297 num_classes (while your HDFS dataset has only 28 num_classes), what would be a reasonable num_candidates? For example, is 1000 too high or too low? I know this is a very vague question, but any pointers are appreciated.

@wuyifan18
Owner

wuyifan18 commented Jul 5, 2021

@Rufaida94 num_candidates is a hyperparameter, which means you should tune it according to your evaluation metrics, such as the F1 measure.
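
A minimal sketch of such a sweep, using scikit-learn's f1_score. The labels and ranks below are synthetic stand-ins; in practice they would come from running the model on a labelled validation set, and the one-prediction-per-session simplification is an assumption, not the repository's exact evaluation.

import numpy as np
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)
num_sessions = 200
num_classes = 24297  # size of the log-key vocabulary in this example

# synthetic stand-ins: ground-truth anomaly labels, and the rank the observed
# next key received in the model's sorted output (0 = most likely)
y_true = rng.integers(0, 2, size=num_sessions)
observed_rank = rng.integers(0, num_classes, size=num_sessions)

for g in (10, 100, 1000, 5000):
    # flag a session as anomalous when its observed key falls outside the top-g candidates
    y_pred = (observed_rank >= g).astype(int)
    print(f'num_candidates={g:5d}  F1={f1_score(y_true, y_pred):.3f}')

Pick the num_candidates value that gives the best F1 on that validation data.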
