Reproducing from the HDFS logs including parsing and encoding #46
Comments
This repo only implements the log key anomaly detection model.
Thanks @wuyifan18. How did you go about tokenizing and creating the numerical representation from the log keys?
@c1505 Just encode the log keys as integers from 0 to the number of log keys.
Could you share the original labeled logs used in this code?
@wuyifan18 I have the same question. Could you please share the encoding technique? If you don't feel comfortable sharing it, please point to some articles!
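A minimal sketch of that encoding step: assign each distinct log key (template) the next free integer, starting from 0. The function name and the example template strings are made up for illustration.

```python
def encode_log_keys(parsed_events):
    """Assign each unique log key an integer ID from 0..num_keys-1
    and return the encoded sequence plus the vocabulary."""
    key_to_id = {}
    encoded = []
    for event in parsed_events:
        if event not in key_to_id:
            key_to_id[event] = len(key_to_id)  # next free integer
        encoded.append(key_to_id[event])
    return encoded, key_to_id

events = [
    "Receiving block <*> src: <*> dest: <*>",
    "PacketResponder <*> for block <*> terminating",
    "Receiving block <*> src: <*> dest: <*>",
]
ids, vocab = encode_log_keys(events)
print(ids)         # [0, 1, 0]
print(len(vocab))  # 2
```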
@shoaib-intro can you please check |
The encoding technique uses loghub and logparser: the first provides the original log files, and the second provides the log template generator. Both can be found on GitHub.
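The actual pipeline uses a parser like Drain from the logparser toolkit, but the core idea — replacing variable fields with a `<*>` placeholder to recover the log key — can be sketched with plain regexes. This is a crude stand-in for a real parser, and the masking rules shown are assumptions tuned to HDFS-style lines:

```python
import re

def to_template(line):
    """Crude stand-in for a parser like Drain: mask block IDs,
    IP:port pairs, and bare numbers with <*> to recover the log key."""
    line = re.sub(r"blk_-?\d+", "<*>", line)                   # HDFS block ids
    line = re.sub(r"/?\d+\.\d+\.\d+\.\d+(:\d+)?", "<*>", line)  # IP[:port]
    line = re.sub(r"\b\d+\b", "<*>", line)                      # other numbers
    return line

log = ("Receiving block blk_-1608999687919862906 "
       "src: /10.250.19.102:54106 dest: /10.250.19.102:50010")
print(to_template(log))
# Receiving block <*> src: <*> dest: <*>
```

A real parser such as Drain clusters templates robustly instead of relying on hand-written regexes, which is why the thread recommends it.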
Yes, I have gone through that, thanks. But the problem is that a block id is not always available if we talk about application logs; in that case I have grouped log keys by Component, which is unique in my case, where some components have a sequence length of
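The sessionization step described here can be sketched as follows: group encoded events into per-session sequences keyed by whatever identifier is available (HDFS block id below; a component name would work the same way). Function and pattern names are illustrative assumptions:

```python
import re
from collections import defaultdict

def group_sessions(lines, session_pattern=r"blk_-?\d+"):
    """Group (raw line, encoded key id) pairs into per-session
    key sequences, keyed by the session identifier in each line."""
    sessions = defaultdict(list)
    for line, key_id in lines:
        m = re.search(session_pattern, line)
        if m:
            sessions[m.group(0)].append(key_id)
    return dict(sessions)

lines = [
    ("Receiving block blk_-1 src ...", 0),
    ("Receiving block blk_-2 src ...", 0),
    ("PacketResponder for block blk_-1 terminating", 1),
]
print(group_sessions(lines))
# {'blk_-1': [0, 1], 'blk_-2': [0]}
```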
@shoaib-intro |
This happened because my training data contained negative numbers; after I removed them, the issue was resolved.
The data in this repo is already encoded. I tried looking through the other issues to get an understanding of how to reproduce the results using the original HDFS dataset and haven't been able to understand what to do.
I understand that the data needs to be parsed and encoded, and that Drain is a recommended tool for parsing. From there, it isn't clear whether that is actually the tool used, or which part or parts of the parsed data to use. In the conclusion of the paper I see this:
DeepLog learns and encodes entire log message including timestamp, log key, and parameter values.
I'm unsure whether that is also what is done in this implementation.
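For completeness, the last step in the log-key-only pipeline (the part this repo does implement) is turning each session's key sequence into fixed-size (history, next-key) training pairs. This is a hedged sketch; the helper name is made up, though a window size of 10 matches the repo's default:

```python
def to_windows(session, window_size=10):
    """Turn one session's key sequence into (history, next-key)
    training pairs, the input format the log-key model consumes."""
    pairs = []
    for i in range(len(session) - window_size):
        pairs.append((session[i:i + window_size], session[i + window_size]))
    return pairs

session = [5, 5, 5, 22, 11, 9, 11, 9, 11, 9, 26, 26]
pairs = to_windows(session)
print(len(pairs))  # 2
print(pairs[0])    # ([5, 5, 5, 22, 11, 9, 11, 9, 11, 9], 26)
```

At detection time, a session is flagged anomalous if the model's top-g predictions for any window fail to include the actual next key.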