Reproducing from the HDFS logs including parsing and encoding #46

Open
c1505 opened this issue Aug 2, 2020 · 10 comments

Comments

@c1505

c1505 commented Aug 2, 2020

The data in this repo is already encoded. I looked through the other issues to understand how to reproduce the results from the original HDFS dataset, but I still can't work out what to do.

I understand that the data needs to be parsed and encoded, and that Drain is a recommended tool for parsing. Beyond that, it isn't clear whether Drain is the tool that was actually used, or which part or parts of the parsed data to use. The conclusion of the paper says: "DeepLog learns and encodes entire log message including timestamp, log key, and parameter values." I am unsure whether this implementation does the same.

@wuyifan18
Owner

This repo only implements the log key anomaly detection model.

@c1505
Author

c1505 commented Aug 3, 2020

Thanks @wuyifan18. How did you go about tokenizing the log keys and creating their numerical representation?

@wuyifan18
Owner

@c1505 Just encode log keys from 0 to the number of log keys.
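
For readers following along, here is a minimal sketch of what "encode log keys from 0" could look like. It assumes every log line has already been assigned an event ID by a parser; the function name `build_key_index` and the sample event IDs are hypothetical, and this is not necessarily how the data in this repo was produced.

```python
def build_key_index(event_ids):
    """Map each distinct event ID to an integer, in order of first appearance."""
    index = {}
    for event_id in event_ids:
        if event_id not in index:
            # The next free integer equals the number of keys seen so far.
            index[event_id] = len(index)
    return index


# Hypothetical event IDs, as a log parser might emit them for consecutive lines.
events = ["E5", "E22", "E5", "E11", "E9", "E11"]
key_index = build_key_index(events)
encoded = [key_index[e] for e in events]

print(key_index)  # {'E5': 0, 'E22': 1, 'E11': 2, 'E9': 3}
print(encoded)    # [0, 1, 0, 2, 3, 2]
```

With this scheme the largest ID is the number of distinct log keys minus one, so the model's output layer needs one class per distinct key.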

@Nothing-bit

Could you share the original labeled logs used in this code?

@shoaib-intro

@wuyifan18 I have the same question. Could you please share the encoding technique? If you aren't comfortable sharing it, please point to some articles!

@OutOfBoundCats

@shoaib-intro Can you please check #41 (comment)? It may help, although I am not sure.

@Nothing-bit

> @wuyifan18 I have the same question. Could you please share the encoding technique? If you aren't comfortable sharing it, please point to some articles!

The encoding technique uses loghub and logparser. The first provides the original log files and the second provides the log template generator. Both can be found on GitHub.
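
As a rough illustration of the step that follows template extraction, the sketch below groups already-parsed lines into one key sequence per HDFS block ID. It assumes a parser has produced a (message content, integer log key) pair for each line. The helper name `group_by_block` and the sample messages are hypothetical and only written in the style of HDFS logs; block IDs in HDFS logs do take the form `blk_` followed by an optionally negative integer.

```python
import re
from collections import defaultdict

# HDFS block IDs look like blk_<integer>, where the integer may be negative.
BLOCK_ID = re.compile(r"blk_-?\d+")


def group_by_block(parsed_lines):
    """Collect the log keys of each block ID, preserving line order."""
    sessions = defaultdict(list)
    for content, key in parsed_lines:
        # A line may mention a block more than once; count it once per line.
        for block_id in set(BLOCK_ID.findall(content)):
            sessions[block_id].append(key)
    return dict(sessions)


# Hypothetical (content, log key) pairs, not copied from the real dataset.
parsed = [
    ("Receiving block blk_-123 src: /10.0.0.1:50010", 0),
    ("allocateBlock: /tmp/job.jar blk_-123", 1),
    ("Receiving block blk_456 src: /10.0.0.2:50010", 0),
    ("PacketResponder 1 for block blk_-123 terminating", 2),
]

sessions = group_by_block(parsed)
print(sessions)  # {'blk_-123': [0, 1, 2], 'blk_456': [0]}
```

Each resulting list is one session. Written out as space-separated integers, one session per line, it resembles the pre-encoded data discussed in this issue.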

@shoaib-intro

shoaib-intro commented Apr 7, 2022

> @shoaib-intro Can you please check #41 (comment)? It may help, although I am not sure.

Yes, I have gone through it, thanks. The problem is that a block ID is not always available for application logs, so in my case I grouped log keys by Component, which is unique in my data. Some components have a sequence length of **213k**, and there I hit an index-out-of-bounds error, `IndexError: Target -1 is out of bounds.`, on the line `loss = criterion(output, label.to(device))`. Any idea what causes that?

@OutOfBoundCats

@shoaib-intro
I am sorry, but I really don't have any idea about that.

@shoaib-intro

> @shoaib-intro Can you please check #41 (comment)? It may help, although I am not sure.

> Yes, I have gone through it, thanks. The problem is that a block ID is not always available for application logs, so in my case I grouped log keys by Component, which is unique in my data. Some components have a sequence length of **213k**, and there I hit an index-out-of-bounds error, `IndexError: Target -1 is out of bounds.`, on the line `loss = criterion(output, label.to(device))`. Any idea what causes that?

This happened because my training data contained negative numbers. After I removed them, the issue was resolved.
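
For anyone hitting the same error, here is a small sketch of a check that catches the problem before training. It assumes `criterion` is a classification loss such as PyTorch's `nn.CrossEntropyLoss`, which expects each target to be a class index between 0 and the number of classes minus one. The helper `make_windows` is hypothetical; the sliding-window scheme is the usual one for next-key prediction, not code taken from this repo.

```python
def make_windows(sequence, window_size, num_classes):
    """Build (input window, next key) pairs after validating the key range."""
    bad = sorted({k for k in sequence if not 0 <= k < num_classes})
    if bad:
        # A negative or too-large key would later fail inside the loss function.
        raise ValueError(f"log keys outside [0, {num_classes}): {bad}")
    inputs, labels = [], []
    for i in range(len(sequence) - window_size):
        inputs.append(sequence[i:i + window_size])
        labels.append(sequence[i + window_size])
    return inputs, labels


inputs, labels = make_windows([0, 1, 0, 2, 3, 2], window_size=3, num_classes=4)
print(inputs)  # [[0, 1, 0], [1, 0, 2], [0, 2, 3]]
print(labels)  # [2, 3, 2]
```

One way a target of -1 can arise, stated here as an assumption rather than something confirmed in this thread: if the loading code subtracts 1 from every key to turn 1-based IDs into 0-based indices, then data that is already 0-based produces -1 for key 0. Either dropping invalid keys or re-encoding the keys to match what the loader expects avoids the error.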
