Share how I transformed the logs into lines of IDs here #35

ying1016 opened this issue Mar 30, 2020 · 3 comments

@ying1016

Hey guys,
I used Drain3 to transform the HDFS logs into lines of IDs here: https://github.com/ying1016/Drain3.git. Hope it helps if you don't know what to do.
One thing to note: the raw data is ordered by log timestamp, not by block ID. If you want to transform the logs, you need the data grouped by block ID, not my test data at the URL. But I think it might not be a problem.
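
Roughly, the grouping by block ID I mean looks like this in Python (just a sketch; the file names and the block-ID regex are placeholders, not necessarily exactly what my repo does):

```python
import re
from collections import defaultdict

from drain3 import TemplateMiner

template_miner = TemplateMiner()
# HDFS block IDs look like blk_-1608999687919862906
block_id_pattern = re.compile(r"blk_-?\d+")

# block ID -> ordered list of Drain cluster (log key) IDs
sequences = defaultdict(list)

with open("HDFS.log") as f:  # placeholder input file
    for line in f:
        result = template_miner.add_log_message(line.rstrip())
        # a line can mention several blocks, e.g. replication messages
        for blk in block_id_pattern.findall(line):
            sequences[blk].append(result["cluster_id"])

# One line of log key IDs per block, no matter how the raw log was ordered.
with open("IDblks.log", "w") as out:
    for blk, keys in sequences.items():
        out.write(" ".join(str(k) for k in keys) + "\n")
```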

@DuoweiPan

@ying1016 Thank you for your implementation! I noticed that in IDblks.log there are a lot of very short sessions like `06 01` or `01`, which are shorter than the window size and quite different from hdfs_train. Those sessions will be detected as abnormal if I use the model trained on hdfs_train. Correct me if I'm wrong, but I think the original log data you used is the same as the log data DeepLog used, so why are the log keys so different between them? Any hint would be helpful! Thank you!

@edocorallo

edocorallo commented Sep 25, 2020

Hello, @DuoweiPan

> I noticed that in IDblks.log there are a lot of very short sessions like `06 01` or `01`, which are shorter than the window size and quite different from hdfs_train. Those sessions will be detected as abnormal if I use the model trained on hdfs_train.

From what I understood, the minimal length of a session should never be less than the window size during the training stage (e.g. window_size=9, len(session) >= 9). Could be wrong though.
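
Something like this is what I mean (just a sketch of my own preprocessing idea, not DeepLog's official code; `sessions` is assumed to map block IDs to lists of log keys):

```python
window_size = 9  # same value as in the example above

def filter_short_sessions(sessions, window_size):
    """Drop block sessions that are shorter than the window size before training."""
    return {blk: keys for blk, keys in sessions.items() if len(keys) >= window_size}

train_sessions = filter_short_sessions(sessions, window_size)
```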

> so why are the log keys so different between them?

Also, from what I understood, the log keys are somewhat arbitrary.
I numbered them by order of appearance using a simple dictionary and saved the dictionary for later parsing. But if I did the parsing starting from some random lines, I would still obtain a good training set containing the same sequences of logs, just named differently.
(e.g. the sequence [2 5 2 5 4 7 8] is equivalent to [6 8 6 8 1 9 13] and, as long as the enumeration of the log keys is consistent through the entire dataset, DeepLog obtains similar results on both enumerations)
Obviously, if you use one enumeration for training, the same one has to be used for prediction.
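
The enumeration idea is basically this (my own illustration; the file name is a placeholder):

```python
import json

key_map = {}  # template (or Drain cluster ID) -> integer log key

def to_log_key(template):
    """Assign the next integer the first time a template is seen."""
    if template not in key_map:
        key_map[template] = len(key_map) + 1
    return key_map[template]

# ... call to_log_key() on every parsed line while building the sequences ...

# Save the mapping so prediction uses exactly the same enumeration as training.
with open("log_key_map.json", "w") as f:
    json.dump(key_map, f)
```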

I hope this is helpful. Bye

@OneStepAndTwoSteps

This is very helpful to me, thank you!
