Meaning of numbers in dataset #15

Open
danielhanbitlee opened this issue May 17, 2019 · 9 comments

Comments

@danielhanbitlee

Hi,

I'm looking at the data folder of this repo. Can someone explain what the numbers in these files mean?

Here's an example of the numbers in the file I'm referring to:
https://github.com/wuyifan18/DeepLog/blob/master/data/hdfs_train

Any help would be appreciated.

@riyu94

riyu94 commented May 17, 2019

@danielhanbitlee Each row is a sequence of events (log messages) in integer form, and each row corresponds to one unique block_id (session ID).
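
To make the row format concrete, here is a minimal sketch of reading such a file. It assumes each line holds space-separated integers, one line per block_id. The sample values are made up for illustration and are not copied from `data/hdfs_train`, and `parse_sessions` is a hypothetical helper name.

```python
# Made-up sample in the assumed shape of the data file:
# one session (block_id) per line, space-separated integer event keys.
sample = """5 5 5 22 11 9
22 5 5 11 9 26
"""

def parse_sessions(text):
    """Return one list of integer event keys per non-empty line."""
    return [
        [int(token) for token in line.split()]
        for line in text.splitlines()
        if line.strip()
    ]

sessions = parse_sessions(sample)
print(len(sessions))   # number of sessions (rows)
print(sessions[0])     # event-key sequence of the first session
```

To read a real file, the same function could be applied to the file's text, for example `parse_sessions(open(path).read())`, where `path` points at the downloaded data file.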

@danielhanbitlee
Author

@riyu94 Thanks for the quick response.

@danielhanbitlee
Author

@riyu94 One more question. How are the rows divided? When do you create a new row?

@riyu94

riyu94 commented May 20, 2019

@danielhanbitlee Rows are divided by block_id: each unique block_id in the structured event logs gets its own row, so a new row starts whenever a new block_id appears.

@danielhanbitlee
Author

@riyu94 I see. How do you create a block_id?

@riyu94

riyu94 commented May 21, 2019

You need to look for the unique block ID in the message field of each event log.
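
A rough sketch of that lookup, combined with the row-per-block grouping described earlier. The regular expression assumes HDFS-style IDs of the form `blk_` followed by an optionally negative integer. The event keys and messages below are made up, and `group_by_block` is a hypothetical helper, not code from this repo.

```python
import re

# Assumed block ID shape: "blk_" plus an optionally negative integer.
BLOCK_ID = re.compile(r"blk_-?\d+")

# Made-up (event key, message) pairs standing in for structured event logs.
events = [
    (5, "Receiving block blk_-1608999687919862906 src: /10.0.0.1:54106"),
    (5, "Receiving block blk_7503483334202473044 src: /10.0.0.2:40524"),
    (22, "BLOCK* NameSystem.allocateBlock: /tmp/part-0. blk_-1608999687919862906"),
    (11, "PacketResponder 1 for block blk_7503483334202473044 terminating"),
]

def group_by_block(events):
    """Append each event key to the row of every block ID found in its message."""
    rows = {}
    for key, message in events:
        # dict.fromkeys removes duplicate IDs within one message, keeping order.
        for block_id in dict.fromkeys(BLOCK_ID.findall(message)):
            rows.setdefault(block_id, []).append(key)
    return rows

rows = group_by_block(events)
for block_id, keys in rows.items():
    print(block_id, " ".join(map(str, keys)))
```

Each printed line then has the same shape as one row of the data file: the event keys of one block, in the order the events were logged.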

@wuyifan18
Owner

@riyu94 Thanks for your answer!

@RahulShrivastava22

Hello @riyu94, you explained it very well, but I still don't understand how a log key sequence is actually generated. Could you elaborate on the process of generating a log key sequence from the log file, and on which log file should be used?

@danielhanbitlee
Author

@riyu94 I have another question, as I'm a little confused about the steps. Which of the following sequences of steps generates the numbers seen here?

  1. logs -> Spell -> sort log keys based on block id
  2. logs -> sort log keys based on block id -> Spell for each block id separately

I'm thinking 1 is the way it's done. Just want to confirm.
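
The thread does not confirm either option. As an illustration only, option 1 could be sketched as follows: every log line is parsed first, so that one shared table maps templates to integer keys, and the keys are then grouped by block ID. `toy_template` is a crude hypothetical stand-in for Spell (it only masks block IDs and digits), and the log lines are made up.

```python
import re
from collections import defaultdict

# Assumed block ID shape: "blk_" plus an optionally negative integer.
BLOCK_ID = re.compile(r"blk_-?\d+")

def toy_template(message):
    """Hypothetical stand-in for a log parser such as Spell: mask variable parts."""
    masked = BLOCK_ID.sub("<*>", message)
    return re.sub(r"\d+", "<*>", masked)

def option_one(lines):
    """Parse all lines with one shared key table, then group keys by block ID."""
    key_of = {}                 # template -> integer key, shared by all blocks
    rows = defaultdict(list)    # block ID -> sequence of integer keys
    for line in lines:
        template = toy_template(line)
        key = key_of.setdefault(template, len(key_of) + 1)
        for block_id in dict.fromkeys(BLOCK_ID.findall(line)):
            rows[block_id].append(key)
    return dict(rows), key_of

lines = [
    "Receiving block blk_1 size 100",
    "Receiving block blk_2 size 200",
    "Deleting block blk_1",
    "Deleting block blk_2",
]
rows, key_of = option_one(lines)
print(key_of)
print(rows)
```

Because the key table is built once over all lines, the same template gets the same integer in every row. In this sketch that is the practical difference from option 2, where parsing each block separately would need an extra step to keep the numbering consistent across blocks.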
