Meaning of numbers in dataset #15

danielhanbitlee · 2019-05-17T21:27:30Z

Hi,

I'm looking at the data folder of this repo. Can someone explain what the numbers in these files mean?

Here's an example of the numbers in the file I'm referring to:
https://github.com/wuyifan18/DeepLog/blob/master/data/hdfs_train

Any help would be appreciated.

riyu94 · 2019-05-17T23:29:57Z

@danielhanbitlee Each row is the sequence of events (log keys), encoded as integers, for one unique block_id/session ID.
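
A minimal Python sketch of reading such a file, assuming each line holds the space-separated integer log keys for one block_id. The function name `parse_sequences` and the sample text are hypothetical, not taken from the repo.

```python
from typing import List

def parse_sequences(text: str) -> List[List[int]]:
    """Turn hdfs_train-style text into one list of log keys per line (one line per block_id)."""
    sequences = []
    for line in text.splitlines():
        tokens = line.split()  # split() with no argument also drops trailing spaces
        if tokens:  # skip blank lines
            sequences.append([int(tok) for tok in tokens])
    return sequences

# Hypothetical sample in the assumed format (not copied from the repo).
sample = "5 5 22 11 9\n5 22 11 9 26 \n"
print(parse_sequences(sample))  # [[5, 5, 22, 11, 9], [5, 22, 11, 9, 26]]
```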

danielhanbitlee · 2019-05-18T00:06:36Z

@riyu94 Thanks for the quick response.

danielhanbitlee · 2019-05-18T00:26:49Z

@riyu94 One more question. How are the rows divided? When do you create a new row?

riyu94 · 2019-05-20T18:23:16Z

@danielhanbitlee Rows are divided by block_id: we create a new row whenever a new block_id appears in the structured event logs.
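
One way to read this, sketched in Python under the assumption that the structured logs have already been reduced to (block_id, log key) pairs in log order: every distinct block_id gets its own row, and events of interleaved blocks are appended to their own block's row. The function `group_by_block` and the sample events are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

def group_by_block(events: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Collect log keys into one sequence per block_id, preserving log order."""
    rows: Dict[str, List[int]] = {}
    for block_id, key in events:
        # A block_id seen for the first time starts a new row;
        # dicts keep insertion order, so rows follow first appearance.
        rows.setdefault(block_id, []).append(key)
    return rows

# Hypothetical structured events: (block_id, log key) in log order.
events = [
    ("blk_1", 5), ("blk_2", 5), ("blk_1", 22),
    ("blk_2", 11), ("blk_1", 9),
]
for seq in group_by_block(events).values():
    print(" ".join(map(str, seq)))
# 5 22 9
# 5 11
```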

danielhanbitlee · 2019-05-20T18:56:14Z

@riyu94 I see. How do you create a block_id?

riyu94 · 2019-05-21T00:05:24Z

You need to look for the unique block ID in each event log message.
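
A minimal sketch of pulling block IDs out of a message with a regular expression, assuming HDFS-style IDs of the form `blk_` followed by an optional minus sign and digits. The pattern, the helper `extract_block_ids`, and the sample message are illustrative assumptions, not code from this repo.

```python
import re
from typing import List

# Assumed HDFS block ID format: "blk_" + optional "-" + digits.
BLOCK_ID_RE = re.compile(r"blk_-?\d+")

def extract_block_ids(message: str) -> List[str]:
    """Return the distinct block IDs in a log message, in order of appearance."""
    # dict.fromkeys removes duplicates while preserving order
    return list(dict.fromkeys(BLOCK_ID_RE.findall(message)))

# Illustrative message in an HDFS-like style (hypothetical, not quoted from the dataset).
msg = "Receiving block blk_-1608999687919862906 src: /10.0.0.1:54106 dest: /10.0.0.1:50010"
print(extract_block_ids(msg))  # ['blk_-1608999687919862906']
```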

wuyifan18 · 2019-05-21T02:20:35Z

@riyu94 Thanks for your answer！

RahulShrivastava22 · 2019-05-21T05:57:55Z

Hello @riyu94, you explained it very well, but I'm still not getting how a log key sequence is actually generated. Could you please elaborate on the process of generating a log key sequence from the log file, and on which log file should be used?

danielhanbitlee · 2019-05-21T22:38:13Z

@riyu94 I have another question, as I'm a little confused about the steps. Which of the following pipelines do we use to generate the numbers seen here?

1. logs -> Spell -> sort log keys based on block id
2. logs -> sort log keys based on block id -> Spell for each block id separately

I'm thinking 1 is the way it's done. Just want to confirm.
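
A rough Python sketch of what option 1 implies, assuming the parser's output is available as (block_id, event_template) pairs covering the whole log. It uses no Spell API; the template strings, the numbering from 1, and the helper `build_sequences` are all hypothetical. It does not confirm which order the repo actually used.

```python
from typing import Dict, Iterable, List, Tuple

def build_sequences(
    parsed: Iterable[Tuple[str, str]],
) -> Tuple[Dict[str, int], Dict[str, List[int]]]:
    """parsed: (block_id, event_template) pairs from parsing the whole log once."""
    key_of: Dict[str, int] = {}  # one global template -> integer mapping
    rows: Dict[str, List[int]] = {}
    for block_id, template in parsed:
        if template not in key_of:
            key_of[template] = len(key_of) + 1  # numbering from 1 is an assumption
        rows.setdefault(block_id, []).append(key_of[template])
    return key_of, rows

# Hypothetical parser output (stand-in for real Spell results).
parsed = [
    ("blk_1", "Receiving block <*> src: <*> dest: <*>"),
    ("blk_2", "Receiving block <*> src: <*> dest: <*>"),
    ("blk_1", "PacketResponder <*> for block <*> terminating"),
]
keys, rows = build_sequences(parsed)
print(rows)  # {'blk_1': [1, 2], 'blk_2': [1]}
```

Because the mapping is built over the whole log, the same template gets the same integer in every row. Parsing each block separately (option 2) would need an extra step to reconcile template IDs across blocks.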

Gharibim mentioned this issue on Aug 21, 2019 in "How to convert the parsed data to training data?" #24 (Closed)
