
data conversion #1

Open
Athiq opened this issue Feb 18, 2019 · 21 comments
Comments

@Athiq

Athiq commented Feb 18, 2019

Do you have a script that converts the log files (HDFS text files) to numbers?

https://github.com/wuyifan18/DeepLog/blob/master/data/hdfs_train

How did you get the above -- using Spell? After running the parser I still have text data. How did you convert it to numbers (vectors)? Do you have a script, and could you please upload it?

https://github.com/logpai/logparser/tree/master/logs/HDFS

Is hdfs_train this data converted to numbers?

thanks in advance

@wuyifan18
Owner

wuyifan18 commented Feb 19, 2019

I use the dataset provided by the author of the paper.
For more details, please refer to the web page.

@sotiristsak

Hello. Thanks @wuyifan18 for the great job. I think what @Athiq is asking about can be found in paragraph 4.3 of the published paper. I'm also trying to figure out how this could be implemented! Any help would be much appreciated.

@Athiq
Author

Athiq commented Feb 25, 2019

@sotiristsak exactly ... I want the raw text that was converted into the numbers in the provided data (I think it's TF-IDF) -- if so, it shouldn't be a problem to implement. Please let me know if that's the case, @wuyifan18.

@wuyifan18
Owner

@Athiq the raw text can be found on the web page.

@Athiq
Author

Athiq commented Feb 25, 2019

@wuyifan18 thanks for the response. I am looking for the text data so that I can use Spell and DeepLog. Where I get stuck is that after Spell I have parsed text data. I want to train DeepLog on this parsed data, but I am not sure how to convert the parsed output from Spell into numbers (is it TF-IDF?).

@wuyifan18
Owner

@Athiq you mean converting the data to numbers according to the log keys you have parsed with Spell?
If so, I have no idea. Maybe @sotiristsak can give a hand.

@sotiristsak

Sorry for the delayed reply. Unfortunately, I also don't have a clue. I'm thinking I have to implement paragraph 4.3.1 of the paper on my own, because, I think, this is where the logs are split into tasks in order to be grouped into workflows. @Athiq what do you mean by TF-IDF? Also, is anyone interested in collaborating on this?

@sotiristsak

Btw, @Athiq, the numbers are not TF-IDF. They are the ids of the different log types (log keys). So a sequence of such numbers denotes the workflow of a specific task pattern. The hdfs_train file contains the workflows extracted from the raw log file of the normal execution.
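For illustration, here is a minimal sketch (not code from this repo) of that id assignment, assuming keys are numbered in order of first appearance:

```python
# Sketch: assign an integer id (a "log key") to each distinct log template
# and represent one session (block) as the sequence of those ids.
template_to_id = {}

def log_key(template):
    # assign a new id the first time a template is seen
    if template not in template_to_id:
        template_to_id[template] = len(template_to_id) + 1
    return template_to_id[template]

# the templates of one parsed session's log lines, in order
session = [
    "PacketResponder * for block * terminating",
    "Received block * of size * from *",
    "PacketResponder * for block * terminating",
]
print(" ".join(str(log_key(t)) for t in session))  # -> "1 2 1", one hdfs_train-style line
```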

@wuyifan18
Owner

@sotiristsak You're right.

@Athiq
Author

Athiq commented Mar 5, 2019

@sotiristsak @wuyifan18 what I am trying to do is run DeepLog on the data below:

https://github.com/logpai/loghub/blob/master/Hadoop/Hadoop_2k.log

I have successfully run Spell (the parser) on this data, and I get the two files below.



Sample structured_file.csv:

| LineId | Date | Time | Pid | Level | Component | Content | EventId | EventTemplate |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | 81109 | 203615 | 148 | INFO | dfs.DataNode$PacketResponder | PacketResponder 1 for block blk_38865049064139660 terminating | ead21f08 | PacketResponder * for block * terminating |
| 2 | 81109 | 203807 | 222 | INFO | dfs.DataNode$PacketResponder | PacketResponder 0 for block blk_-6952295868487656571 terminating | ead21f08 | PacketResponder * for block * terminating |
| 3 | 81109 | 204005 | 35 | INFO | dfs.FSNamesystem | BLOCK* NameSystem.addStoredBlock: blockMap updated: 10.251.73.220:50010 is added to blk_7128370237687728475 size 67108864 | 54e007d2 | BLOCK* NameSystem.addStoredBlock blockMap updated * 50010 is added to * size * |
| 4 | 81109 | 204015 | 308 | INFO | dfs.DataNode$PacketResponder | PacketResponder 2 for block blk_8229193803249955061 terminating | ead21f08 | PacketResponder * for block * terminating |
| 5 | 81109 | 204106 | 329 | INFO | dfs.DataNode$PacketResponder | PacketResponder 2 for block blk_-6670958622368987959 terminating | ead21f08 | PacketResponder * for block * terminating |
| 6 | 81109 | 204132 | 26 | INFO | dfs.FSNamesystem | BLOCK* NameSystem.addStoredBlock: blockMap updated: 10.251.43.115:50010 is added to blk_3050920587428079149 size 67108864 | 54e007d2 | BLOCK* NameSystem.addStoredBlock blockMap updated * 50010 is added to * size * |


Sample template_file.csv:

| EventId | EventTemplate | Occurrences |
| --- | --- | --- |
| ead21f08 | PacketResponder * for block * terminating | 311 |
| 54e007d2 | BLOCK* NameSystem.addStoredBlock blockMap updated * 50010 is added to * size * | 314 |
| 74cae9fd | Received block * of size * from * | 292 |
| dd632e5d | Receiving block * src * * dest * 50010 | 292 |


Now the big question: how do I run DeepLog on these structured and template files? Is this possible, or am I missing something?

thanks in advance

@wuyifan18
Owner

@Athiq You should convert the structured_file to numbers according to the template file you got from Spell.
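For example, here is a rough sketch of that conversion with pandas, assuming the two files are comma-separated CSVs with the columns shown above (the file names and output path are placeholders):

```python
import pandas as pd

# Map Spell's EventId (e.g. "ead21f08") to a small integer log key,
# numbered by the order in which templates appear in template_file.csv.
templates = pd.read_csv("template_file.csv")
event_to_key = {eid: i + 1 for i, eid in enumerate(templates["EventId"])}

# Replace each parsed line's EventId with its key, then group lines
# into sessions by the block id mentioned in the Content column.
structured = pd.read_csv("structured_file.csv")
structured["Key"] = structured["EventId"].map(event_to_key)
structured["BlockId"] = structured["Content"].str.extract(r"(blk_-?\d+)", expand=False)

# Write one space-separated key sequence per session, hdfs_train-style.
with open("train_sequences.txt", "w") as f:
    for _, group in structured.groupby("BlockId"):
        f.write(" ".join(group["Key"].astype(str)) + "\n")
```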

@Hammadtcs

@wuyifan18: Thanks for your response. Sorry, but I am also struggling with how to convert structured files into numbers. Can you guide us with an example of how to do it, please? Any example would help.

@williamceli

Hello! From my understanding, once raw text logs have been parsed (using Spell or any other parsing tool), they should be converted into sequences of log templates to be fed to the LSTM model.

@hzxGoForward

hzxGoForward commented Mar 21, 2019

> Hello! From my understanding, once raw text logs have been parsed (using Spell or any other parsing tool), they should be converted into sequences of log templates to be fed to the LSTM model.

I agree with your opinion; that's why I am confused about their training data format. I am also confused about why the paper's author divided the log into lines where each line has a different length. I don't think that is the correct format of training data according to the paper. Do you have any idea?

@williamceli

williamceli commented Mar 21, 2019

@hzxGoForward I think there is a preprocessing step missing, which is, for each line (block/session), building sequences of the same length. I guess that is not the actual final input for training.
My problem is that I don't get the same number of block lines. If I group by block in the first 100K log lines, I get a different number of sessions. Maybe I am extracting the wrong block id from each line.
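In case it helps, here is a small sketch of the grouping step (the file name is a placeholder); one possible cause of mismatched session counts is a block-id regex that misses negative ids:

```python
import re
from collections import defaultdict

# Note the optional minus sign: HDFS block ids can be negative
# (e.g. blk_-6952295868487656571); a pattern without "-?" will
# silently drop every line whose block id is negative.
BLOCK_RE = re.compile(r"blk_-?\d+")

sessions = defaultdict(list)
with open("HDFS.log") as f:
    for line in f:
        # a single log line can mention more than one block id
        for block_id in set(BLOCK_RE.findall(line)):
            sessions[block_id].append(line.rstrip("\n"))

print(len(sessions), "sessions")
```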

@wuyifan18
Owner

wuyifan18 commented Mar 21, 2019

@williamceli exactly, the actual final input for training needs to be padded to the length given by the hyperparameter window_size.
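As a rough illustration (a sketch, not this repo's exact preprocessing), each session's key sequence can be padded and sliced into windows of window_size keys, with the key that follows each window as its training label:

```python
# Sketch: turn one session's key sequence into fixed-length training pairs.
# Each window of `window_size` keys predicts the key that follows it;
# sessions shorter than window_size + 1 are left-padded with a pad key.
def make_pairs(sequence, window_size, pad=-1):
    if len(sequence) < window_size + 1:
        sequence = [pad] * (window_size + 1 - len(sequence)) + sequence
    return [
        (sequence[i:i + window_size], sequence[i + window_size])
        for i in range(len(sequence) - window_size)
    ]

for inputs, label in make_pairs([5, 5, 22, 11, 9, 26], window_size=4):
    print(inputs, "->", label)
# [5, 5, 22, 11] -> 9
# [5, 22, 11, 9] -> 26
```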

@hzxGoForward

> @hzxGoForward I think there is a preprocessing step missing, which is, for each line (block/session), building sequences of the same length. I guess that is not the actual final input for training.
> My problem is that I don't get the same number of block lines. If I group by block in the first 100K log lines, I get a different number of sessions. Maybe I am extracting the wrong block id from each line.

Maybe you can use the numbering of the log keys extracted in the following dataset:
http://iiis.tsinghua.edu.cn/~weixu/sospdata.html
DeepLog's authors cited this dataset, and it contains the log keys and their numbers.

@Hammadtcs

@wuyifan18 @hzxGoForward: Can you add the preprocessing showing how you converted lines to numbers for the LSTM, using the hyperparameter window_size or timestamps?

We are referring to the OpenStack logs; for your reference, I have attached a log:
https://github.com/logpai/logparser/blob/master/logs/OpenStack/OpenStack_2k.log

We are able to convert unstructured logs to structured logs using Spell or logparser, but after that we are unable to feed the data into training. I understand that you do the conversion using the hyperparameter window_size. Can you add those details or some sample source code?

@Huhu-ooo

Huhu-ooo commented Jun 3, 2020

> Btw, @Athiq, the numbers are not TF-IDF. They are the ids of the different log types (log keys). So a sequence of such numbers denotes the workflow of a specific task pattern. The hdfs_train file contains the workflows extracted from the raw log file of the normal execution.

@Athiq Hi, thanks for your response; it also helps me a lot! And I have something to verify: do you mean that I can verify my workflow-construction code against the hdfs_train file? Thank you so much!

@stuti-madaan

@Athiq hi! I am going through the same issue: I have parsed the logs and I am clueless about how to convert them into numbers for processing. Were you able to find a solution?

@Nightmare2334

> @sotiristsak @wuyifan18 what I am trying to do is run DeepLog on the data below:
>
> https://github.com/logpai/loghub/blob/master/Hadoop/Hadoop_2k.log
>
> I have successfully run Spell (the parser) on this data, and I get the two files below.
>
> Sample structured_file.csv:
>
> | LineId | Date | Time | Pid | Level | Component | Content | EventId | EventTemplate |
> | --- | --- | --- | --- | --- | --- | --- | --- | --- |
> | 1 | 81109 | 203615 | 148 | INFO | dfs.DataNode$PacketResponder | PacketResponder 1 for block blk_38865049064139660 terminating | ead21f08 | PacketResponder * for block * terminating |
> | 2 | 81109 | 203807 | 222 | INFO | dfs.DataNode$PacketResponder | PacketResponder 0 for block blk_-6952295868487656571 terminating | ead21f08 | PacketResponder * for block * terminating |
> | 3 | 81109 | 204005 | 35 | INFO | dfs.FSNamesystem | BLOCK* NameSystem.addStoredBlock: blockMap updated: 10.251.73.220:50010 is added to blk_7128370237687728475 size 67108864 | 54e007d2 | BLOCK* NameSystem.addStoredBlock blockMap updated * 50010 is added to * size * |
> | 4 | 81109 | 204015 | 308 | INFO | dfs.DataNode$PacketResponder | PacketResponder 2 for block blk_8229193803249955061 terminating | ead21f08 | PacketResponder * for block * terminating |
> | 5 | 81109 | 204106 | 329 | INFO | dfs.DataNode$PacketResponder | PacketResponder 2 for block blk_-6670958622368987959 terminating | ead21f08 | PacketResponder * for block * terminating |
> | 6 | 81109 | 204132 | 26 | INFO | dfs.FSNamesystem | BLOCK* NameSystem.addStoredBlock: blockMap updated: 10.251.43.115:50010 is added to blk_3050920587428079149 size 67108864 | 54e007d2 | BLOCK* NameSystem.addStoredBlock blockMap updated * 50010 is added to * size * |
>
> Sample template_file.csv:
>
> | EventId | EventTemplate | Occurrences |
> | --- | --- | --- |
> | ead21f08 | PacketResponder * for block * terminating | 311 |
> | 54e007d2 | BLOCK* NameSystem.addStoredBlock blockMap updated * 50010 is added to * size * | 314 |
> | 74cae9fd | Received block * of size * from * | 292 |
> | dd632e5d | Receiving block * src * * dest * 50010 | 292 |
>
> Now the big question: how do I run DeepLog on these structured and template files? Is this possible, or am I missing something?
>
> thanks in advance

@Athiq Hello buddy, I have already obtained the template file and the templated log file, but how can I turn them into numeric sequence files like the author's hdfs_train data? Do you have a way? I hope you can reply when you see this; it is very important to me. Thank you!
