data conversion #1
I use the dataset provided by the author of this paper.
Hello, and thanks @wuyifan18 for the great job. I think what @Athiq is talking about can be found in paragraph 4.3 of the published paper. I'm also trying to find out how this could be implemented; any help would be much appreciated.
@sotiristsak exactly ... I want the raw text that was converted to numbers in the provided data (I think it's TF-IDF). If so, it shouldn't be a problem to implement. Please let me know if that's the case, @wuyifan18.
@wuyifan18 thanks for the response. I am looking for text data so that I can use Spell and DeepLog. Where I fail is that after Spell I have text-parsed data. I want to train DeepLog on this parsed data, but I am not sure how to convert the parsed output from Spell to numbers (is it TF-IDF?).
@Athiq you mean convert the data to numbers according to the log keys you have parsed with Spell?
Sorry for the delayed reply. Unfortunately, I also don't have a clue. I'm thinking I have to implement paragraph 4.3.1 of the paper on my own, because I think that is where the logs are split into tasks in order to be grouped into workflows. @Athiq, what do you mean by TF-IDF? Also, is anyone interested in collaborating on this?
Btw, @Athiq, the numbers are not TF-IDF. They are the ids of each distinct log type, so a sequence of such numbers denotes the workflow of a specific task pattern. The hdfs_train file contains the workflows that were extracted from the raw log file of a normal execution.
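To make the idea above concrete, here is a minimal sketch: each distinct log template (log key) gets an integer id, and a session's log lines become the sequence of those ids, which is the shape of the hdfs_train file. The template strings are hypothetical, not taken from the real HDFS dataset.

```python
# Hypothetical log templates ("log keys"); each gets an integer id.
templates = [
    "Receiving block <*> src: <*> dest: <*>",
    "PacketResponder <*> for block <*> terminating",
]
key_id = {t: i for i, t in enumerate(templates)}  # template -> id (0-based here)

# A session's parsed log lines become a sequence of key ids.
session = [templates[0], templates[1], templates[0]]
sequence = [key_id[line] for line in session]
print(sequence)  # [0, 1, 0]
```

Whether ids start at 0 or 1 is a convention choice; the only requirement is that the same mapping is used consistently at training and inference time.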
@sotiristsak You're right. |
@sotiristsak @wuyifan18 What I am trying to do is run DeepLog on this data: https://github.com/logpai/loghub/blob/master/Hadoop/Hadoop_2k.log. I have successfully run Spell (the parser) on it, and I now have two files: a structured file and a template file (samples attached). Now the big question is how to run DeepLog on these structured and template files. Is this possible, or am I missing something? Thanks in advance.
@Athiq You should convert the structured_file to numbers according to the template files you obtained from Spell.
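A hedged sketch of that conversion step: Spell from logpai/logparser typically emits a `<name>_structured.csv` with an `EventId` column per log line and a `<name>_templates.csv` with one row per distinct template. The column names and 1-based id convention below are assumptions about your parser's output, so adjust them to match your files.

```python
import csv

def build_key_index(templates_csv):
    """Map each EventId in the templates file to a small integer (1-based)."""
    with open(templates_csv, newline="") as f:
        reader = csv.DictReader(f)
        return {row["EventId"]: i + 1 for i, row in enumerate(reader)}

def structured_to_sequence(structured_csv, key_index):
    """Replace every parsed log line's EventId with its integer log key."""
    with open(structured_csv, newline="") as f:
        reader = csv.DictReader(f)
        return [key_index[row["EventId"]] for row in reader]
```

For HDFS you would additionally group lines by block id before emitting one sequence per session; this sketch just shows the EventId-to-number mapping itself.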
@wuyifan18: Thanks for your response. Sorry, but I am also struggling with how to convert the structured files into numbers. Can you guide us by giving an example of how to do it, please? Any example would help.
Hello! From my understanding, once raw text logs have been parsed (using Spell or any other parsing tool), they should be converted into sequences of log-template ids to be fed to the LSTM model.
I agree with your opinion; that's why I am confused about the format of their training data. I am also confused about why the paper's author divides the log into lines where each line has a different length. I don't think that is the correct format for the training data according to the paper. Do you have any idea?
@hzxGoForward I think there is a preprocessing step missing: for each line (block/session), building sequences of the same length. I guess that is not the actual final input for training.
@williamceli exactly, the actual final input for training needs to be padded to a length set by the hyperparameter window_size.
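The padding/windowing described above can be sketched as follows: slide a fixed window of length window_size over each session's key sequence, using the key after the window as the prediction target, and pad sessions that are shorter than window_size + 1. The pad value of -1 is an assumption here, not something confirmed by this thread.

```python
def make_windows(sequence, window_size, pad=-1):
    """Turn one session's log-key sequence into (window, next_key) samples.

    Sessions shorter than window_size + 1 are right-padded with `pad`
    so every session yields at least one sample.
    """
    seq = list(sequence)
    if len(seq) < window_size + 1:
        seq = seq + [pad] * (window_size + 1 - len(seq))
    samples = []
    for i in range(len(seq) - window_size):
        samples.append((seq[i:i + window_size], seq[i + window_size]))
    return samples

print(make_windows([1, 2, 3, 4, 5], window_size=3))
# [([1, 2, 3], 4), ([2, 3, 4], 5)]
```

At training time the model learns to predict the next key from each window; at detection time, a key that falls outside the model's top-g predictions is flagged as anomalous.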
Maybe you can use the number of each log key extracted from the following dataset:
@wuyifan18 @hzxGoForward: Can you add the preprocessing, i.e., how you converted lines to numbers using the window_size hyperparameter or timestamps for the LSTM? We are working with the OpenStack logs; for your reference, I have attached a log. We are able to convert unstructured logs to structured logs using Spell or another log parser, but after that we are unable to feed the data into training. I understand that you do the conversion using the window_size hyperparameter. Can you add those details or some sample source code?
@Athiq Hi, thanks for your response; it helps me a lot! I have something to verify: do you mean that I can verify the workflow code against the hdfs_train file? Thank you so much!
@Athiq Hi! I am going through the same issue. I have parsed the logs and I am clueless about how to convert them into numbers for processing. Were you able to find a solution?
@Athiq Hello buddy, I have already obtained the template file and the templated log file, but how can I turn them into numeric sequence files, like the author's hdfs_train data? Do you have a way? I hope you can reply when you see this; it is very important to me. Thank you!
Do you have a script that converts the log files (HDFS text files) to numbers?
https://github.com/wuyifan18/DeepLog/blob/master/data/hdfs_train
How did you get the above? Using Spell? After running the parser I still have text data. How did you convert it to numbers (vectors)? Do you have a script? Can you please upload it?
https://github.com/logpai/logparser/tree/master/logs/HDFS
Is this the above data converted to numbers?
thanks in advance