
Predictive maintenance #39

Closed
amineebenamor opened this issue Apr 23, 2019 · 6 comments

@amineebenamor

Hello,
First, thank you for your implementations; they have helped me a lot!
I want to build a predictive maintenance pipeline based on logs, for research purposes.
For the predictive maintenance part I'm implementing DeepLog (the only solution I found; I don't think there are others, but let me know if you know of any). However, it assumes that the input training set contains only logs from normal executions.
To make the pipeline applicable to any logs, I don't want to rely on labelled datasets. Besides, logs can be huge, and in many cases it's hard to tell whether an entry is anomalous without domain expertise (Spark logs, for example).
The LSTM in DeepLog needs normal execution logs; otherwise, it would learn anomalies as normal behavior.
So I thought of using one of the unsupervised methods (Log Clustering, PCA, or Invariant Mining, for example) as a preprocessing step, to keep only normal execution logs for DeepLog's training set (rough sketch at the end of this comment).
I would run this whole pipeline on datasets from loghub, which are not labelled.
What do you think of this approach? Is there anything better that can be done? Any advice?
Thank you for your answer!
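To make this concrete, here is a rough, self-contained sketch of the filtering step I have in mind. The PCA residual-subspace detector and the 95th-percentile threshold are just illustrative stand-ins, and X is a hypothetical session-by-event-type count matrix (e.g. from a parser such as Drain or Spell plus session windowing):

```python
import numpy as np

def filter_normal_sessions(X, n_components=3, percentile=95):
    """Keep sessions with a small PCA residual (SPE), i.e. 'probably normal'."""
    Xc = X - X.mean(axis=0)                     # center the count matrix
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:n_components].T                     # top-k principal directions
    residual = Xc - (Xc @ P) @ P.T              # remove the "normal" subspace
    spe = np.square(residual).sum(axis=1)       # squared prediction error per session
    threshold = np.percentile(spe, percentile)  # illustrative threshold choice
    return X[spe <= threshold]                  # candidate normal training set

# Toy data standing in for real event counts
X = np.random.poisson(2.0, size=(1000, 30)).astype(float)
X_normal = filter_normal_sessions(X)
print(X_normal.shape)
```

The sessions kept here would then become DeepLog's training set.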

@ShilinHe
Member

Yes, DeepLog does have this limitation: normal execution logs are required for training. Your idea is a good one, but it really depends on the accuracy of those unsupervised methods. Invariant Mining performs well, so I would try that first. As an alternative, you could use the labeled data but remove all anomalous logs; in that case, your experimental setting would be the same as DeepLog's. BTW, we welcome pull requests from the community.
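If it helps, a first try with Invariant Mining in this repo might look roughly like the demo scripts; the loader arguments and return shape below are assumptions to double-check against the demos:

```python
# Sketch following this repo's demo scripts; exact signatures are assumptions.
from loglizer import dataloader, preprocessing
from loglizer.models import InvariantsMiner

struct_log = 'HDFS_100k.log_structured.csv'  # hypothetical parsed log file

(x_train, _), (x_test, _) = dataloader.load_HDFS(
    struct_log, window='session', train_ratio=0.5, split_type='sequential')

feature_extractor = preprocessing.FeatureExtractor()
x_train = feature_extractor.fit_transform(x_train)  # event count matrix
x_test = feature_extractor.transform(x_test)

model = InvariantsMiner()
model.fit(x_train)              # mines linear invariants; no labels needed
y_pred = model.predict(x_test)  # 1 = flagged anomalous; drop before DeepLog
```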

@Athiq

Athiq commented May 6, 2019

Hi @ShilinHe,
wuyifan18/DeepLog#1
Is this something you are working on?

The idea is: take any random log file, use Spell to extract the templates, and then the question is how to convert those templates into numbers so they can be fed into an LSTM (say) for forecasting. Any suggestions are much appreciated.

@ShilinHe
Member

ShilinHe commented May 6, 2019

Actually, no... we do not release code anywhere else.
If you have parsed the templates out, you can simply name them 0, 1, 2, ...; those indices are the numbers.
That said, in an LSTM I think the input is usually one-hot vectors, which then go through an embedding layer. I am not sure whether DeepLog works in this manner.
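A toy sketch of both options (the template strings and dimensions are made up):

```python
import torch
import torch.nn.functional as F

# Hypothetical templates produced by Spell (or any other parser)
templates = ['Receiving block <*>', 'PacketResponder <*> terminating',
             'Verification succeeded for <*>']
key_index = {t: i for i, t in enumerate(templates)}  # name templates 0, 1, 2, ...

# A parsed log sequence becomes a sequence of integer keys ...
parsed = [templates[0], templates[1], templates[0], templates[2]]
sequence = torch.tensor([key_index[t] for t in parsed])

# ... fed to the LSTM either as one-hot vectors
one_hot = F.one_hot(sequence, num_classes=len(templates)).float()

# ... or through a learned embedding layer instead
embed = torch.nn.Embedding(num_embeddings=len(templates), embedding_dim=8)
dense = embed(sequence)
print(one_hot.shape, dense.shape)  # (4, 3) and (4, 8)
```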

@amineebenamor
Author

> Actually, no... we do not release code anywhere else.
> If you have parsed the templates out, you can simply name them 0, 1, 2, ...; those indices are the numbers.
> That said, in an LSTM I think the input is usually one-hot vectors, which then go through an embedding layer. I am not sure whether DeepLog works in this manner.

Yes: the input layer of DeepLog's LSTM encodes the n possible log keys from K as one-hot vectors, as specified in the DeepLog paper.
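For reference, a minimal PyTorch sketch of that input scheme; the window size, hidden size, and top-g cutoff below are illustrative, not the paper's exact settings:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NextKeyLSTM(nn.Module):
    """DeepLog-style model: one-hot log keys in, distribution over next key out."""
    def __init__(self, num_keys, hidden_size=64, num_layers=2):
        super().__init__()
        self.num_keys = num_keys
        self.lstm = nn.LSTM(num_keys, hidden_size, num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_size, num_keys)

    def forward(self, keys):                        # keys: (batch, window) int keys
        x = F.one_hot(keys, self.num_keys).float()  # one-hot over the n keys in K
        out, _ = self.lstm(x)
        return self.fc(out[:, -1, :])               # logits for the next log key

model = NextKeyLSTM(num_keys=10)
window = torch.randint(0, 10, (32, 8))  # toy batch of key windows
logits = model(window)
# At detection time, flag a window if the true next key is not among the
# top-g predicted candidates, per the DeepLog paper (g is a tunable cutoff).
top_g = logits.topk(3, dim=-1).indices
```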

@Athiq

Athiq commented May 7, 2019

@amineebenamor did you try this with any other dataset? Parsing with Spell or another parser and feeding the result to DeepLog?

@amineebenamor
Author

> @amineebenamor did you try this with any other dataset? Parsing with Spell or another parser and feeding the result to DeepLog?

I'm working on it when I have time, but I haven't done anything yet. You can also try it yourself and see what you get :)
