TadGAN does not work with the default setup #5

rruizdeaustri · 2021-06-22T12:45:26Z

Hi,

I have tried to run the code with the current setup (number of epochs is 30) but I get

File "TadGAN/anomaly_detection.py", line 129, in find_scores
precision = tp / (tp + fp)
ZeroDivisionError: division by zero

Any ideas about what is going on ?

With Kind Regards,
Roberto

arunppsg · 2021-06-22T15:11:35Z

Hi Roberto,

The dataset example-2_cpc_results.csv does not contain any positive (anomalous) points, hence tp=0. The model also predicts every point as negative, hence fp=0. The precision denominator tp + fp is therefore zero.

The attached dataset is not the right one for evaluating the model (sorry for the unnecessary hurdle), since it does not contain any anomalous points. I need to replace it with another time series anomaly detection dataset. You can see here how to use the code with another dataset.
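
To make the failure concrete: with tp = 0 and fp = 0 the precision denominator is zero. Below is a minimal sketch of a guarded metric computation. The function name `precision_recall_f1` and the choice to return 0.0 for an undefined ratio are assumptions for illustration, not the code in anomaly_detection.py.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, f1) from confusion counts.

    Any ratio whose denominator is zero is reported as 0.0 instead of
    raising ZeroDivisionError (an assumed convention, not the repo's).
    """
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    if precision + recall > 0:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# A dataset with no anomalies and a model that flags nothing:
print(precision_recall_f1(tp=0, fp=0, fn=0))
```

With a guard like this the evaluation would finish on a dataset without anomalies, although the resulting metrics carry no information in that case.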

Thanks,
Arun

rruizdeaustri · 2021-06-23T09:43:15Z

Hi Arun,

Ok, then I'll try with another dataset.

Thanks a lot !

Best,
Rbt

rruizdeaustri · 2021-06-30T12:50:01Z

Hi Arun,

I have labeled the nyc_taxi.csv dataset from NAB and I have a question about the split of the data used in your code.
As it is, 70% of the data is used for training and 30% for testing, but this way the training data contains anomalies for this particular dataset. Since the method is unsupervised, shouldn't anomalies be excluded from the training process? I guess we want to learn the distribution of the normal samples, right?

Thanks a lot !!

All the best,
Roberto

arunppsg · 2021-06-30T14:11:31Z

Hi Roberto,

The anomalies are excluded from the training process. The anomaly values are used only for evaluation, not during training. Training uses the time series signals, and the generator learns the distribution of normal samples.

Cheers,
Arun.

rruizdeaustri · 2021-07-01T14:41:36Z

Hi Arun,

Yes, this is what I expected, though in a blog post about the model in Orion I have seen that they use the whole time series (including anomalous timesteps). That is why I got confused.

I will split the data, pick out just the normal data, and let you know whether the code works with this dataset as it does with the "official" implementation in Orion.
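
A minimal sketch of such a split, assuming the series is held as two parallel lists (values and 0/1 anomaly labels). The function name and the list layout are hypothetical and not taken from the repository.

```python
def split_normal_train(values, labels, train_frac=0.7):
    """Chronological split that keeps only normal points for training.

    values: list of observations in time order
    labels: list of 0/1 flags, 1 marking an anomalous timestep
    The test part is left untouched so it can still be evaluated.
    """
    if len(values) != len(labels):
        raise ValueError("values and labels must have the same length")
    n_train = int(len(values) * train_frac)
    train_values = [
        v for v, y in zip(values[:n_train], labels[:n_train]) if y == 0
    ]
    return train_values, values[n_train:], labels[n_train:]


train, test_values, test_labels = split_normal_train(
    values=[10, 11, 95, 12, 13, 14, 90, 15, 16, 17],
    labels=[0, 0, 1, 0, 0, 0, 1, 0, 0, 0],
    train_frac=0.5,
)
print(train, test_values, test_labels)
```

One caveat of this approach: dropping points breaks the continuity of the series, so a training window that spans a removed point mixes timesteps that were not adjacent.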

BTW, have you tried with this dataset ? I could send it to you with the right format for your code.

Thanks a lot !!

Best,
Rbt

arunppsg · 2021-07-02T15:02:30Z

Hi Roberto,

Thanks for your interest. Training GANs is highly unstable and requires more computation power. Access to that computation power is currently out of reach for me.

Best,
Arun.

rruizdeaustri · 2021-07-02T15:13:13Z

Hi Arun,

In fact, I have been training the model, and the performance on this dataset is really poor compared with what is reported on the Orion webpage for the official version.

I have used the default hyperparameters which are identical to the ones used in the report by the Orion guys:

Accuracy 0.79
Precision 1.00
Recall 0.07
F1 Score 0.13
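
As a consistency check on these numbers: F1 is the harmonic mean of precision and recall, so a recall of 0.07 caps it at a low value even with perfect precision. A small sketch (the helper name `f1_from` is an assumption):

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall; 0.0 if both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported above.
print(round(f1_from(1.00, 0.07), 2))  # 0.13
```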

Any advice to improve this ?

Thanks a lot !!!

Rbt

arunppsg · 2021-07-02T17:11:36Z

Hi Rbt,

I observed the same result in my case, although the loss value seems to move in the right direction over successive epochs. I don't have any particular advice other than the following:

- Try Orion
- Use other time series modelling approaches, such as fbprophet

Best,
Arun.

amanuel2 · 2021-07-08T21:57:36Z

Can one of you send a CSV file that works with this source code? (I get the same error) I can't find any online.

natkhosh · 2021-07-09T18:31:22Z

> Hi Arun,
>
> Yes this is what I expect though in some blog about the model in Orion have seen they use the whole time series (including anomalous timesteps). That is why I got confused.
>
> I will split the data and pickup just normal data and let you know whether the code works with this dataset as it does with the "official" implementation in Orion.
>
> BTW, have you tried with this dataset ? I could send it to you with the right format for your code.
>
> Thanks a lot !!
>
> Best,
> Rbt

Hi, could you please send me your dataset? I'll try to use it in my diploma work.
I have the same problem with datasets (I tried NAB too).

rruizdeaustri · 2021-07-12T10:28:50Z

Hi Arun,

Maybe I can send you the data and you can add them to the repo ?

Rbt

arunppsg · 2021-07-12T11:40:32Z

Adding your data to the repo would be great. You can make a pull request with that data and I will merge it.

Best,
Arun.

rruizdeaustri · 2021-07-13T10:26:04Z

Hi Arun,

I have created a branch called rruiz-branch, added the file nyc_taxi_new.csv to it, and made a pull request.
Could you please merge it?

Best,
Rbt

arunppsg · 2021-07-13T14:05:33Z

You need to create a pull request. I don't see any pull request currently.

AugustComte · 2021-09-02T10:39:22Z

Hi @arunppsg,

Firstly, thank you for this, it's super cool. I am new to this and have a few questions, which I hope are not too stupid, if you can indulge me.

Looking through this, I notice that both this and the Orion examples only use a value and a date column. Is it possible to make this work with additional regressors/columns (so-called Xregs), e.g. temperature, sales price, etc.?

Secondly, is it necessary to have labelled anomalies? The anomaly labels in my datasets were obtained from the deviation between the true value and the value predicted by an RNN, and I am expecting TadGAN to be better. So it does not seem appropriate to measure the GAN's performance against the results of the RNN; I was under the impression that TadGAN was unsupervised. All I really want is to get the anomaly scores. Does that mean I would need to delete the evaluation section of the code, or will it run regardless and output the outlier scores? Where can I get these?

Again, sorry if these are poor questions. I'm not sure I entirely understand the code.

Best
August

arunppsg · 2021-09-03T03:12:30Z

Hello August,

You can also use other variables, but for that you might need to change the model architecture. I am not sure how to change it. Maybe I will think it through and get back to you after some time. In the current architecture there is only one regressor; it is normalized first, and the input is then a window of data points (window size: 100 * 1). Consider reading through this paper on using multivariate time series with RNNs.
Labelled anomalies are not necessary, since this is an unsupervised approach. Labels are only required to evaluate the model. Anomaly scores are computed as the product of the reconstruction error and the critic score. See the test function in anomaly_detection.py for the anomaly scores. To use the code without labels, just create a dummy column called anomaly, or modify the code in main.py and anomaly_detection.py.
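
A rough sketch of both points, under stated assumptions: the scoring helper multiplies per-window values element-wise as described above (any normalization the repository applies beforehand is not shown), and the CSV helper adds a zero-filled anomaly column using only the standard library. Both function names and the `timestamp`/`value` column names in the usage example are hypothetical.

```python
import csv
import io


def anomaly_scores(recon_errors, critic_scores):
    """Element-wise product of reconstruction errors and critic scores."""
    if len(recon_errors) != len(critic_scores):
        raise ValueError("inputs must have the same length")
    return [r * c for r, c in zip(recon_errors, critic_scores)]


def add_dummy_anomaly_column(src, dst, column="anomaly"):
    """Copy CSV rows from src to dst, adding `column` filled with 0.

    src and dst are open text streams. Rows that already carry the
    column keep their existing value.
    """
    reader = csv.DictReader(src)
    fieldnames = list(reader.fieldnames or [])
    if column not in fieldnames:
        fieldnames.append(column)
    writer = csv.DictWriter(dst, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        row.setdefault(column, 0)
        writer.writerow(row)


# Usage with in-memory streams; real files opened with open() work the same.
source = io.StringIO("timestamp,value\n2021-01-01,1.5\n")
target = io.StringIO()
add_dummy_anomaly_column(source, target)
print(target.getvalue())
```

The dummy column only lets the evaluation code run; metrics computed against it are not meaningful, so the raw scores are the useful output.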

Thanks!

The-Boyy · 2021-10-10T05:35:05Z

> Hi Arun,
>
> I have labeled the nyc_taxi.csv dataset from NAB and I have a question about the split of the data used in your code. As it is, 70% of the data is used for training and 30% for testing but in this way the training data contain anomalies for this particular dataset. Since the method is unsupervised, shouldn't anomalies be excluded in the training process ? I guess we want to learn the distribution of the say normal samples, right ?
>
> Thanks a lot !!
>
> All the best, Roberto

Excuse me, can you send me your dataset? Thanks!
