Refactor real data generators #30

kwinkunks · 2021-09-16T12:42:26Z

Not strictly a bug, but its current behaviour is not ideal.

Right now, the data are not shuffled, and hitting Regenerate does nothing. So you always get the same train and test data points with these datasets.

Current method: slice 2n positive and n negative samples from the beginning of the array for Train, and 2n positive and n negative samples from the end of the array for Test (hence the negative slices in the Test function). It's not a great way to do it, and it also means you can't shuffle the data.
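For illustration, here is a minimal sketch of that head/tail slicing. It is not the project's actual code: the function names `head_train` and `tail_test`, the separate `pos` and `neg` lists, and the sample values are all hypothetical.

```python
def head_train(pos, neg, n):
    """Take 2n positive and n negative samples from the start of each list."""
    return pos[:2 * n], neg[:n]


def tail_test(pos, neg, n):
    """Take 2n positive and n negative samples from the end of each list.

    Assumes n > 0; the negative slices count back from the end.
    """
    return pos[-2 * n:], neg[-n:]


pos = list(range(100, 120))  # 20 hypothetical positive samples
neg = list(range(200, 210))  # 10 hypothetical negative samples

train = head_train(pos, neg, 3)
test = tail_test(pos, neg, 3)

# Nothing here is random, so a second call returns exactly the same samples.
assert head_train(pos, neg, 3) == train
assert tail_test(pos, neg, 3) == test
```

In this sketch the two sets only stay separate because the arrays keep a fixed order. Shuffling independently inside each function would break that, which is the constraint described above.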

The problem is that there are two separate functions and they don't know about each other. So how does the Test generator know which samples were used for Train? I think the Train generator has to pass back the indices of the Test set, which we can then pass to the Test generator.
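A rough sketch of the index-passing idea, using only the standard library. The names `make_train` and `make_test`, the `seed` argument, and the choice to keep the 2n:n ratio for Test are assumptions for illustration, not the existing API.

```python
import random


def make_train(pos, neg, n, seed=None):
    """Shuffle indices, take 2n positive and n negative samples for Train,
    and return the indices reserved for Test alongside them."""
    rng = random.Random(seed)
    pos_idx = list(range(len(pos)))
    neg_idx = list(range(len(neg)))
    rng.shuffle(pos_idx)
    rng.shuffle(neg_idx)

    train = [pos[i] for i in pos_idx[:2 * n]] + [neg[i] for i in neg_idx[:n]]

    # The next 2n positive and n negative indices are set aside for Test,
    # so they cannot overlap with the ones used for Train.
    test_idx = (pos_idx[2 * n:4 * n], neg_idx[n:2 * n])
    return train, test_idx


def make_test(pos, neg, test_idx):
    """Build the Test set from the indices handed back by make_train."""
    pos_idx, neg_idx = test_idx
    return [pos[i] for i in pos_idx] + [neg[i] for i in neg_idx]


pos = list(range(100, 120))  # 20 hypothetical positive samples
neg = list(range(200, 210))  # 10 hypothetical negative samples

train, test_idx = make_train(pos, neg, n=3, seed=42)
test = make_test(pos, neg, test_idx)

assert len(train) == 9 and len(test) == 9
assert set(train).isdisjoint(test)
```

With this shape, Regenerate would only need to call `make_train` again with a new seed (or none) and hand the fresh `test_idx` to `make_test`.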

This combines and closes #15 and #23.

kwinkunks added the bug label Sep 16, 2021

This was referenced Sep 16, 2021

Improve the real data shuffling etc #15

Closed

Most of the poroperm points aren't showing up #23

Closed
