
Update get_labelled_samples and get_unlabelled_samples to feed data in batches #3

Open
dmonllao opened this issue Nov 3, 2017 · 2 comments

Comments

@dmonllao
Contributor

dmonllao commented Nov 3, 2017

Copied from dmonllao/moodleinspire-python-backend#1 before this gets lost:

The current implementation would swallow all system memory if a massive dataset (many GBs) were used; data should be read in batches (https://www.tensorflow.org/programmers_guide/reading_data). This is not likely to bite soon, as datasets generated by Moodle will hardly reach 10 MB, but it is still something we should fix.

The only problem I can think of is model evaluation, because we need to shuffle the dataset to evaluate the Moodle model using different combinations of training and test data. We could use a subset (limited to X MBs) of the evaluation dataset instead of shuffling the whole dataset.
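One standard way to build such a size-bounded subset without shuffling the whole file is reservoir sampling (Algorithm R), which keeps a uniform random sample of k rows using O(k) memory regardless of dataset size. A minimal sketch, with illustrative names not taken from the linked branch:

```python
import random

def reservoir_sample(rows, k, seed=None):
    """Return a uniform random sample of k items from an arbitrarily
    large iterable, holding only k items in memory (Algorithm R)."""
    rng = random.Random(seed)
    sample = []
    for i, row in enumerate(rows):
        if i < k:
            # Fill the reservoir with the first k rows.
            sample.append(row)
        else:
            # Replace an existing element with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                sample[j] = row
    return sample
```

Feeding the sampled subset to the train/test split would approximate shuffling the full dataset while keeping memory bounded.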

@dmonllao
Contributor Author

dmonllao commented Nov 3, 2017

I started working on this during my last project week (https://github.com/dmonllao/moodleinspire-python-backend/tree/batch-evaluation) but couldn't finish it; I will try to look at it again at some point in the future.

@douglasbagnall
Contributor

Catalyst more or less fixed this with 3a811cf, but it looks like we never made a pull request.
