
Update get_labelled_samples and get_unlabelled_samples to feed data in batches #3

Open
dmonllao opened this issue Nov 3, 2017 · 2 comments

Comments

@dmonllao
Contributor

dmonllao commented Nov 3, 2017

Copied from dmonllao/moodleinspire-python-backend#1 before this gets lost:

The current implementation would swallow all system memory if a massive dataset (many GBs) were used; data should be read in batches (https://www.tensorflow.org/programmers_guide/reading_data). This is not likely to bite soon, as datasets generated by Moodle will hardly reach 10 MB, but it is still something we should fix.

The only problem I can think of is model evaluation, because we need to shuffle the dataset to evaluate the Moodle model using different combinations of training and test data. We could use a subset (limited to X MBs) of the evaluation dataset instead of shuffling the whole dataset.
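One standard way to build such a size-bounded subset without shuffling the whole file is reservoir sampling (Algorithm R), which keeps a uniform random sample of k rows using O(k) memory regardless of dataset size. A minimal sketch, with illustrative names not taken from the linked branch:

```python
import random

def reservoir_sample(rows, k, seed=None):
    """Return a uniform random sample of k items from an arbitrarily
    large iterable, holding only k items in memory (Algorithm R)."""
    rng = random.Random(seed)
    sample = []
    for i, row in enumerate(rows):
        if i < k:
            # Fill the reservoir with the first k rows.
            sample.append(row)
        else:
            # Replace an existing element with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                sample[j] = row
    return sample
```

Feeding the sampled subset to the train/test split would approximate shuffling the full dataset while keeping memory bounded.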

@dmonllao
Contributor Author

dmonllao commented Nov 3, 2017

I started working on this during my last project week (https://github.com/dmonllao/moodleinspire-python-backend/tree/batch-evaluation) but couldn't finish it; I will try to look at it again at some point in the future.

@douglasbagnall
Contributor

Catalyst more or less fixed this with 3a811cf, but it looks like we never made a pull request.
