Custom video dataset encoding/serialize uses all memory, process killed. How to fix? #5499
Hey, thanks for your question. Those are some cool datasets! I'm very sorry to hear that you're running into these problems. We brainstormed a bit and came up with a couple of ideas:

Even if we make storing encoded videos work, I'm worried that the problem would just be moved to when the dataset is used: reading a single example would still require 14-15 GB of memory. After the dataset has been prepared, how do you expect it will be used? Would it make sense to lower the FPS (it's 50 now, right)? Will users only use chunks of the video? If so, perhaps you can store the chunks instead of the entire video.

Kind regards,
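The chunking suggestion can be sketched in plain Python. This is a hypothetical helper, not TFDS API, and the 13,000-frame / 500-frame numbers are just illustrative: split a long video into fixed-size runs of frames, so that no single example holds the whole video.

```python
def chunk_frames(num_frames, chunk_size):
    """Yield (chunk_id, (start, end)) frame ranges of at most chunk_size frames.

    Each range would become its own dataset example instead of one
    giant example holding the entire video.
    """
    for start in range(0, num_frames, chunk_size):
        end = min(start + chunk_size, num_frames)
        yield start // chunk_size, (start, end)

# A 13,000-frame video in 500-frame chunks becomes 26 small examples.
chunks = list(chunk_frames(13_000, 500))
```

A builder could then yield one example per (video_id, chunk_id) key, so a reader who only needs a short clip loads a single small chunk rather than the full 14-15 GB example.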
Tom

Thank you very much for your reply, and for those ideas!

How will they be used: I'm just getting into Sign Language Processing research, so I'm not yet sure exactly how I want to use these, but potentially for training translation models from signed-language videos to spoken-language text, for pretraining a vision transformer, or for a number of other things. A few use-cases follow:

Test out models on real data: I figured I'd start learning by at least running some inference pipelines with already-trained models, and got stuck on this step. I expected running a model to take significant memory, but didn't expect that loading the video would be the issue. I guess I'm successfully learning things! Specifically, I'd like to load in some videos and run this demo of a segmentation + recognition pipeline.

Replicate other research on GitHub: I went looking for examples of people using these datasets, and it seems that not many use the video option, perhaps for this very reason: loading them is too cumbersome.

Replicate WMT results, or at least re-run their models: one thing I wanted to do was replicate results from the WMT Sign Language Translation contests, which provide data in a number of formats including video; a number of the submissions do use video as input instead of poses.

At least load the videos and run pose estimation on them: another thing I wanted to do was load the videos, run a pose estimator on them, and use the resulting keypoints, potentially improving that part of the pipeline. A number of sign language translation models take pose keypoints as input, and I'd like to try those out. At the very least I'd like to be able to do this! The pose-based methods may take less compute from there.
Regarding the suggestions:
I guess I'd like to be able to do the following, though I don't know if any of it is feasible:
I did some further Googling and found a few things:
FPS lowering: that's another good idea. I think there might already be a method in there to set that; maybe tweaking it would reduce memory usage. I can try.
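If the Video feature exposes ffmpeg options, resampling there would likely be the cleanest route; failing that, frames can be subsampled before they are stored. A minimal sketch with a hypothetical helper (the 50 FPS figure comes from this thread; the 25 FPS target is just an example):

```python
def subsample_frames(frames, src_fps, dst_fps):
    """Keep roughly one frame every src_fps/dst_fps frames to lower the FPS."""
    if dst_fps >= src_fps:
        return list(frames)
    step = src_fps / dst_fps          # e.g. 50 / 25 -> keep every 2nd frame
    return [f for i, f in enumerate(frames) if i % step < 1]

# Halving 50 FPS footage keeps every other frame (and halves frame memory).
halved = subsample_frames(list(range(100)), src_fps=50, dst_fps=25)
```

Halving the frame rate halves both the number of stored frames and the memory needed to serialize and encode them, which directly attacks the numbers reported below.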
What I need help with / What I was wondering
I want to load a dataset containing these videos without this happening (Colab notebook for replicating). How can I edit my dataset loader to use less memory when encoding videos?
Background:
I am trying to load a custom dataset with a Video feature.
When I try to tfds.load() it, or even just download_and_prepare(), RAM usage climbs very high and then the process gets killed. For example, this notebook will crash if allowed to run, though with a High-RAM instance it may not.
It seems to be using over 30 GB of memory to encode one or two 10 MB videos.
I would like to know how to edit/update this custom dataset so that it will not use so much memory.
What I've tried so far
I did a bunch of debugging and tracing of the problem with memray, etc. See this notebook and this issue for detailed analysis, including a copy of the memray report.
I tried various ideas in the notebook, including loading just a slice, editing the buffer size, and switching from .load() to download_and_prepare().
Finally, I traced the problem to the serializing and encoding steps. See this comment, which shows many GiB of memory being allocated to encode even one 10 MB video.
I discovered that even one 10 MB video was extracted to over 13k video frames, taking up nearly 5 GiB of space. The serializing would then take 14-15 GiB, the encoding another 14-15 GiB, and so the process would be killed.
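Those numbers are consistent with every decoded frame being held in memory at once. A back-of-envelope check, assuming 720×576 RGB frames at one byte per channel (the resolution is a guess, not stated in the thread; the ~5 GiB of extracted frames on disk would be the compressed per-frame images):

```python
frames = 13_000                         # reported frame count for one ~10 MB video
height, width, channels = 576, 720, 3   # assumed resolution, uint8 per channel
bytes_per_frame = height * width * channels
total_gib = frames * bytes_per_frame / 2**30
# total_gib comes out around 15, i.e. in the reported 14-15 GiB range --
# so one full-in-memory pass for serializing plus another for encoding
# is enough to exhaust a standard machine.
```

Under this assumption, the fix has to reduce how many decoded frames are alive at once (fewer frames per example, lower FPS, or streaming), not just tweak buffer sizes.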
Relevant items:
It would be nice if... --download_only in the CLI

Environment information
I've tested it on Colab and a few other Ubuntu workstations. High-RAM Colab instances seem to have enough memory to get past this.