[Idea]: Achieving higher FPS from FFmpeg: Both RAW and RENDER. #15
Comments
@roninpawn Thank you for the detailed and well-explained post. It sounds very interesting to me and you've made some very good suggestions. But let's get down to the nitty-gritty and the practical implementation of these ideas. I'll focus on the Wrap-Up part of the post and will discuss each point one by one here:
Yes, but currently
This is very good, but I need to run some benchmarks. Actually, it almost seems too good to be true, and I need some internal testing before drawing any conclusions. Also, I'm not going to use OpenCV at all in DeFFcode, because it is designed first and foremost to replace OpenCV in my vidgear library; instead, I'll implement a well-optimized YUV to RGB converter here myself in Cython.
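For reference, a minimal NumPy sketch of what such a planar YUV420p (I420) to RGB conversion involves. This assumes full-range BT.601 coefficients; a real implementation (in Cython or otherwise) would select coefficients to match the source's actual color matrix and range, and would avoid the float round-trip:

```python
import numpy as np

def yuv420p_to_rgb(raw: bytes, width: int, height: int) -> np.ndarray:
    """Convert one planar I420 frame to an RGB array (full-range BT.601)."""
    y_size = width * height
    c_size = y_size // 4
    buf = np.frombuffer(raw, dtype=np.uint8)
    # I420 layout: full-resolution Y plane, then quarter-resolution U and V.
    y = buf[:y_size].reshape(height, width).astype(np.float32)
    u = buf[y_size:y_size + c_size].reshape(height // 2, width // 2).astype(np.float32)
    v = buf[y_size + c_size:].reshape(height // 2, width // 2).astype(np.float32)
    # Upsample the 2x2-subsampled chroma planes back to full resolution.
    u = u.repeat(2, axis=0).repeat(2, axis=1) - 128.0
    v = v.repeat(2, axis=0).repeat(2, axis=1) - 128.0
    r = y + 1.402 * v
    g = y - 0.344136 * u - 0.714136 * v
    b = y + 1.772 * u
    return np.clip(np.stack([r, g, b], axis=-1), 0, 255).astype(np.uint8)
```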
@roninpawn That's a big no. Threading does not mix well with
Yeah, that's doable and I'm thinking the same as you. Rather than enforcing YUV in general, let the user decide what they want.
Yes, I'm well aware of your work, and I think you did a commendable job with the things at hand. I personally wasn't in favor of ffmpeg-python, and wanted to implement a solution myself to have full control over the library. I've also considered other wrappers, but some have installation problems and others have no support for hardware decoding: abhiTronix/vidgear#148
Too Good to be True

It does "seem too good to be true," doesn't it! ;) The notion that FFmpeg's insanely optimized multi-processing could have its legs swept by low-level, old-school bandwidth issues is ridiculously unintuitive! But while I presume that FFmpeg achieves its YUV > RGB transforms faster than the OpenCV library can, FFmpeg still ends up needing to push 6 million bytes of RGB through a pipe, instead of 3 million bytes of YUV, per frame! Which I suppose is something like deciding whether to carry an inflatable bed up from the basement BEFORE or AFTER you've inflated it. 😄

Benchmarks

While I know this is no substitute for the benchmarks you'll want to craft and run on deFFcode, I have these figures in pocket, so I'll share them anyway. These are the results of the RAW-access and PyGame-rendered benchmarks I wrote to test the various implementations of my 'FFmpeg Videostream' class object. Given the matched results I got testing deFFcode's RGB output against my Videostream RGB script, I would expect you to find similar results, notwithstanding Intel-vs-AMD-type factors.
I can add to this benchmark that in an experimental build of my script I added a "Threaded Frames Extractor" method based on Benjamin Lowe's work here: https://github.com/bml1g12/benchmarking_video_reading_python, and liberated an extra 10-15 fps over TEST 3 from the list above. The method simply stacks a threaded queue of unprocessed frames from the bytestream. I have not tested the virtue of threading the YUV > RGB process, either alone or in conjunction with this.

YUV > RGB Transform Method

I'm thrilled to hear that you mean to implement your own method(s) for color-space conversion! My first thought, once I realized the throughput benefit of ingesting as YUV, was: "There's got to be a better transform than OpenCV out there!" In particular, I was hoping for a hardware implementation. I don't know the hardware end of things, nor the difficulties of accessing dedicated GPU functions from within Python. All I know is that MP4 is decoded at a relative BLAZE even on weakly-powered tablets and phones, where manufacturers seek to advertise the many 'hours of Netflix' you can watch on a single battery charge. With that, my assumption is that modern processors include dedicated hardware for the decoding and screen-rendering of YUV420p and its common variants. But, like I say, these are darts thrown in the dark. I'm also currently under the impression that Simple DirectMedia Layer (SDL) https://www.libsdl.org/ implements methods that render a YUV pipe direct to screen, and I intend to look into this at some point.

What do I think?

It's a difficult call. My benchmarks suggest a 25-33% performance increase with YUV ingest plus local RGB conversion. And that's a LOT of performance to leave on the table, especially knowing that most developers will never so much as dream there could be such massive benefits from this approach. On the other hand, it is highly specific. And there are additional concerns where increased CPU usage comes into play.
It's been my experience that FFmpeg, left to handle the YUV > RGB conversion on its own, consumes about 50% of my CPU while active. When OpenCV is implemented for the conversion, I see a bump of around 10% additional core usage, which could be a factor for a given developer.

Also, 12-bit YUV is technically a lossy format, I believe. Some color information is lost in reducing the storage footprint. The human eye doesn't notice the difference, and subsampling helps maintain the accuracy of input to output, but there's no guarantee of perfect fidelity. That's not a problem if the source is already in this, the most common format on the planet. But if the source is not already 12-bit YUV, implicitly forcing an RGB or BGR source to 12-bit YUV would not be suitable for fine scientific purposes, where the color averaging that occurs across pixel quadrants would skew results. (Don't just take my word for everything in these last two paragraphs; this is what I THINK I understand of it all.)

With those considerations, I suppose I agree: deFFcode should not default FFmpeg's output to YUV420p. But I also feel that merely documenting the benefits of the proposed FFmpeg > YUV420p > RGB method isn't enough. Many will never discover that for the cost of two extra lines of code they might increase the speed of their application by a full 1/3rd; that for every hour-long job they queue, they might have been loading up the next one in just 40 minutes. With that in mind, I would propose for consideration: a public method that implements this local YUV > RGB concept. One that can be optimized and maintained by the library's developers; which implements the library's own conversion matrix; and that is perhaps even benefited by the kind of queued frame preparation Benjamin Lowe tested in his benchmarks. Because ultimately, I see it as a public-awareness challenge that needs to be overcome.
And by maintaining a stock method (especially if appropriately named), unfamiliar developers draw nearer to discovering what, in likely the MAJORITY of all use cases, is measurably the fastest, most powerful, and best solution going. Noting again that the majority of the world's video media is encoded and circulated in this format, and that the stock FFmpeg > RGB method leaves 25-33% of attainable efficiency untapped. Those are my thoughts.
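The "color averaging across pixel quadrants" mentioned above can be demonstrated in a few lines. This sketch uses a hypothetical 4x4 chroma plane with sharp per-pixel variation; 4:2:0 subsampling keeps only one chroma sample per 2x2 block, so the round trip cannot recover the original:

```python
import numpy as np

# Hypothetical 4x4 chroma plane with sharp per-pixel variation.
u = np.array([[200,  50, 200,  50],
              [ 50, 200,  50, 200],
              [200,  50, 200,  50],
              [ 50, 200,  50, 200]], dtype=np.float32)

# 4:2:0 stores one chroma sample per 2x2 quadrant: average each block...
sub = u.reshape(2, 2, 2, 2).mean(axis=(1, 3))

# ...and upsampling on playback simply repeats that average, so the
# original per-pixel detail is gone.
restored = sub.repeat(2, axis=0).repeat(2, axis=1)
```

Here every 2x2 quadrant averages to a flat 125, so `restored` is uniform even though `u` was not: exactly the kind of loss that is invisible to the eye but could skew fine scientific measurements.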
Faster frame rates from FFmpeg by YUV
YUV420 vs RGB24
Hey! It looks like you've got yourself quite the well-built FFmpeg-for-Python package here. I've been down this rabbit hole myself and thought I'd share some tips for optimizing performance. The speed of FFmpeg's input to Python can be massively accelerated by use of the YUV420 format instead of RGB.
The far-and-away most prevalent video format on the planet today is YUV 4:2:0. MP4s, DVDs, even Blu-rays all come packed in YUV420... This is because it stores the raw binary of each pixel in an average of 12 bits, instead of the 24 bits per pixel consumed by RGB/BGR formats. So, literally half the disk space.
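In concrete numbers, for a single 1080p frame:

```python
# Per-frame byte counts for a 1080p stream under each pixel format.
width, height = 1920, 1080
rgb24_bytes  = width * height * 3       # 24 bits/pixel -> 6,220,800 bytes
yuv420_bytes = width * height * 3 // 2  # 12 bits/pixel -> 3,110,400 bytes
```

Every frame moved through the pipe as RGB24 is exactly twice the payload of the same frame as YUV420.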
In my own testing I've found that asking FFmpeg to output YUV420 video as RGB ends up taking significantly longer than collecting the raw YUV data and transforming it to RGB within Python. And that fact is a bit unintuitive: I would certainly expect FFmpeg's multi-processor operations to handle every aspect of video faster than anything a library in Python could. But the slowdown that simply can't be overcome here is in the data pipe itself.
Every frame of RGB video is twice as much data to move through memory space, compared to YUV420. And that access-speed-inhibitor is so massive that even a single-threaded, blocking operation - grabbing one frame at a time from the FFmpeg pipe, shaping that frame into a compatible NumPy array, and pushing the resulting array through an OpenCV transform matrix (YUV2RGB_I420) - is FASTER!
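A minimal sketch of that single-threaded, blocking loop. The fixed 1920x1080 dimensions are an assumption for illustration; a real reader would probe them from the file first (e.g. with ffprobe):

```python
import subprocess
import numpy as np

# Assumed dimensions -- probe these from the source file in practice.
WIDTH, HEIGHT = 1920, 1080
FRAME_BYTES = WIDTH * HEIGHT * 3 // 2   # 12 bits/pixel for yuv420p

def rgb_frames(path):
    """Yield RGB frames: pipe raw yuv420p out of FFmpeg, convert locally."""
    import cv2  # deferred so the size constants work without OpenCV installed
    proc = subprocess.Popen(
        ["ffmpeg", "-i", path, "-f", "rawvideo", "-pix_fmt", "yuv420p", "-"],
        stdout=subprocess.PIPE, stderr=subprocess.DEVNULL)
    while True:
        raw = proc.stdout.read(FRAME_BYTES)
        if len(raw) < FRAME_BYTES:
            break
        # I420 is planar: OpenCV expects a (H * 3/2, W) single-channel array.
        yuv = np.frombuffer(raw, np.uint8).reshape(HEIGHT * 3 // 2, WIDTH)
        yield cv2.cvtColor(yuv, cv2.COLOR_YUV2RGB_I420)
    proc.wait()
```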
Benchmarks
Render
On my 2700 (*edit) (6-core / 12-virtual) platform, I get about 75 fps from your 'deFFcode' library when rendering a 1080p mp4 to an OpenCV window, using the sample code provided in the documentation.

RAW
Threading Possibilities for YUV -> RGB
Additionally, even higher speeds are theoretically possible, if multiple threads are established for the YUV -> RGB process separately from a RAW ingest-stacking process.
If one thread is dedicated to queueing 'up to X frames' worth of raw binary snips from the FFmpeg pipe, and another thread is established to process and queue 'up to X frames' of YUV2RGB-converted frames, the ingest function ceases to block the processing function, and the processing function ceases to block ingest. The result is the CPU spinning as fast as it can to have multiple frames ready for whatever method might come along to ask for one.
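A dependency-light sketch of that two-thread, two-queue arrangement. The frame size, the fake byte source standing in for the FFmpeg pipe, and the greyscale stand-in for the real YUV -> RGB transform are all illustrative assumptions:

```python
import queue
import threading
import numpy as np

W, H = 64, 48                    # small demo dimensions (assumption)
FRAME_BYTES = W * H * 3 // 2     # one yuv420p frame
DONE = object()                  # sentinel marking end-of-stream

def ingest(read_fn, raw_q):
    """Thread 1: stack raw YUV snips from the byte stream into a queue."""
    while True:
        raw = read_fn(FRAME_BYTES)
        if len(raw) < FRAME_BYTES:
            break
        raw_q.put(raw)
    raw_q.put(DONE)

def convert(raw_q, rgb_q):
    """Thread 2: drain the raw queue, convert, queue finished frames."""
    while True:
        raw = raw_q.get()
        if raw is DONE:
            break
        buf = np.frombuffer(raw, np.uint8)
        # Stand-in for a real YUV -> RGB transform: the Y plane repeated
        # across 3 channels (greyscale), to keep the demo dependency-free.
        y = buf[:W * H].reshape(H, W)
        rgb_q.put(np.stack([y, y, y], axis=-1))
    rgb_q.put(DONE)

# Demo driver: a fake "pipe" serving 10 synthetic grey frames.
frames_left = [10]
def fake_pipe(n):
    if frames_left[0] == 0:
        return b""
    frames_left[0] -= 1
    return b"\x80" * n

raw_q, rgb_q = queue.Queue(maxsize=4), queue.Queue(maxsize=4)
threading.Thread(target=ingest, args=(fake_pipe, raw_q)).start()
threading.Thread(target=convert, args=(raw_q, rgb_q)).start()

frames = []
while (frame := rgb_q.get()) is not DONE:
    frames.append(frame)
```

The bounded queues (`maxsize=4`) keep memory flat while letting each stage run the moment work is available, instead of each blocking the other.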
Wrap-Up
With appropriate configuration I presume a developer could accomplish everything I've described using your library as it exists today. But I think the following are worth considering across continued development:
All that said, whether any of this fits really depends on how you choose to define the scope of your library. I just wanted to pass the results of my own tests and development with FFmpeg in Python along for your consideration. Again, here's the link to the script I developed atop Karl Kroening's 'ffmpeg-python' library if you care to look it over. It's short and sweet: https://github.com/roninpawn/ffmpeg_videostream/
My Current Environment