Add blur support #11

Closed
PandorasFox opened this issue Oct 15, 2016 · 41 comments

@PandorasFox

Notes for myself:

  • do something similar to this
  • maybe incrementally blur so it gradually blurs into the lockscreen
  • (also maybe allow any keystroke in that period to cause it to exit?)
  • potentially let i3lock also capture the screen and blur that
  • blurring an image that's passed in is a must (it needs to be held in RAM, though)
    need to play around and see how it handles large blur operations and what it displays in the meantime (i.e. whether nothing displays, or whether it shows the "locking..." thing)
@PandorasFox
Author

I believe setting some libev timers to handle the progressive blurring should work pretty well. I'll do some playing around with that.

@PandorasFox
Author

c3a95b8 adds initial blur support. Some notes:

  • A white 'fog' appears around the edges. I'll need to figure out how to fix this.
  • I need to update to allow for custom blur radius
  • There's a noticeable delay before i3lock actually shows up. This is problematic, and I need to fork off the blurring with libev or something similar.
  • Thank god @shiver has code that grabs the XCB framebuffer if there's no image passed in with the blur command. Dude's a wizard.

@PandorasFox
Author

@meskarune thoughts? (Time to see if you get notifications for @ mentions)

@frysztak

Hi, I profiled your blurring code with Callgrind and it turns out that the application spends over 90% of its runtime inside the blur_image_surface function. It's gonna need some optimization :).

I have some experience manually vectorising code (see here) and I can take a look at the blurring code if you'd like.

@PandorasFox
Author

PandorasFox commented Oct 16, 2016

That'd be excellent! I was mostly just pulling from shiver/i3lock@69b40f1 to begin with and was then going to work on optimizing the blurring/working on an iterative blur process, and any help would be appreciated.

I knew the runtime was going to be pretty awful at first; I was just using that as a base and was then going to see how it went from there.

I'm going to play around with it some more today and see what happens when I blur the image in another thread while also reading it and drawing, mostly just to see if it can be parallelized at all.

@frysztak

Great, I'll start with SSE2 as it's the most widespread. We'll see if we'll need multiple threads for this.

About progressive blurring - that's definitely a cool idea, but I think it should be optional.

@PandorasFox
Author

Oh definitely, it'd be optional since it'd definitely increase CPU load (on a workstation it'd be pretty neat; on a laptop you'd want it locked as soon as possible so that you can hibernate/suspend/whatever).

Threading it off would mostly just be so that the blurring can be done in the background while i3lock grabs the display and locks (and perhaps display the image unblurred, until it's done blurring). If the blurring can be done quickly enough then it likely won't matter for most resolutions; I'll do some testing with some virtual desktops and extremely high resolutions once you have some blurring stuff done.

@frysztak

A quick update: I have a working version implemented with SSE intrinsics. It still needs a lot of work, so I didn't even benchmark it. I'd be further along if it wasn't for university.
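For readers unfamiliar with SSE intrinsics, here is a minimal sketch of what a vectorised blur pass can look like. It is not the code from the fast-blur branch: the function name `convolve_row_sse`, the float pixel layout and the pre-padded row are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <xmmintrin.h> /* SSE: __m128 and the _mm_*_ps intrinsics */

/*
 * Hypothetical helper: horizontal 1D convolution over one row of floats.
 * src must hold n + ksize - 1 values (the row is already padded at both
 * ends); dst receives n values. Four outputs are computed per iteration.
 */
static void convolve_row_sse(const float *src, float *dst, size_t n,
                             const float *kernel, size_t ksize)
{
    assert(src && dst && kernel && ksize > 0);

    size_t x = 0;
    for (; x + 4 <= n; x += 4) {
        __m128 acc = _mm_setzero_ps();
        for (size_t k = 0; k < ksize; k++) {
            __m128 w = _mm_set1_ps(kernel[k]);     /* broadcast one weight */
            __m128 px = _mm_loadu_ps(src + x + k); /* 4 neighbouring inputs */
            acc = _mm_add_ps(acc, _mm_mul_ps(w, px));
        }
        _mm_storeu_ps(dst + x, acc);
    }

    /* Scalar tail for the last n % 4 outputs. */
    for (; x < n; x++) {
        float acc = 0.0f;
        for (size_t k = 0; k < ksize; k++)
            acc += kernel[k] * src[x + k];
        dst[x] = acc;
    }
}
```

A real implementation would also have to convert 8-bit pixel channels to floats and back, which is left out here.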

Meanwhile I discovered that the Makefile doesn't set any optimization flags. It's pretty bizarre. Anyway, I recommend adding -O2 to CFLAGS. I didn't measure the speedup for the naive implementation, but it feels faster.

@PandorasFox
Author

I'll look into that, thanks!

University has kept me pretty busy this semester. I'm a c++ data structures TA and I've been grading tests for most of this past week D:

@PandorasFox
Author

Also yeah; I just tried that out (and committed it); it definitely does seem to take about 1/3 as much time as before from invocation to blurred lockscreen.

@frysztak

It's there: https://github.com/sebastian-frysztak/i3lock-color/commits/fast-blur. And it's way faster.

Blurring a 1080p image on my Thinkpad x220 with an i7 takes about 40-60 ms. It used to take about 260 ms. The delay is almost unnoticeable now, and IMHO my code handles borders way better than the one from the Cairo cookbook. And there's still some room for optimization.

Passing arguments from command line is currently not implemented, so to change "blur factor" you have to change this line.

The code from the Cairo cookbook used radius as the only parameter, which doesn't make much sense to me. I use standard deviation instead. The radius is constant (it equals 3), which means the kernel is 7x7. This is a more or less SIMD-friendly size. Because the kernel size is constant, increasing stddev only makes sense up to a certain point. Beyond that, we're limited by the radius (and the smooth Gaussian blur basically turns into an ordinary, ugly average).

I'd like you to let me know if the current kernel size is fine. If the output images are not blurry enough, then I'll either increase the kernel size manually or maybe figure out a way to implement a dynamic kernel size.

@frysztak

I tested slightly larger radii at constant stddev. You can take a look here: http://imgur.com/a/hEI2g.

@PandorasFox
Author

I'll play around with it some tomorrow and tinker with some different values / look at implementing CLI and stuff. It's looking great so far, and thanks for the work!

@frysztak

No problem, I like tinkering with stuff like that. It's more fun than high-level programming.

I've been thinking, and I came to the conclusion that using a dynamic kernel size is probably the best solution. This way we can produce moderately blurry images fast, and very blurry images slower (but not too slow, I hope). It leaves the most space for end-user customization.

And as for input parameters, we can use ImageMagick's approach: radius = 3*stddev, where only stddev is user-visible.
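A small sketch of the radius = 3*stddev rule, assuming the product is rounded up to a whole pixel (the rounding direction and both function names are assumptions, not something stated in the thread):

```c
#include <assert.h>

/* Round 3 * stddev up to a whole pixel radius (rounding up is an assumption). */
static int radius_from_stddev(double stddev)
{
    assert(stddev > 0.0);

    double r = 3.0 * stddev;
    int radius = (int)r;
    if ((double)radius < r)
        radius++;
    return radius;
}

/* Kernel width in taps: the centre pixel plus `radius` pixels on each side. */
static int kernel_size_from_stddev(double stddev)
{
    return 2 * radius_from_stddev(stddev) + 1;
}
```

Under this rule a stddev of 1.0 gives the 7-tap kernel discussed earlier, and a stddev of 2.5 gives 17 taps.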

I have some ideas how to further boost performance, so I hope that won't be an issue for large kernels.

In the future, I could implement other effects (pixelisation comes to mind as an obvious choice).

Oh, and I almost forgot: I made a mistake labelling those pictures. 7, 9, and 11 are kernel sizes, not radii.

@PandorasFox
Author

PandorasFox commented Oct 23, 2016

Ah, alright.

I think pixelisation should just be a quick downscale / upscale by the inverse of the scaling factor, right? I think resizing it to pixelize should be the fastest way to do so.
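A minimal sketch of that idea, assuming an 8-bit single-channel buffer for brevity (a real screen buffer would have several channels per pixel). Averaging each tile and writing the average back is equivalent to a box-filter downscale followed by a nearest-neighbour upscale by the same factor; `pixelate` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Pixelate an 8-bit single-channel image in place: every block x block tile
 * is replaced by its rounded average value.
 */
static void pixelate(uint8_t *img, size_t width, size_t height, size_t block)
{
    assert(img && block > 0);

    for (size_t by = 0; by < height; by += block) {
        for (size_t bx = 0; bx < width; bx += block) {
            /* Clamp the tile at the right and bottom edges. */
            size_t y_end = by + block < height ? by + block : height;
            size_t x_end = bx + block < width ? bx + block : width;

            uint64_t sum = 0;
            for (size_t y = by; y < y_end; y++)
                for (size_t x = bx; x < x_end; x++)
                    sum += img[y * width + x];

            uint64_t count = (uint64_t)(y_end - by) * (x_end - bx);
            uint8_t avg = (uint8_t)((sum + count / 2) / count);

            for (size_t y = by; y < y_end; y++)
                for (size_t x = bx; x < x_end; x++)
                    img[y * width + x] = avg;
        }
    }
}
```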

I haven't actually had much time to play with it yet since I'm basically doing all of a term project for a group this weekend 😂

@frysztak

School keeps me rather busy too, but I managed to find some time to extend the kernel size to 15x15, improve edge handling and implement an SSSE3 version. I also added some benchmarking code. For 10 runs and beyond, the SSSE3 version takes about 20 ms per run, SSE2 takes 65 ms and 'naive' takes 264 ms.
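The benchmarking code itself isn't shown in the thread. A per-run average like the ones quoted could be measured roughly as follows; `mean_ms_per_run` and the stand-in workload `count_call` are hypothetical names, and POSIX `clock_gettime` is assumed to be available.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L /* for clock_gettime on strict compilers */
#endif
#include <assert.h>
#include <time.h>

/* Call fn(arg) `runs` times and return the mean wall-clock time in ms. */
static double mean_ms_per_run(void (*fn)(void *), void *arg, int runs)
{
    assert(fn && runs > 0);

    struct timespec start, end;
    clock_gettime(CLOCK_MONOTONIC, &start);
    for (int i = 0; i < runs; i++)
        fn(arg);
    clock_gettime(CLOCK_MONOTONIC, &end);

    double elapsed_ms = (double)(end.tv_sec - start.tv_sec) * 1e3 +
                        (double)(end.tv_nsec - start.tv_nsec) / 1e6;
    return elapsed_ms / runs;
}

/* Stand-in workload: a real benchmark would call the blur function here. */
static void count_call(void *arg)
{
    int *counter = arg;
    (*counter)++;
}
```

Averaging over several runs, as in the numbers above, smooths out one-off costs such as cold caches on the first call.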

Since blurring is so fast now, we can perform it multiple times to produce strong blur.

The only problem with the SSSE3 version is that after several iterations it darkens the image. This is, I suppose, due to quantization errors. I think it's fixable, but the solution will slow down the code. Not that it matters that much :).
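One way quantization can darken an image over repeated passes is sketched below. This illustrates a general mechanism only and is not a diagnosis of the SSSE3 code: the 8-bit fixed-point scale, the function names and the example weights are all assumptions. When float weights are rounded to integers, their sum can fall short of the fixed-point value for 1.0, and the truncating shift then loses a little brightness on every pass.

```c
#include <assert.h>
#include <stdint.h>

#define FIXED_SHIFT 8                /* hypothetical fixed-point precision */
#define FIXED_ONE (1 << FIXED_SHIFT) /* integer value that represents 1.0 */

/* Scale non-negative weights to integers; returns the sum of the results. */
static int quantize_kernel(const double *w, int *q, int n)
{
    assert(w && q && n > 0);

    int sum = 0;
    for (int i = 0; i < n; i++) {
        q[i] = (int)(w[i] * FIXED_ONE + 0.5); /* round to nearest */
        sum += q[i];
    }
    return sum;
}

/* Blur a perfectly flat region of brightness `value`, `passes` times over. */
static uint8_t blur_flat_region(uint8_t value, const int *q, int n, int passes)
{
    for (int p = 0; p < passes; p++) {
        int acc = 0;
        for (int i = 0; i < n; i++)
            acc += q[i] * value;
        int v = acc >> FIXED_SHIFT; /* truncating divide by FIXED_ONE */
        value = (uint8_t)(v > 255 ? 255 : v);
    }
    return value;
}
```

For the weights {0.06, 0.24, 0.4, 0.24, 0.06} the quantized sum is 254 rather than 256, so a flat region of brightness 200 drops to 198 after one pass and to 190 after five.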

@PandorasFox
Author

Awesome! I'll try and play around with this some when I get the time, though that might be a few days.

@PandorasFox
Author

I haven't gotten around to playing with it much (I do think it's working excellently, though), but I think I may have figured out a way to get around this problem:

I have no idea how this is really going to help you in the long run, since I'll need to add support for overlaying text on the lockscreen as well. The lock icon can probably be handled as well, but I'll have to see.

I think I could probably rig up something with -I to allow passing in an arbitrary number of images, which'll then be layered over the pixmap that gets blurred. (Or at least, allow passing in one image... you get the idea).

If I get that working I'll be pretty happy. Hopefully I'll have time to work on this soon.

@frysztak

frysztak commented Nov 1, 2016

I'm glad you like it :).
However, I'm not sure how useful overlaying multiple images would be. Are they all going to be centered? If not, the user would have to specify their positions. I see no real use for it (but I'm not one of those hardcore desktop ricers). But specifying one image (which might even default to a nice key icon) is a good idea IMO.

@PandorasFox
Author

Yeah, multiple images seems kinda silly in retrospect, but a single image (i.e. for a lock icon with text, over the blurred lockscreen) has its uses.

@frysztak

frysztak commented Nov 1, 2016

So what's the plan for now? Do you need my help with anything? (apart from fixing SSSE3 impl.)

@PandorasFox
Author

PandorasFox commented Nov 1, 2016

I don't think so. The blurring stuff you've done so far is tremendous and I think that once you finish tidying it up it'll be pretty much ready to ship as it is now (maybe with some changes to make it better match the structure/style of what's there so far, although honestly, I kinda need to do that for a lot of the other stuff I've hacked into this fork).

I should be able to do this soon-ish. I've got a few interviews this month, so I'll try to do this on one of my bus trips/flights, assuming I'm not drowning in schoolwork on those, lol.

I know about what I need to do (should just be moving the stuff here to export the blurred background to a new cairo_surface_t, and then painting that onto the xcb_ctx before I paint the img onto it), but it just comes down to "when can I sit down for a few hours and do this properly (I might do it during a meeting tonight)" ;-;

Thanks for all the work so far, though! Seriously, it's amazing.

@PandorasFox
Author

Also, tangentially related (I haven't had the time to fully grok what the hell this is doing), but how do you think the performance of your code stacks up against ffmpeg's gaussian blur stuff? @Airblader mentioned it over here, and I thought it was somewhat interesting (I have a feeling he'd look at this fork and probably wonder wtf I was doing with half these hacks).

I'm admittedly pretty inexperienced when it comes to low-level image manipulation, so I'm kinda just poking around and trying to learn some while tinkering when I can.

@PandorasFox
Author

PandorasFox commented Nov 1, 2016

It turns out that it was easier to do the image overlaying than I thought it would be, and I've got it implemented in my blur branch now. I'll start tidying everything up some soon.

@frysztak

frysztak commented Nov 1, 2016

Like I said before, I like this low-level stuff, so really, I'm just glad I can finally be more active in open source community.

Regarding style of code - it seems that the original i3lock uses the Clang formatter. I don't particularly like the style the original authors went with, but nevertheless I ran it over the initial SSE2 commit. I might have forgotten to do it for SSSE3, though.
Before I forget, a list of things I need to do:

  • fix SSSE3
  • add one more optimization I have in my mind
  • add CPU-capabilities detection code (some old hardware doesn't support SSSE3)
  • add an AVX2 version? This one could be tricky because my CPU doesn't support AVX2, so I would have to use an emulator to test correctness. Do you have an AVX2-capable CPU?
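One possible shape for the CPU-capabilities check in the list above, assuming GCC or Clang on x86, where the `__builtin_cpu_supports` builtin is available. The `blur_impl` enum and both function names are hypothetical, and the thread does not say how detection was actually going to be done.

```c
#include <assert.h>

/* Hypothetical set of blur implementations, ordered from slowest to fastest. */
enum blur_impl { BLUR_GENERIC, BLUR_SSE2, BLUR_SSSE3, BLUR_AVX2 };

/* Pick the widest instruction set that the running CPU reports. */
static enum blur_impl pick_blur_impl(void)
{
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2"))
        return BLUR_AVX2;
    if (__builtin_cpu_supports("ssse3"))
        return BLUR_SSSE3;
    if (__builtin_cpu_supports("sse2"))
        return BLUR_SSE2;
#endif
    return BLUR_GENERIC; /* other compilers and non-x86 targets */
}

/* Human-readable name, e.g. for a debug log line. */
static const char *blur_impl_name(enum blur_impl impl)
{
    switch (impl) {
    case BLUR_GENERIC: return "generic";
    case BLUR_SSE2:    return "sse2";
    case BLUR_SSSE3:   return "ssse3";
    case BLUR_AVX2:    return "avx2";
    }
    assert(!"unknown blur_impl");
    return "unknown";
}
```

The check runs at runtime, so one binary can carry several code paths and still start on older hardware.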

Once those things are done, I'll create a pull request and then we can talk about adjusting style.

I took a brief look at this FFmpeg code. They don't use intrinsic functions, but if packagers compile it with -O3, some of those loops should be automatically vectorised.
It's hard to compare performance, because FFmpeg does quite a lot more than my code, but time returns about 400 ms. Thing is, to me, this blur in FFmpeg looks odd, and what's worse - there are blocky artifacts on perfectly flat surfaces.

@PandorasFox
Author

PandorasFox commented Nov 1, 2016

It doesn't appear my laptop's CPU (i7 3632QM) supports AVX2, but I'm pretty sure the 4690k in my desktop will (I'll check when I get back tonight, but from what google tells me, it does support it!).

I noticed the artifacts as well, but wasn't sure if they were noticeable on most displays (they're somewhat noticeable on my 1366*768 laptop, but I dunno how noticeable they'd be on my 3840*1080 desktop).

I was assuming that the ffmpeg solution wouldn't be the best since it seems to do more than is necessary, but again, not super familiar with this stuff, so I figured I'd ask.

@PandorasFox
Author

Oh yeah, this is kinda what the image overlaying looks like right now. Stuff is just aligned to the top-left corner.

[screenshot: 2016-11-01_17 12 42-f9af73]

@frysztak

frysztak commented Nov 1, 2016

Hey, that's pretty good. Even transparency works. So it's just cairo_paint(), essentially?

@PandorasFox
Author

Pretty much. I just set up an additional cairo_surface_t for storing the blur_img separately, and then I paint that onto the surface before painting the image passed in.

I felt like there'd be some edge bugs I'd find or some tweaks needed, but it seems to work perfectly fine on the first try (well, I think that there might be some memory leaks; I never checked that...)

@frysztak

frysztak commented Nov 3, 2016

I updated the SSSE3 implementation and I don't like how it looks. I'll let the images speak.
SSE2 (floating point) after 5 iterations:
[image: sse2_5_runs]
SSSE3 (integer) after 5 iterations:
[image: ssse3_5_runs]

It must be due to the rounding that happens when the kernel is scaled and rounded to integers. I don't think I can do anything about it; I'm already using the biggest scaling factors I can. So, I think I'll drop the SSSE3 version. Instead I'll prepare something else, though I'm not sure yet what exactly will make the most sense. I'm considering AVX2, and perhaps mixed AVX and SSE2, so that folks like myself can still benefit from wider registers.

@PandorasFox
Author

Hm, is it just darker? That's rather curious.

Sounds good to me either way.

@frysztak

frysztak commented Nov 3, 2016

Kind of. The background itself is not actually darker, but all the text is (it looks more blended-in compared to SSE2). There are also some artifacts on the right-hand side, near the border.

@PandorasFox
Author

Ah, I sort of see, now. I'm travelling for an interview and only have my laptop (wonderful old laptop with great guts, but it has a 1366*768 display...), so I won't really be able to tell the images apart easily until the weekend.

That's definitely problematic.

@frysztak

frysztak commented Nov 4, 2016

Maybe it's too late, but if not, I wanted to wish you best of luck :).

@PandorasFox
Author

Thanks! I think I did pretty well on it :)

I'll be playing with the code some on my bus back, I think.

@frysztak

frysztak commented Nov 11, 2016

You might want to check out the box-blur branch. It approximates Gaussian blur very closely and is faster.
I think I'm going to prepare a generic version, to replace that blurring code that was written for Cairo. So that, you know, the blurred image will look the same, regardless of the system's capabilities.
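For context, repeated box blurs converge towards a Gaussian, and each box pass can be done with a running sum whose cost does not depend on the radius. The sketch below shows that general technique on a single row of floats; it is not taken from the box-blur branch, and the function names, the three-pass count and the edge clamping are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * One box-blur pass over a row: each output is the mean of the 2r+1 inputs
 * around it, with out-of-range indices clamped to the nearest edge pixel.
 */
static void box_blur_row(const float *src, float *dst, size_t n, size_t r)
{
    assert(src && dst && src != dst && n > 0);

    const float inv = 1.0f / (float)(2 * r + 1);

    /* Window sum for x = 0: indices -r..r, negatives clamp to src[0]. */
    float sum = (float)(r + 1) * src[0];
    for (size_t i = 1; i <= r; i++)
        sum += src[i < n ? i : n - 1];

    for (size_t x = 0; x < n; x++) {
        dst[x] = sum * inv;
        /* Slide the window: add index x + r + 1, drop x - r (both clamped). */
        size_t add = x + r + 1 < n ? x + r + 1 : n - 1;
        size_t drop = x >= r ? x - r : 0;
        sum += src[add] - src[drop];
    }
}

/* Three box passes in a row; `tmp` is scratch space for n floats. */
static void approx_gaussian_row(float *row, float *tmp, size_t n, size_t r)
{
    box_blur_row(row, tmp, n, r);
    box_blur_row(tmp, row, n, r);
    box_blur_row(row, tmp, n, r);
    memcpy(row, tmp, n * sizeof *row);
}
```

A full image blur would run this over every row and then every column. Feeding in a single bright pixel shows the bell shape: with r = 1, an impulse of 27 spreads to 1, 3, 6, 7, 6, 3, 1.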

@PandorasFox
Author

I'll check it out when I can. I glanced at the code and it looks pretty good.

@PandorasFox
Author

Hey, where's this at, currently? Courses have been eating up all my time, but I figured I should check.

I think last time I tried, things weren't blurred quite enough/there was some darkening of the screen buffer after blurring for some reason, so I can't really merge / push yet. Any idea what would need to be changed/how it'd need to be implemented? I may give a try at this later.

@frysztak

frysztak commented Feb 14, 2017

I've been using the code from the box-blur branch since November. Haven't had a single issue.
There's only an SSE2-based version; SSSE3 and AVX got removed, as they don't seem to be necessary.
I should probably implement a generic, SSE2-free version for really old x86 and ARM CPUs. It's a very niche market I imagine, but it ought to be done. I'll send you a pull request once I'm done, okay?

edit - I forgot one thing: blurring factor. I'll add that too.

@PandorasFox
Author

Alright, thanks! If you need any help / testing, just let me know

@PandorasFox
Author

Implemented with #17 :D
