
Feature: enable invscov into afl++ (Fuzzbench/OSS-Fuzz) #1

Open
laurentsimon opened this issue Jun 7, 2021 · 11 comments

Comments

@laurentsimon

Your code is not even released and here's your first issue... :-)

We'd love to see your code enabled into afl++ as a special mode.

AFL++ is already supported in FuzzBench and is actively used to test new fuzzing techniques. If you could enable invscov in mainstream AFL++, we'd easily be able to test it in combination with other AFL++ options!

Afl++ is also used in OSS-Fuzz, which is used to continuously fuzz hundreds of open source projects.

@laurentsimon laurentsimon changed the title Feature: enable invscov into afl++ and Fuzzbench Feature: enable invscov into afl++ (Fuzzbench/OSS-Fuzz) Jun 7, 2021
@andreafioraldi
Member

andreafioraldi commented Jun 7, 2021

AFL++ is already supported in FuzzBench and is actively used to test new fuzzing techniques

Wow I was not aware of it, really :)

The practical challenges of integrating the feedback from LLVM values in-tree in AFL++ are not trivial. Basically, I used Daikon to avoid reinventing the wheel, but I really don't like it, and I feel that single-threaded Java monster does not fit in the AFL++ codebase, which we distribute to the public as a production tool, not an academic prototype. In addition, stopping the campaign to run a tool and recompile the PUT many times is not as user-friendly as we want AFL++ to be.

So the way to go is to code another invariants miner and embed it directly into the runtime, to learn invariants during execution. I'm working on it; there are several challenges to overcome (e.g. since you don't know at compile time which variables are involved in invariants, the pass instruments all of them, so the produced binary is slow), but I will address them in the near future. Research is incremental :)
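(For illustration only: the simplest form of a runtime-learned invariant is a per-instrumentation-site value range, widened during a learning phase and checked during fuzzing. The sketch below is hypothetical Python; `Site`, `observe`, and `is_unusual` are illustrative names, not AFL++ or InvsCov APIs.)

```python
class Site:
    """Tracks the observed range of one instrumented program value."""

    def __init__(self):
        self.lo = None
        self.hi = None

    def observe(self, v):
        """Learning phase: widen the invariant to cover the observed value."""
        if self.lo is None or v < self.lo:
            self.lo = v
        if self.hi is None or v > self.hi:
            self.hi = v

    def is_unusual(self, v):
        """Fuzzing phase: a value outside the learned range is new feedback."""
        if self.lo is None:
            return True  # site never observed during learning
        return v < self.lo or v > self.hi


site = Site()
for v in (3, 7, 5):          # learning executions
    site.observe(v)
print(site.is_unusual(6))    # False: inside the learned range [3, 7]
print(site.is_unusual(42))   # True: violates the learned invariant
```

A real implementation would of course keep such state in shared memory inside the target's runtime rather than in Python objects.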

@laurentsimon
Author

I see what you mean. Note that it's fine to have a non-optimized version of invscov in Fuzzbench to get early feedback on how it performs on the benchmarks. Perfect is the enemy of good, and industry is incremental too!

If it's easier to integrate a non-AFL++ version, that's fine too. We don't necessarily need to learn the invariants at runtime. We (FuzzBench) could start invscov with an existing corpus (we have plenty), and invscov can learn the invariants once at start time. We could manually create several invscov campaigns and run them one after the other, each time hardcoding the previous campaign's corpus as the initial seeds to learn the invariants from. We could do that, say, 3 times to see whether it plateaus. This would already give us a good idea of how well it will work.
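(A minimal sketch of that iterative scheme, where `run_campaign` and `coverage` are hypothetical stand-ins for a real FuzzBench campaign and a coverage measurement, not actual APIs:)

```python
def iterate_invscov(initial_seeds, run_campaign, coverage, rounds=3):
    """Repeat: learn invariants from `seeds`, fuzz, feed the resulting
    corpus back as the next round's seeds; track coverage per round."""
    seeds = initial_seeds
    history = []
    for _ in range(rounds):
        seeds = run_campaign(seeds)       # learn from seeds, then fuzz
        history.append(coverage(seeds))   # e.g. edges covered by corpus
    # Plateau check: did the last round still improve coverage?
    plateaued = len(history) >= 2 and history[-1] <= history[-2]
    return seeds, history, plateaued


# Toy stand-ins: each "campaign" adds one input, coverage = corpus size.
corpus, hist, plat = iterate_invscov(
    {"seed"},
    run_campaign=lambda s: s | {f"new{len(s)}"},
    coverage=len,
    rounds=3,
)
print(hist, plat)  # [2, 3, 4] False
```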

We certainly don't need to have the final optimized version to test it out. Sure, for OSS-Fuzz we'd like a more polished version, but don't let that get in the way for Fuzzbench. We've run several fuzzers from academic researchers (e.g. symcc) and they included the results in their submission. We've had a good experience so far with PoC code.

@andreafioraldi
Member

andreafioraldi commented Jun 7, 2021

If it's easier to integrate a non-afl++ version, that's fine too.

No, I don't think that running Daikon inside FuzzBench will be easy.
Btw, InvsCov is already AFL++: just the LLVM pass is out-of-tree, plus a bit of Python glue that I need to polish, and Daikon.

Btw, I will for sure have a usable invariants mode in AFL++ (in development in the "unusual_values" branch) before USENIX, and so before this prototype becomes public.
FuzzBench is already evaluating it; I explained some details about the first run in the upcoming experiment google/fuzzbench#1170, as it seems promising even if quite slow ATM.

@laurentsimon
Author

SG! Thanks for submitting the PR!

@vanhauser-thc

We'd love to see your code enabled into afl++ as a special mode.

you are aware that @andreafioraldi is one of the 4 maintainers of afl++? ;)

@laurentsimon
Author

I was not until yesterday, but someone on my team pointed it out :-) hahah
I took so much care introducing afl++... to its maintainers! ^^

@andreafioraldi
Member

andreafioraldi commented Jun 9, 2021

Screenshot at 2021-06-09 10-10-05

That's when all the fuzzers are at between 4 and 5 hours (the OSS-Fuzz trial time, IIRC). At least here the new prototype seems to work (disabled is vanilla AFL++), but there are several challenges:

  • I haven't really fixed the speed problem.
  • The invariants are too naive compared to Daikon's.
  • The policy that decides when to alternate CGF with learning and invscov feedback is dumb (I tested normal, which learns on the first cycle, and early, which learns only on the initial corpus, but both are too naive).
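(For illustration, the two policies named above could be sketched roughly like this; the function and its signature are hypothetical, not code from the actual unusual_values branch:)

```python
def should_learn(policy, cycle, input_from_initial_corpus):
    """Decide whether the current execution should update invariants
    (learning) or use them as fuzzing feedback.

    policy: "early" learns only on the initial corpus;
            "normal" learns during the whole first queue cycle.
    """
    if policy == "early":
        return input_from_initial_corpus
    if policy == "normal":
        return cycle == 0
    raise ValueError(f"unknown policy: {policy}")


print(should_learn("normal", 0, False))  # True: still in the first cycle
print(should_learn("early", 3, False))   # False: past the initial corpus
```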

@andreafioraldi
Member

The results also seem unstable; maybe the max is high because the fuzzer can find the bug, but the median is low. This is IMO related to the policy used to alternate learning and invariants-feedback fuzzing.

@andreafioraldi
Member

andreafioraldi commented Jun 9, 2021

After 14h, aflplusplus_unusual_disabled is now slowly reaching enabled. While finding bugs early is very good, I feel that I can do far better because of an additional problem that I forgot:

When an invariant is violated, the current implementation cannot learn a relaxed version and then get feedback when this new version is violated in turn (e.g. x > 0 is violated with x == 0, so it should be transformed to x >= 0).
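(A sketch of that relaxation step for a single lower-bound invariant, in illustrative Python; the `LowerBound` representation is hypothetical, not the actual implementation:)

```python
class LowerBound:
    """Invariant of the form x > bound (strict) or x >= bound."""

    def __init__(self, bound, strict=True):
        self.bound = bound
        self.strict = strict

    def violated(self, x):
        return x < self.bound or (self.strict and x == self.bound)

    def relax(self, x):
        """Weaken the invariant just enough to admit the violating x,
        so the next, different violation still yields feedback."""
        if x == self.bound and self.strict:
            self.strict = False   # x > 0  ->  x >= 0
        elif x < self.bound:
            self.bound = x        # move the bound down to x
            self.strict = False


inv = LowerBound(0)          # learned: x > 0
print(inv.violated(0))       # True: first violation, gives feedback
inv.relax(0)                 # learn the weaker x >= 0
print(inv.violated(0))       # False: no longer a violation
print(inv.violated(-1))      # True: a stronger violation is still feedback
```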

Also, in a local experiment with pcre2, the stability is dramatically bad; I have to find a way to compile two binaries with the exact same coverage map :(

Screenshot at 2021-06-09 17-26-41

@andreafioraldi
Member

@laurentsimon one thing I noticed is that the initial corpus from OSS-Fuzz is not really saturated; do you know why?

@laurentsimon
Author

I have to find a way to compile two binaries with the exact same coverage map :(

you mean you want deterministic edge IDs?

the initial corpus from OSS-Fuzz is not really saturated, do u know why?

What's the name of your experiment?
@jonathanmetzman where did we get the corpus from?
