-
Notifications
You must be signed in to change notification settings - Fork 437
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Draft PR for Pysa Fuzzer #886
Conversation
Cool, this is a good start. A few comments:
|
Enhanced Pysa fuzzer by adding type annotations, ensuring variables are defined before use, making expression generation truly recursive, and using textwrap.indent for better code formatting. Added a defined_variables set to track declared variables, improving code validity. Despite these improvements, the fuzzer is still far from perfect and requires further refinement to enhance code generation diversity and flow validation. Will continue to work on it extensively! Might even approach it in a different manner now that I have been playing around with a it for a bit. |
Cool, another round of feedback:
I believe @alexkassil also has a different idea to generate code that always has a valid flow, feel free to ask him if you are interested. |
My feedback:
|
…stead of just prev
…ing after too many slices. we do not want that
…g outputs in places
Hey @esohel30 , so the idea for this project is to automatically find false negatives - ie flows that should be security issues, but for whatever reason pysa doesn't find it. The way to do this is to generate increasingly complex valid flows for pysa to find. Everything generated should be a valid security issue. https://github.com/facebook/pyre-check/tree/main/source/interprocedural_analyses/taint/test/integration take a look at the tests here (in the .py files). Here's an example: https://github.com/facebook/pyre-check/blob/main/source/interprocedural_analyses/taint/test/integration/source_sink_flow.py
One issue in this file is the flow in One way to always generate valid issues is to start with For example, let's say you add 3 functionalities of mutations to the fuzzer:
And now you randomly pick from those 3 elements 4 times to get [1, 2, 3, 2]. Applying those mutations step by step gets you:
Continuing adding more and more single hop transformations that preserve the flow will make the fuzzer be able to generate all the valid flows present in https://github.com/facebook/pyre-check/tree/main/source/interprocedural_analyses/taint/test/integration - I think for simplicity do not worry about any modelling other than |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at the code, here is how I recommend you change it high level going forward:
- Look through https://github.com/facebook/pyre-check/tree/main/source/interprocedural_analyses/taint/test/integration and pick a few simpler cases to try to model with the fuzzer. Say somewhere around 5 cases
- If you exhaustively generate all 4 length mutations, that's 555*5, or 625 valid but different pysa flows.
- run pysa on all of them and validate pysa catches those errors
…tion genration functions can be more easily understood
@arthaud has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
Congrats and well done @esohel30 ! |
Thanks for the hard work! We have finally merged this and found a few false negatives (see upcommit commits). |
Draft PR for the pysa fuzzer project 🚀