ideas for fuzz string generation #7
Comments
If an Rcpp function takes a C string argument (do they?), DeepState will allow that to be any (NUL-terminated, length-controlled) byte pattern if you use one of the string generation functions, so I think any bytes that hose the code would at least be possible to generate. Whether they would actually be generated depends on how dense they are in the input space, which fuzzer we're using, how much control-flow signal the fuzzer gets, etc. But in this case, I'd guess "oh yes, DeepState will blow this wide open, even with the default dumb fuzzer." The main downside to finding these is that this post makes me think a lot of people will say "if you hand me this weird junk, I don't expect my code to work!" Is that going to be a problem?
Yes, Rcpp functions can take string arguments. In fact, std::string (a single string) and Rcpp::CharacterVector (a vector of strings) are among the most frequent argument types that we plan to implement a fuzzer for.
You may be right about the "I don't expect my code to work with weird inputs" problem...
Is there an easy way to guess what size string the code is expecting? Note that you'll have to convert the DeepState-generated C string to std::string. I don't think we have a generator for std::string (we can add one, it'd just be a wrapper around the C string generator).
There is no way to guess the expected string size. Conversion from a C string to
std::string is a straightforward one-liner, right?
https://www.techiedelight.com/convert-c-string-std-string-cpp/
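For reference, a minimal sketch of that conversion. The helper names `from_c_string` and `from_bytes` are hypothetical and are not part of DeepState or Rcpp. The second helper is only relevant if the generated buffer is length-controlled and may contain embedded zero bytes, in which case the plain one-liner would silently truncate at the first one.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: copy a NUL-terminated C string into a std::string.
// Copying stops at the first '\0', so any embedded zero byte ends the string.
std::string from_c_string(const char* s) {
    return s ? std::string(s) : std::string();
}

// Hypothetical helper: copy exactly `len` bytes, keeping embedded '\0' bytes.
std::string from_bytes(const char* data, std::size_t len) {
    return data ? std::string(data, len) : std::string();
}
```

Which of the two is appropriate depends on whether the code under test is expected to handle embedded zero bytes at all.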
Yeah, that part is trivial. I am wondering about size defaults. What are these strings used for, usually? Keyword type stuff? Identifiers?
Strings could be used for anything. In my work I have used C++ code that
processes R strings / character vectors that represent file names and
regular expression subjects/patterns.
Hrm. Filename behavior in DeepState would probably be... a bad idea unless we're in a VM. :) Regexps sound ripe for finding interesting bugs. Maybe a default length of 256 chars? We're not really worrying about memory problems as much, since this comes from std::string, so going crazy long seems unnecessary, as a default anyway.
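A rough sketch of what a std::string generator with a 256-byte default cap could look like. This is not DeepState code: `random_string` and `kDefaultMaxLen` are hypothetical names, and `std::mt19937` stands in for the fuzzer-driven byte source that a real wrapper around the C string generator would use.

```cpp
#include <cstddef>
#include <random>
#include <string>

// Assumed default cap, taken from the 256-char suggestion above.
constexpr std::size_t kDefaultMaxLen = 256;

// Stand-in generator: picks a length in [0, max_len] and fills the string
// with non-zero bytes, mimicking the payload of a NUL-terminated C string.
std::string random_string(std::mt19937& rng,
                          std::size_t max_len = kDefaultMaxLen) {
    std::uniform_int_distribution<std::size_t> len_dist(0, max_len);
    std::uniform_int_distribution<int> byte_dist(1, 255);
    std::string out(len_dist(rng), '\0');
    for (char& c : out) {
        c = static_cast<char>(byte_dist(rng));
    }
    return out;
}
```

Keeping the cap as a defaulted parameter would let individual test harnesses raise it for code that is expected to handle long inputs.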
An interesting blog post by Tomas Kalibera https://developer.r-project.org/Blog/public/2020/05/02/utf-8-support-on-windows/index.html explains that automated testing (using typical fixed inputs) will fail to catch some errors. Can RcppDeepState generate such exotic string inputs? @agroce do you think DeepState could find such problems?
Most interesting/relevant paragraph from the blog:
However, R has been careful not to introduce UTF-8 strings for things the user has not already intentionally made UTF-8, because of problems that this would cause for packages not handling encodings correctly. Such packages will mysteriously start failing when incorrectly using strings in UTF-8 but thinking they were in native encoding. Such problems will not be found by automated testing, because tests don’t use such unusual inputs and are often run in English or similar locales.
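To make the encoding issue concrete, here is a small illustrative sketch (not taken from the blog post) of how byte length and character count diverge for UTF-8 input. The function name `utf8_code_points` is hypothetical, and the count is only meaningful if the input is valid UTF-8.

```cpp
#include <cstddef>
#include <string>

// Count code points in a string assumed to be valid UTF-8 by counting
// every byte that is not a continuation byte (bit pattern 10xxxxxx).
std::size_t utf8_code_points(const std::string& s) {
    std::size_t n = 0;
    for (unsigned char b : s) {
        if ((b & 0xC0) != 0x80) {
            ++n;
        }
    }
    return n;
}
```

For ASCII-only test inputs, `s.size()` and `utf8_code_points(s)` agree, which is why code that confuses the two can pass tests run in English locales and still misbehave on non-ASCII strings.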