ideas for fuzz string generation #7

tdhock · 2020-06-25T04:12:41Z

An interesting blog post by Tomas Kalibera, https://developer.r-project.org/Blog/public/2020/05/02/utf-8-support-on-windows/index.html, explains that automatic testing (using typical fixed inputs) will fail to catch some errors. Can RcppDeepState generate such exotic string inputs? @agroce, do you think DeepState could find such problems?
Most interesting/relevant paragraph from the blog:
However, R has been careful not to introduce UTF-8 strings for things the user has not already intentionally made UTF-8, because of problems that this would cause for packages not handling encodings correctly. Such packages will mysteriously start failing when incorrectly using strings in UTF-8 but thinking they were in native encoding. Such problems will not be found by automated testing, because tests don’t use such unusual inputs and are often run in English or similar locales.
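To make the blog's point concrete, here is a hypothetical C++ helper (not taken from any package) that silently assumes one byte per character. It passes any ASCII-only test, but on UTF-8 input it also reverses the bytes inside multi-byte sequences and returns something that is no longer valid UTF-8. The `looks_like_valid_utf8` checker is likewise a simplified sketch, not a complete validator.

```cpp
#include <cstddef>
#include <string>

// Hypothetical encoding-naive helper: reverses a string byte by byte.
// Correct only for single-byte encodings such as ASCII or Latin-1; on UTF-8
// input it reverses the bytes inside multi-byte sequences too.
std::string naive_reverse(const std::string& s) {
  return std::string(s.rbegin(), s.rend());
}

// Simplified UTF-8 check: validates lead/continuation byte structure only
// (it does not reject overlong encodings or surrogate code points).
bool looks_like_valid_utf8(const std::string& s) {
  std::size_t i = 0;
  while (i < s.size()) {
    unsigned char c = static_cast<unsigned char>(s[i]);
    std::size_t extra;
    if (c < 0x80) {
      extra = 0;  // ASCII
    } else if ((c & 0xE0) == 0xC0) {
      extra = 1;  // lead byte of a 2-byte sequence
    } else if ((c & 0xF0) == 0xE0) {
      extra = 2;  // lead byte of a 3-byte sequence
    } else if ((c & 0xF8) == 0xF0) {
      extra = 3;  // lead byte of a 4-byte sequence
    } else {
      return false;  // stray continuation byte or invalid lead byte
    }
    if (i + extra >= s.size()) return false;  // sequence truncated
    for (std::size_t k = 1; k <= extra; ++k) {
      unsigned char cont = static_cast<unsigned char>(s[i + k]);
      if ((cont & 0xC0) != 0x80) return false;  // not a continuation byte
    }
    i += extra + 1;
  }
  return true;
}
```

A test suite that only ever passes strings like `"abc"` would never notice the problem; only a non-ASCII input such as `"caf\xC3\xA9"` ("café" in UTF-8) exposes it.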

agroce · 2020-06-25T13:01:45Z

If an Rcpp function takes a C string argument (do they?), DeepState will allow that to be any (null-terminated, length-controlled) byte pattern if you use one of the string generation functions, so I think any bytes that hose the code would at least be possible to generate. Whether they would actually be generated depends on how dense they are in the input space, which fuzzer we're using, how much control-flow signal the fuzzer gets, etc. But in this case, I'd guess "oh yes, DeepState will blow this wide open, even with the default dumb fuzzer."
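A rough sketch of what a "length-controlled byte pattern" can look like, assuming a fuzzer that hands the harness raw input bytes. `ByteSource` and `make_c_string` are hypothetical names for illustration, not DeepState's API: the first input byte picks the length and the following bytes become the contents, so every non-NUL byte value (including ones that are invalid in UTF-8 or in the native encoding) is reachable.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for a fuzzer's input stream (NOT the DeepState API):
// hands out bytes from a buffer and returns 0 once the buffer is exhausted.
class ByteSource {
 public:
  explicit ByteSource(std::vector<std::uint8_t> data) : data_(std::move(data)) {}
  std::uint8_t next() { return pos_ < data_.size() ? data_[pos_++] : 0; }

 private:
  std::vector<std::uint8_t> data_;
  std::size_t pos_ = 0;
};

// Builds a NUL-terminated C string whose length and contents both come from
// the input. Length is next() % (max_len + 1), so at most 255 is reachable
// because a single byte picks it. Content bytes of 0 are mapped to 1 so the
// only NUL is the terminator.
std::vector<char> make_c_string(ByteSource& src, std::size_t max_len) {
  const std::size_t len = src.next() % (max_len + 1);
  std::vector<char> buf;
  buf.reserve(len + 1);
  for (std::size_t i = 0; i < len; ++i) {
    const std::uint8_t b = src.next();
    buf.push_back(static_cast<char>(b == 0 ? 1 : b));
  }
  buf.push_back('\0');  // terminator: buf.data() is usable as a const char*
  return buf;
}
```

Mapping a zero content byte to 1 keeps the string terminated at exactly `len` bytes, at the cost of slightly over-weighting byte 0x01; a real generator could make a different trade-off.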

The main downside to finding these is that this post makes me think a lot of people will say "if you hand me this weird junk, I don't expect my code to work!" Is that going to be a problem?

tdhock · 2020-06-25T16:11:19Z

Yes, Rcpp functions can take string arguments. In fact, std::string (a single string) and Rcpp::CharacterVector (a vector of strings) are among the most frequent argument types we plan to implement a fuzzer for:

[1] "Rcpp::NumericVector"   "Rcpp::NumericMatrix"   "arma::mat"            
[4] "std::string"           "Rcpp::CharacterVector" "int"                  
[7] "Rcpp::IntegerVector"   "double"
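One way to help a fuzzer reach the exotic inputs the blog describes is to mix a small dictionary of hand-picked seed strings in with random bytes. The sketch below is hypothetical (the seed list and `make_string_vector` are illustrations, not RcppDeepState code) and builds a `std::vector<std::string>` as a stand-in for `Rcpp::CharacterVector`, with its length and elements chosen by input bytes.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical seed strings chosen to exercise encoding assumptions.
const std::vector<std::string> kExoticSeeds = {
    "",                          // empty string
    "plain ascii",               // the kind of input typical tests use
    "caf\xC3\xA9",               // "café" encoded as UTF-8
    "caf\xE9",                   // "café" encoded as Latin-1 (invalid UTF-8)
    "\xE6\x97\xA5\xE6\x9C\xAC",  // "日本" as UTF-8 (3-byte sequences)
    "\xF0\x9F\x98\x80",          // U+1F600 emoji as UTF-8 (4-byte sequence)
};

// Builds a vector of strings (an analogue of Rcpp::CharacterVector) whose
// length and elements are picked by the fuzzer-provided bytes in `input`.
std::vector<std::string> make_string_vector(const std::vector<std::uint8_t>& input,
                                            std::size_t max_elems) {
  std::vector<std::string> out;
  if (input.empty()) return out;
  const std::size_t n = input[0] % (max_elems + 1);
  for (std::size_t i = 0; i < n; ++i) {
    // Reuse input bytes cyclically if the input is shorter than needed.
    const std::uint8_t b = input[(i + 1) % input.size()];
    out.push_back(kExoticSeeds[b % kExoticSeeds.size()]);
  }
  return out;
}
```

In practice the seeds would be combined with fully random byte strings rather than replacing them, so the fuzzer can still find inputs nobody thought to list.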

you may be right about the "I don't expect my code to work with weird inputs" problem...

agroce · 2020-06-25T17:59:11Z

Is there an easy way to guess what size string the code is expecting? Note that you'll have to convert the DeepState-generated C string to std::string; I don't think we have a generator for std::string (we can add one, it'd just be a wrapper around the C string generator).

tdhock · 2020-06-25T23:18:26Z

No way to guess expected string size. Conversion from C string to std::string is a straightforward one-liner, right? https://www.techiedelight.com/convert-c-string-std-string-cpp/
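For reference, the one-liner is the `std::string(const char*)` constructor. A sketch of it next to the length-aware alternative (the function names are hypothetical): the one-liner copies bytes up to the first NUL, while the `(pointer, length)` constructor copies exactly `len` bytes. Since R character strings cannot contain embedded NULs anyway, the one-liner is probably a reasonable wrapper for Rcpp inputs.

```cpp
#include <cstddef>
#include <string>

// One-liner conversion: copies bytes up to (not including) the first NUL.
std::string from_c_string(const char* cstr) {
  return std::string(cstr);
}

// Length-aware conversion: copies exactly `len` bytes, including any
// embedded NULs. Useful if a generator reports the length it produced.
std::string from_buffer(const char* buf, std::size_t len) {
  return std::string(buf, len);
}
```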

agroce · 2020-06-27T01:29:05Z

Yeah, that part is trivial. I am wondering about size defaults. What are these strings used for, usually? Keyword type stuff? Identifiers?

tdhock · 2020-06-27T15:29:45Z

Strings could be used for anything. In my work I have used C++ code that processes R strings/character vectors representing file names and regular expression subjects/patterns.
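Regular-expression patterns and subjects are a natural fit for a string fuzzer. A minimal hypothetical harness using the C++ standard `<regex>` library (not R's own regex engines, which behave differently): an invalid pattern makes `std::regex` throw `std::regex_error`, which is documented behavior, so the harness counts it as rejected input rather than a bug. `try_regex_search` is an illustrative name.

```cpp
#include <regex>
#include <string>

// Hypothetical fuzz-target helper for code that takes a user-supplied regex
// pattern and subject string. std::regex reports bad patterns (and some
// resource limits during matching) by throwing std::regex_error; that is
// documented behavior, so it is treated as "input rejected", not as a bug.
// Returns true if matching ran; *matched holds the regex_search result.
bool try_regex_search(const std::string& pattern, const std::string& subject,
                      bool* matched) {
  *matched = false;
  try {
    const std::regex re(pattern);  // may throw std::regex_error
    *matched = std::regex_search(subject, re);
    return true;
  } catch (const std::regex_error&) {
    return false;  // rejected input; other exceptions still propagate
  }
}
```

What a real harness does after a successful match (checking invariants, comparing against a reference implementation) is where bugs would actually show up; this sketch only draws the line between expected rejections and unexpected failures.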

agroce · 2020-06-27T20:35:14Z

Hrm. Filename behavior in DeepState would probably be... a bad idea unless we're in a VM. :)

Regexps sound ripe for finding interesting bugs.

Maybe a default length of 256 chars? We're not really worried about memory problems here, since this comes from std::string, so going crazy long seems unnecessary, at least as a default.
