ideas for fuzz string generation #7

Open
tdhock opened this issue Jun 25, 2020 · 7 comments


tdhock commented Jun 25, 2020

An interesting blog post by Tomas Kalibera, https://developer.r-project.org/Blog/public/2020/05/02/utf-8-support-on-windows/index.html, explains that automated testing with typical fixed inputs will fail to catch some encoding errors. Can RcppDeepState generate such exotic string inputs? @agroce, do you think DeepState could find such problems?
Most interesting/relevant paragraph from the blog:
However, R has been careful not to introduce UTF-8 strings for things the user has not already intentionally made UTF-8, because of problems that this would cause for packages not handling encodings correctly. Such packages will mysteriously start failing when incorrectly using strings in UTF-8 but thinking they were in native encoding. Such problems will not be found by automated testing, because tests don’t use such unusual inputs and are often run in English or similar locales.


agroce commented Jun 25, 2020

If an Rcpp function takes a C string argument (do they?), DeepState will allow that to be any (null-delimited, length-controlled) byte pattern, if you use one of the string generation functions, so I think any bytes that hose the code would at least be possible to generate. Whether they would actually be generated would depend on how dense in the space they are, what fuzzer we're using, how much control-flow signal the fuzzer gets, etc. But in this case, I'd guess "oh yes, DeepState will blow this wide open, even with the default dumb fuzzer."

Main downside to finding these is that this post makes me think a lot of people will say "if you hand me this weird junk, I don't expect my code to work!" -- is that going to be a problem?


tdhock commented Jun 25, 2020

Yes, Rcpp functions can take string arguments. In fact, std::string (a single string) and Rcpp::CharacterVector (a vector of strings) are among the most frequent argument types that we plan to implement a fuzzer for:

[1] "Rcpp::NumericVector"   "Rcpp::NumericMatrix"   "arma::mat"            
[4] "std::string"           "Rcpp::CharacterVector" "int"                  
[7] "Rcpp::IntegerVector"   "double"               

you may be right about the "I don't expect my code to work with weird inputs" problem...


agroce commented Jun 25, 2020

Is there an easy way to guess what size string the code is expecting? Note that you'll have to convert the DeepState-generated C string to std::string; I don't think we have a generator for std::string (we can add one, it'd just be a wrapper around the C string generator).


tdhock commented Jun 25, 2020 via email


agroce commented Jun 27, 2020

Yeah, that part is trivial. I am wondering about size defaults. What are these strings used for, usually? Keyword type stuff? Identifiers?


tdhock commented Jun 27, 2020 via email


agroce commented Jun 27, 2020

Hrm. Filename behavior in DeepState would probably be... a bad idea unless we're in a VM. :)

Regexps sound ripe for finding interesting bugs.

Maybe a default length of 256 chars? We're not really worrying about memory problems as much, since this comes from std::string, so going for crazy long seems unnecessary, as a default anyway.
