Note
|
This chapter is being reworked to cover both PropER and QuickCheck. |
Property based testing is a field of testing that represents a major improvement over unit testing in many ways. While a unit test will try one or two possible inputs to a function to ensure that it is doing the correct thing, a property based testing system will use a set of rules to apply hundreds or thousands of inputs to a system to ensure that they all act correctly.
With Unit testing we have been focusing on covering a high percentage of the code, with Property Based testing we can not only cover a high percentage of the code but also attempt to statisticly cover the problem space as well. For most interesting problems we will probably not get every possible input, but enough to find any weird corner cases.
The problem from the developer’s point of view is that this is as this is a newer form of testing we the programming community may not be as familiar with it and may not understand how and when to best apply it to get optimal results.
Furthermore the two main tools (QuickCheck and PropER) both suffer from a lack of documentation and examples of how to build these tests for our code in non trivial cases. It is my aim to improve the documentation for both tools, with the hope that the developer community can use both to lead to better software.
QuickCheck was invited by John Hughes and his team at Chalmers University in Gotenburg Sweden. They originally created QuickCheck for Haskell, then moved it to Erlang.
They formed a company Quviq to commercialize the technology. There are two versions of QuickCheck, one is commercial, and requires a licence. They also have a free to use (though closed sourced) version that is somewhat limited in function.
PropER was created by Kostis Sagonas and his students at Upsala University and NTUA. It is similar to QuickCheck but with some differences.
QuickCheck is proprietary software, while PropER is licenced under the GPL. This alone may cause you to choose to use one or the other.
Both QuickCheck and Proper are testing frameworks that allow you to use a model to create random inputs or events and run your program against them. The two packages are broadly similar with a few key differences that are worth noting.
PropER can use the language of specs and types that was created for dialyzer to build tests. While this is not applicable in all cases it can often get you some basic testing for free or close to it.
QuickCheck does not have this feature, but have some more advanced features for testing a FSM. It also has some features for testing C code. QuickCheck also may be somewhat faster and have better shrinking features.
Most of what is in this chapter will apply to both packages, however there are details that will different. I will do my best to note where those are.
However there is an art of how to do this. You need to find a way to test the code such that you are not testing it against itself. If you say f(A) =:= f(A) then you probably have not learned very much from your tests, other than the fact that a function returns consistent output for consistent input.
This can work very well for some types of code but is more difficult for others, This chapter will explore how I applied PropER to the tiny_pq library which is used by Chicago Boss as part of its TinyMQ, as well as other projects.
TinyMQ has a tree structure in which each element looks like this, where the keys are integers and the values are a possible list. We want to generate an initial starting state and then apply an operation to it and validate the end result. For example one possible operation is to change the priority of an item. In this case we want to verify that after we do it that it does not exist in the list for the old priority but does exist for the new one.
3 -> [ 5, 9, 11]
To instal proper just add the proper repository to your project’s rebar.config file like this and run the standard rebar get-deps compile command
{proper, ".*", {git, "git://github.com/manopapad/proper.git", "HEAD"}},
Suppose that we have a function that removes an element from a list (Yes there is one in the library, its an example), as in this example. And let us assume all of the elements are integers, just to keep things simple. Then we would like to assert that given an element and a list if we apply the function to the list then the resulting element might would not have that element in it.
If we used a classical unit test we might setup test cases like this:
remove_from_list_test() ->
?assertEqual([], remove_from_list(3,[3])),
?assertEqual([1], remove_from_list(2,[1,2])),
?assertEqual([1], remove_from_list(2,[1])).
And we might assume that we have covered all the cases. However there is a bug in this code, in that if it will not do the right thing if an element is duplicated, in that case it will only remove the first instance of that element.
link:proper/remove_from_list.erl[role=include]
We can test this with the property shown above, the function prop_not_in_list/0 states for an integer, El, and a list of integers, List, that El will be removed from the list by the function remove_from_list/2.
This will then generate a lot of random values until it can find a counter example. When I ran the property It came up with this counter example:
{2,[-53,39,2,11,6,72,2,-6,-10,-24,0,-14,-14,-13,18,17,31,15,71,-6,10]}
However this example has more than 20 elements in the list (and there is no reason why it could not have had 250 or more). So the exact problem might be hard to find. In technical terms this counter example could be said to have a high signal to noise ratio.
As such PropER will attempt to shrink the counter example to the most simplified form it can find. In this case after shrinking it found the example of {0, [0,0]} which is expresses the idea that a duplicate item is what is exposing the bug.
{0,[0,0]}
When working with Property based testing you will hear the term shrinking a lot. The idea of shrinking is that since PBT uses randomness to find issues then the problem cases are often very complex with a lot of unimportant details in them, (See previous example). In general shrinking will apply several rules to any created values.
Type | Operation |
---|---|
Integer |
Move closer to 0 |
Float |
Move closer to 0 |
Atom |
Drop Chars |
Bitstring |
Drop bits, convert 1’s to 0’s |
Function |
return simpler values |
Tuple |
Shrink elements |
Loose Tuple |
Drop or simplify Elements |
List |
Drop or simplify Elements |
Vector |
Shrink Elements |
Fixed List |
Shirnk Elements |
Weighted Union (oneof/1) |
Move to head of list of options |
Literal |
None |
If you can derive your tests directly from the -spec of a function then you can run that property directly by using the function proper:check_spec/1 and passing it a MFA tuple. For example this test checks a function from the Chicago Boss boss_compiler module. As long as that function’s spec defines what it will do completely then this makes a very good test.
To test a functions spec you can call the command proper:check_spec/1 from the Erlang REPL. This will use the function spec to generate 100 random tests and ensure that they all pass. If they do not pass it will then try to shrink the failing case. In this case the function passes all 100 tests so check_spec will return true.
(erlang@sag)4> proper:check_spec({boss_compiler, flatten_token_locations, 1}).
....................................................................................................
OK: Passed 100 test(s).
true
unpack_id_test() ->
?assert(proper:check_spec({boss_db_adapter_mongodb,unpack_id, 2},
[{to_file, user}])),
ok.
The problem is that Erlang’s functional specs are not always able to define all that we might want to do. For example there is no way to say that a binary should only consist of valid UTF-8 strings.
If you want to test some invariant of a function that can not be expressed only with the spec of a function then you can create your own properties of the function for proper to run. The most basic generator can be done with the ?FORALL/3 macro as shown in this example. This macro takes three parameters A tuple of variables that will be substituted into the function, generators for those variables, and then a block to test that.
prop_reverse_list() ->
?FORALL(List, list(any()),
begin
L =:= reverse(reverse(List))
end).
In this example we only have one variable List which is a list (the type of the elements does not really matter in this case, so we will give it a type of "any()", We will then assert that for all lists that if we reverse them twice we will get the original list back. The block should return a boolean.
In general instead of any it is often better to use a list of integers, any can create atoms which can fill up the VM’s atom table. It can also create large recursive data structures, which can cause the test to time out.
%%TODO Expand
If we want to limit the generated in lists in some way we can use the ?IMPLIES/2 macro which will reject items that do not match. For example if we want to modify the above property so that we only tests lists that have a length that is greater than 5 we could do this. When you run this you will see interspersed into the periods of the test an x which represent the times that the property rejects the generated macro.
prop_reverse_list() ->
?FORALL(List, list(any()),
?IMPLIES(length(List) > 5,
begin
L =:= reverse(reverse(List))
end)).
If you have more than one condition then you can stack the implies or just put all your conditions into an external function that will return a boolean and use that as your guard clause.
This is also useful to ensure that an element is not in a dictionary or other key/value data structure. If you need to ensure that it is present then generate it and insert it yourself.
While the use of the type system will sometimes provide a way to generate the mock data for your tests there are other times when it will not do so. In cases where you need more that just the filtering that the ?IMPLIES macro can provide you will need to generate your own data.
Proper provides a rich set of tools to allow you to do this, you can find a reference on their website here: http://proper.softlab.ntua.gr/doc/proper_types.html.
in the case where the input might be a finite set of inputs, we can very easily provide proper with a list and use the oneof/1 function to have it select one of them.
In this case our function expects to get the day of the week as a string, and it could be either with an initial capital letter or without. As we can enumerate the options we simply provide them as a list. When we run our property it will pick one at random.
link:proper/weekday_test.erl[role=include]
The way PropER will shrink this input is by moving to the front of the list. So If our test finds an error when it sends in "friday", it will attempt to use earlier days of the week until "sunday". In general if you find that the first element of the list is causing the error that is probably a sign that all inputs are broken.
It may happen that you want to try a wide range of dictionary words as input. Thankfully most Unix systems come with a dictionary file or two somewhere on the system for a spell check program. (If your system does not have this you may want to install ispell or aspell). In addition dictionaries are available for a wide range of languages, so if you need to test Hebrew, German, Greek, Russian etc you can probably find those in your standard package repositories as well.
In my system the spelling dictionaries are in the /usr/share/dict directory, (yours may be somewhere else). We could just read the file in at run time, but that might be rather slow as we would end up reading it many times.
Thankfully Ulf Wiger has a nice library of parse transforms that will enable us to do it at compile time. the ct_expand parse transform will execute the ct_expand:term/1 function at compile time and replace it with the result. So we can use something like this example which will read in the file (on my computer is is about 99,000 words BTW) and then load it into an Erlang list at compile time.
link:proper/words.erl[role=include]
Once again when shrinking the system will tend to move to the front of the list, which is in alphabetical order.
If we want to create something that sort of looks like text, we can use this to create a list of words. However to get something that really looks like text we will also need to add spaces between words. (the idea of splitting text into paragraphs is left as an exercise for the reader).
Sometimes the default shrinking behavior is not what an property needs. In this case you will want to define how your generator shrinks its data.
TODO: Expand
There are many times that we need to test functions that can be reversed. By reversed what I mean is that For any I, f(g(I)) =:= I For example a reader and a writer would make a good pair, In theory if we data any legal data structure, write it out and then read it back in we should get the original data structure back.
TODO: JSX test here
TODO: Show database test here
Sometimes you will wish to test a property that can be checked by some code in an external language. For example if you have some code that will generate a JSON structure, and you would like to know if the output is a valid JSON document. One obvious way to this is use an existing JSON program written in another language.
There are programs to validate JSON in most languages, in this case the python json.tool will provide a good check on our code. (If python is not your thing, then feel free to substitute Javascript, Perl, PHP, Ruby or other language of your choice.)
link:proper/test_json_vs_python.erl[role=include]
Property based tests work on the principle that we will create a number of random tests and ensure that they all pass your model. The question is how many tests to run, too few and we won’t find as many bugs, too many and the tests will take too long to run. So how to find the optimal number of tests?
By default Proper will run 100 tests, and absent some reason to change it this is probably a good default. Remember that if you are running it in your CI type system then to will run 100 every time.
However if your property tends to time out then reducing that number can help. In the ChicagoBoss Framework there are some tests that we run only 25 or 40 times as part of our regular build, as anything larger will cause a timeout.
We may wish to run a property based test from the Erlang REPL. by calling proper:QuickCheck(my_prop()) This is good for exploratory testing and the like. However you may also wish to run your property based tests is part of a larger test suite. The easiest way to do that is to wrap it in an eunit assertion like this.
delete_test() ->
?assert(proper:quickcheck(delete_prop(), [{to_file,user}])),
ok
A few things to note, first of all function proper:quickcheck/2 will return true if everything passed so the eunit ?assert macro will run the property.Secondly you will note the option of {to_file,user} this is because eunit will capture output of the tests. By passing the option we get around this.
The other issue is that both proper and eunit define a macro ?LET. To ensure that the proper version of this macro is used include proper before eunit, or you could just have the PropER and EUnit tests in separate modules.
So far we have covered the case where we are using Proper For what is in effect unit testing, where we want to ensure that the logic of some application is correct.
But what if we wish to ensure that for example our connection to the database is working well. In this case we have an obvious set of properties that we can exploit, in that we can test that the round trip to the database works. If we Create a Record, write it to the database, then read it back from the database it should be the same (with the possible exception of ID columns and timestamps.
In doing this for the Chicago Boss I discovered a major bug in the MySQL integration. I found that the escaping of strings was not being done correctly. If you saved the string "\\ " it would strip out the 2 backslashes. I don’t think I would have found this by manual testing but proper found it in the first block of 100 tests. It actually found it in a much larger string, but then reduced it down to the "\\ " that I was able to test. Once I had found the problem then actually solving it was pretty easy.
As for how I solved it I connected boss to a mysql virtual machine and had a table with the correct fields. Then I used this code to exercise write a bunch of records to the database and read them back, with the assumption that the returned value should be the same as the written value.
link:proper/mysql_test.erl[role=include]