
FSE 2024: Natural Symbolic Execution-based Testing for Big Data Analytics


UCLA-SEAL/NaturalSym


NaturalSym, proposed in Natural Symbolic Execution-based Testing for Big Data Analytics (FSE24), is a symbolic execution-based tool to generate natural tests for big data analytics. To replicate our evaluation, please go to https://zenodo.org/records/11090237.

Installation

Users can download NaturalSym from GitHub. Ubuntu 20.04 or later with Python 3.12 or later is recommended.

> apt-get install python3.12 python3-pip ant -y
> pip3 install numpy scipy
> git clone https://github.com/UCLA-SEAL/NaturalSym.git
> cd NaturalSym
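Before building, a short Python check can confirm the suggested interpreter version and that the numpy and scipy packages installed above are importable. This is a hypothetical helper, not part of NaturalSym. It does not check the Ubuntu version or the ant installation.

```python
import importlib.util
import sys

MIN_PYTHON = (3, 12)                    # minimum version suggested in this README
REQUIRED_MODULES = ("numpy", "scipy")   # packages installed with pip3 above


def python_ok(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if the (major, minor) version is at least `minimum`."""
    return tuple(version_info[:2]) >= minimum


def missing_modules(names=REQUIRED_MODULES):
    """Return the names of top-level modules that cannot be found on sys.path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Python version OK:", python_ok())
    print("Missing modules:", missing_modules() or "none")
```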

Run an example

NaturalSym is a test generator for Scala-based Apache Spark programs. To use NaturalSym, users need to provide a Scala program under test and an input annotation file, e.g., grades.scala and grades.config.

  • grades.scala contains a method def execute(input1: RDD[String], input2: RDD[String]), which takes two input tables representing students' math and physics scores, respectively, and identifies the students who fail based on their total score.
  • grades.config contains user knowledge about the program input. Each line in grades.config corresponds to one input table in grades.scala, e.g., input1 := Discrete("alice","bob") | scipy.binom(100, 0.1). This annotation gives two example values, "alice" and "bob", for the first column (student name) and specifies that the second column (math score) should follow the binomial distribution binom(n=100, p=0.1).
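To make the behavior of grades.scala concrete, here is a plain-Python analogue of its logic. It is a hypothetical sketch, not the authors' Scala code: it assumes each row is "name,score", each name appears at most once per table, the two tables are inner-joined on the name, and a total below 60 counts as failing (the threshold taken from the Path1 description in this README).

```python
FAIL_THRESHOLD = 60  # assumption: a total below 60 fails, per the Path1 description


def parse_table(lines):
    """Turn 'name,score' rows into a {name: score} dict (one row per name assumed)."""
    table = {}
    for line in lines:
        name, score = line.split(",")
        table[name.strip()] = int(score)
    return table


def execute(input1, input2):
    """Hypothetical analogue of execute(input1, input2) in grades.scala.

    Returns (name, total) for every student found in both tables
    whose math + physics total is below FAIL_THRESHOLD.
    """
    math_scores = parse_table(input1)
    physics_scores = parse_table(input2)
    failing = []
    # Inner join on the student name, like a Spark join on the key column.
    for name in sorted(math_scores.keys() & physics_scores.keys()):
        total = math_scores[name] + physics_scores[name]
        if total < FAIL_THRESHOLD:
            failing.append((name, total))
    return failing


if __name__ == "__main__":
    print(execute(["alice,8", "bob,70"], ["alice,34", "bob,20"]))  # [('alice', 42)]
```

In the real program the tables are RDD[String] values processed by Spark, so duplicate names would yield multiple joined rows rather than being overwritten as in this dict-based sketch.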

To run NaturalSym, try > ./naturalsym.sh grades.scala grades.config. The generated tests are printed to the console and written to grades.tests. The snippet below shows the test generated for the first path condition: the student appears in both tables, and the sum of the math and physics scores is less than 60.

Generated tests for Path1 in Rundir/1.smt2
input1.csv
alice,8
input2.csv
alice,34
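The generated test can be checked by hand. The sketch below is illustrative only and is not how NaturalSym validates or scores tests: it confirms that the rows above satisfy the Path1 condition (8 + 34 = 42 < 60), and it computes the binomial probability mass to show that a math score of 8 is plausible under the annotation binom(100, 0.1), whose mean is n·p = 10. It uses only the standard library; SciPy's scipy.stats.binom.pmf(k, n, p) computes the same quantity.

```python
import math


def path1_holds(input1, input2, threshold=60):
    """Path1 condition: some student appears in both tables and
    their math + physics total is below `threshold`."""
    math_scores = dict(row.split(",") for row in input1)
    physics_scores = dict(row.split(",") for row in input2)
    shared = math_scores.keys() & physics_scores.keys()
    return any(int(math_scores[s]) + int(physics_scores[s]) < threshold for s in shared)


def binom_pmf(k, n, p):
    """Probability of exactly k successes under Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


if __name__ == "__main__":
    # The test generated for Path1 in the snippet above.
    print(path1_holds(["alice,8"], ["alice,34"]))  # True
    # A score of 8 is far more likely than 59 under binom(100, 0.1).
    print(binom_pmf(8, 100, 0.1) > binom_pmf(59, 100, 0.1))  # True
```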

Try it yourself!

From the root folder, execute > ./naturalsym.sh <target.scala> <target.config>. The output appears both in the console and in <target.tests>.

  • Due to a limitation of the back-end symbolic execution engine, the target method must be named execute and its input arguments must be of type RDD[String]. Please see our template in template.scala.
  • The configuration file format is shown below. Each line of the configuration file declares user knowledge about the columns of one input table, with columns delimited by "|". User knowledge about a column can be none, a list of examples, a uniform distribution, a Gaussian distribution, a truncated Gaussian distribution, or any distribution supported by the Python scipy library.
<config>     :=     ""    |   <config> "\n" <tab>
<tab>        :=     <tab-name> ":=" <cols>
<cols>       :=     <col>  |   <cols> "|" <col>
<col>        :=     <none> | <examples> | <uniform> | <gaussian> | <trunc-gaussian> | <scipy-distr>
<none>       :=     ""
<examples>   :=     Discrete(<examples-delimited-by-comma>)
<uniform>    :=     Uniform(<l-bound>,<r-bound>)
<gaussian>   :=     Gaussian(<mu>,<sigma>)
<trunc-gaussian>:=  <l-bound><=Gaussian(<mu>,<sigma>)<=<r-bound>
<scipy-distr>:=     scipy.<distr-name>(<parameter-list>)
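To illustrate how one line of this grammar decomposes into per-column annotations, here is a small regex-based parser. It is a hypothetical sketch, not NaturalSym's own parser, and it assumes that numeric parameters are plain decimals, that examples contain no commas or "|" characters, that surrounding double quotes on examples can be stripped, and that scipy distributions take positional numeric parameters only.

```python
import re

# One numeric parameter with optional surrounding spaces (assumption: plain decimals).
_NUM = r"\s*(-?\d+(?:\.\d+)?)\s*"

# One regex per <col> alternative; trunc-gaussian is tried before gaussian.
PATTERNS = [
    ("trunc-gaussian", re.compile(rf"^{_NUM}<=\s*Gaussian\({_NUM},{_NUM}\)\s*<={_NUM}$")),
    ("gaussian", re.compile(rf"^Gaussian\({_NUM},{_NUM}\)$")),
    ("uniform", re.compile(rf"^Uniform\({_NUM},{_NUM}\)$")),
    ("examples", re.compile(r"^Discrete\((.*)\)$")),
    ("scipy-distr", re.compile(r"^scipy\.(\w+)\((.*)\)$")),
]


def parse_col(text):
    """Classify one column annotation and extract its parameters."""
    text = text.strip()
    if not text:
        return ("none",)
    for kind, pattern in PATTERNS:
        m = pattern.match(text)
        if not m:
            continue
        if kind == "examples":
            return (kind, [e.strip().strip('"') for e in m.group(1).split(",")])
        if kind == "scipy-distr":
            params = [float(x) for x in m.group(2).split(",") if x.strip()]
            return (kind, m.group(1), params)
        return (kind, *[float(g) for g in m.groups()])
    raise ValueError(f"unrecognized column annotation: {text!r}")


def parse_line(line):
    """Split '<tab-name> := <col> | <col> ...' into (name, [column annotations])."""
    name, _, cols = line.partition(":=")
    return name.strip(), [parse_col(c) for c in cols.split("|")]


if __name__ == "__main__":
    print(parse_line('input1 := Discrete("alice","bob") | scipy.binom(100, 0.1)'))
    # ('input1', [('examples', ['alice', 'bob']), ('scipy-distr', 'binom', [100.0, 0.1])])
```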

Run our benchmark (Optional)

Users can run > ./NaturalSym/scripts/run1.sh <bench> to run a subject program from our benchmark suite, where <bench> is one of airport, movie1, usedcars, transit, credit, Q1, Q3, Q6, Q7, Q12, Q15, Q19, or Q20.

For example, > ./NaturalSym/scripts/run1.sh airport runs NaturalSym on NaturalSym/newbench/src/airport/airport.scala with the configuration file NaturalSym/newbench/config/airport.config. The generated tests are written to NaturalSym/newbench/geninputs/airport.
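To run several subjects in one go, run1.sh can be wrapped in a small Python script. This is a hypothetical sketch: it assumes every benchmark follows the same src/config/geninputs layout shown above for airport, and that it is launched from the directory containing NaturalSym, as the command above implies.

```python
import subprocess
from pathlib import Path

BENCHMARKS = ["airport", "movie1", "usedcars", "transit", "credit",
              "Q1", "Q3", "Q6", "Q7", "Q12", "Q15", "Q19", "Q20"]


def bench_paths(bench, root="NaturalSym"):
    """Expected program, config, and output locations for one benchmark
    (assumes the airport layout generalizes to every benchmark)."""
    if bench not in BENCHMARKS:
        raise ValueError(f"unknown benchmark: {bench}")
    base = Path(root) / "newbench"
    return {
        "program": base / "src" / bench / f"{bench}.scala",
        "config": base / "config" / f"{bench}.config",
        "tests": base / "geninputs" / bench,
    }


def run_bench(bench, root="NaturalSym"):
    """Invoke run1.sh for one benchmark and list the generated test files."""
    script = (Path(root) / "scripts" / "run1.sh").resolve()
    subprocess.run([str(script), bench], check=True)  # raises if the script fails
    return sorted(bench_paths(bench, root)["tests"].glob("*"))


if __name__ == "__main__":
    for name in BENCHMARKS:
        print(name, run_bench(name))
```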
