Yeager is a framework for performing quality tests on data files.
This code has been developed and tested with Python 3.11.
- Python version 3.11 or later
- Navigate to https://github.com/seanmurphy1661/yeager
- Download and extract yeager-main.zip
Analyzes a sample file and builds a YAML file that can be used for testing production files.
autoflight.py filename
[-c|--config configfile]
[-d|--delimiter delimiter]
[-o|--overwrite]
[-s|--sample sample_size]
Verifies files against a set of tests defined in the configfile.
yeager.py configfile
The configuration file, aka configfile, is a specification for the tests that will be performed against the file named in input_filename. The configuration file also includes directives that govern how the file is processed.
autoflight.py will create a fully functional configuration file. This is particularly helpful for files with many columns.
input_filename: specifies the file to be tested
input_filetype: specifies the type of file
- "csv"
column_delimiter: used to separate columns in a row
- any valid character
number_of_columns: used to verify row size
dump_throttle: specifies the number of rows to check
- Use 0 to disable
dump_header: controls printing of the header row
dump_config: controls printing of the configuration read from file.yaml
findings_filename: any valid file name that can
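The way column_delimiter, number_of_columns, and dump_throttle might interact can be sketched with the standard csv module. This is an illustration only, not yeager's actual code: the function name find_bad_rows is hypothetical, and it assumes that the header counts as row 1 and that a non-zero dump_throttle limits how many rows are checked.

```python
import csv
import io


def find_bad_rows(lines, column_delimiter=",", number_of_columns=10, dump_throttle=0):
    """Yield (row_number, actual_count) for rows with an unexpected column count.

    Hypothetical helper: the parameter names mirror the configfile keys,
    but the behaviour shown here is an assumption, not yeager's implementation.
    """
    reader = csv.reader(lines, delimiter=column_delimiter)
    for row_number, row in enumerate(reader, start=1):
        # Assumption: a non-zero throttle stops checking after that many rows.
        if dump_throttle and row_number > dump_throttle:
            break
        if len(row) != number_of_columns:
            yield row_number, len(row)


# Small in-memory sample: the third row is missing a column.
sample = io.StringIO("id,name,amount\n1,widget,9.99\n2,gadget\n")
print(list(find_bad_rows(sample, number_of_columns=3)))  # [(3, 2)]
```

Any iterable of text lines works as input, so an open file handle can be passed in place of the StringIO sample.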
The stats section is used to enable cProfile statistics.
enabled: True|False
file: specify the name of the cProfile output
report: specify the name of the report
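For readers unfamiliar with cProfile, the three stats settings could map onto the standard library roughly as follows. This is a sketch under assumptions: run_with_stats is a hypothetical helper, the default file names are placeholders, and the real tool may collect and sort its statistics differently.

```python
import cProfile
import pstats


def run_with_stats(func, enabled=True, file="profile.out", report="profile.txt"):
    """Run func, optionally profiling it with cProfile.

    Hypothetical helper: 'enabled', 'file', and 'report' mirror the stats
    keys in the configfile; the defaults here are placeholders.
    """
    if not enabled:
        return func()

    profiler = cProfile.Profile()
    result = profiler.runcall(func)

    # 'file' receives the raw cProfile data.
    profiler.dump_stats(file)

    # 'report' receives a human-readable summary, sorted by cumulative time.
    with open(report, "w") as handle:
        stats = pstats.Stats(file, stream=handle)
        stats.sort_stats("cumulative").print_stats(20)

    return result
```

The raw file can be reloaded later with pstats.Stats for a different sort order, which is one reason to keep it separate from the text report.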
name: name of column that will be tested.
- for csv files, a column in the header must match or the entire test is rejected
range: specifies a numeric range
- range is specified as an array '[min,max]'
date_range: specifies a date range test. True is returned if the date in question is between the min and max values, inclusive.
- range is specified as '[min,max]'
- both min and max are specified in ISO 8601 format; the ordinal date format is not supported.
regex: a regular expression compatible with the Python re module.
required: specifies if content must be present
- True: Zero length strings are flagged as a finding
- False: Zero length strings are allowed
width: specifies column size properties
- width is specified as an array '[min,max]'
type: specifies a datatype check. default datatype is string. Valid types:
- string - no validation
- date - use dateutil.parser.parse to check for date datatype conformance
- number - use regex to verify number data type
- money - use regex to verify money format
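The column tests described above can be pictured as small predicate functions, one per key. The sketch below is not taken from yeager's source. The function names are hypothetical, the numeric range is assumed to be inclusive like date_range, and date.fromisoformat is used as a stand-in because, on Python 3.11, it accepts ISO 8601 dates but not the ordinal format.

```python
import re
from datetime import date


def check_range(value, bounds):
    """True if value is a number within [min, max] (inclusive is an assumption)."""
    low, high = bounds
    try:
        number = float(value)
    except ValueError:
        return False
    return low <= number <= high


def check_date_range(value, bounds):
    """True if value is an ISO 8601 date between min and max, inclusive."""
    low, high = (date.fromisoformat(bound) for bound in bounds)
    try:
        parsed = date.fromisoformat(value)
    except ValueError:
        return False
    return low <= parsed <= high


def check_required(value, required):
    """A zero-length string fails only when the column is required."""
    return bool(value) or not required


def check_width(value, bounds):
    """True if the string length is within [min, max]."""
    low, high = bounds
    return low <= len(value) <= high


def check_regex(value, pattern):
    """True if the pattern matches; using re.search here is an assumption."""
    return re.search(pattern, value) is not None


print(check_range("50", [2, 99]), check_width("abcd", [1, 3]))  # True False
```

Keeping each check as a separate function makes it straightforward to apply only the keys that a given test entry actually defines.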
input_filename: "file_to_test.csv"
input_filetype: "csv"
dump_throttle: 0
dump_header: True
dump_config: True
column_delimiter: ","
number_of_columns: 10
findings_filename: "file_to_test.csv.findings"
option:
  - test:
      name: column_to_test
      regex: "^[0-9]{1,2}$"
      range: [2,99]
.
.
.
Testing for a number between 2 and 99.
- Number : '^[+-]?([0-9]+([.][0-9]*)?|[.][0-9]+)$'
- BinaryChoice : "^Choice 1$|^Choice 2$"
- Integer: "^(0|[1-9][0-9]*)$"
- Two digit code, Not required: "^[0-9]{0,2}$"
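Before putting a pattern into a configfile, it can help to try it against a few values with Python's re module. The snippet below is a quick sketch that reuses the four patterns listed above. The dictionary keys and the matches helper are hypothetical, and the use of re.search is an assumption about how the patterns are applied.

```python
import re

# The four example patterns from this README, keyed by illustrative names.
PATTERNS = {
    "Number": r"^[+-]?([0-9]+([.][0-9]*)?|[.][0-9]+)$",
    "BinaryChoice": r"^Choice 1$|^Choice 2$",
    "Integer": r"^(0|[1-9][0-9]*)$",
    "TwoDigitCode": r"^[0-9]{0,2}$",
}


def matches(name, value):
    """Return True if value satisfies the named pattern."""
    return re.search(PATTERNS[name], value) is not None


samples = [
    ("Number", "-3.14"),
    ("Number", "1e5"),
    ("Integer", "007"),
    ("TwoDigitCode", ""),
    ("BinaryChoice", "Choice 3"),
]
for name, value in samples:
    print(f"{name:13} {value!r:12} {matches(name, value)}")
```

Two cases are worth noting: the Integer pattern rejects 007 because leading zeros are not allowed, and the two digit code pattern accepts an empty string, which is what makes the column effectively optional.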
Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement". Don't forget to give the project a star! Thanks again!
- Fork the Project
- Create your Feature Branch (git checkout -b feature/AmazingFeature)
- Commit your Changes (git commit -m 'Add some AmazingFeature')
- Push to the Branch (git push origin feature/AmazingFeature)
- Open a Pull Request
Distributed under the GNU GENERAL PUBLIC LICENSE. See LICENSE.txt for more information.
- Best README Template: https://github.com/othneildrew/Best-README-Template