Table of Contents

About The Project
Getting Started
Usage
Configuration
Contributing
License
Acknowledgments

About The Project

Yeager is a framework for performing quality tests on data files.

(back to top)

Getting Started

Prerequisites

This code has been developed and tested with Python 3.11.

Dependency List

Python version 3.11 or later

Installation

Navigate to https://github.com/seanmurphy1661/yeager
Download and extract yeager-main.zip

(back to top)

Usage

Autoflight

Analyzes a sample file and builds a YAML file that can be used to test production files.

autoflight.py filename 
              [-c|--config configfile] 
              [-d|--delimiter delimiter] 
              [-o|--overwrite] 
              [-s|--sample sample_size]
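The synopsis above maps naturally onto Python's argparse module. The following is a minimal sketch of such a parser, not the actual autoflight.py source: the function name build_parser, the defaults, and the help text are assumptions made for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the documented autoflight.py synopsis."""
    parser = argparse.ArgumentParser(prog="autoflight.py")
    parser.add_argument("filename", help="sample file to analyze")
    # The defaults below are illustrative assumptions, not documented behavior.
    parser.add_argument("-c", "--config", metavar="configfile", default=None)
    parser.add_argument("-d", "--delimiter", metavar="delimiter", default=",")
    parser.add_argument("-o", "--overwrite", action="store_true")
    parser.add_argument("-s", "--sample", metavar="sample_size", type=int, default=None)
    return parser


args = build_parser().parse_args(["sample.csv", "-d", "|", "-s", "500", "-o"])
print(args.filename, args.delimiter, args.sample, args.overwrite)
# prints: sample.csv | 500 True
```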

Yeager

Verifies files against a set of tests defined in the configfile.

yeager.py configfile

(back to top)

Configuration

The configuration file (configfile) specifies the tests that will be performed against the file named in input_filename. As shown below, it also includes directives that control processing.

Autoflight.py will create a fully functional configuration file. This is particularly helpful for files with many columns.

File settings

input_filename: specifies the file to be tested

input_filetype: specifies the type of file

"csv"

column_delimiter: used to separate columns in a row

any valid character

number_of_columns: used to verify row size
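To illustrate how column_delimiter and number_of_columns work together, here is a minimal sketch of a row-size check built on the standard csv module. The function name and the shape of the returned findings are assumptions; Yeager's own implementation may differ.

```python
import csv
import io


def rows_with_wrong_size(text, column_delimiter, number_of_columns):
    """Return (row_number, actual_size) for every row whose size is unexpected."""
    reader = csv.reader(io.StringIO(text), delimiter=column_delimiter)
    return [
        (row_number, len(row))
        for row_number, row in enumerate(reader, start=1)
        if len(row) != number_of_columns
    ]


sample = "id,name,amount\n1,Ann,10.50\n2,Bob\n"
print(rows_with_wrong_size(sample, ",", 3))
# prints: [(3, 2)]
```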

Control settings

dump_throttle: specifies the number of rows to check

Use 0 to disable

dump_header: controls printing of the header row

dump_config: controls printing of the configuration read from file.yaml

Reporting

findings_filename: name of the findings file; any valid file name

Stats

The stats section is used to enable cProfile statistics.

enabled: True|False

file: specify the name of the cProfile output

report: specify the name of the report
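For readers unfamiliar with cProfile, the sketch below shows one way the file and report settings could be used with the standard cProfile and pstats modules. The helper profile_call and the file name run.prof are hypothetical; this is not taken from Yeager's source.

```python
import cProfile
import io
import os
import pstats
import tempfile


def profile_call(func, stats_file):
    """Run func under cProfile, save raw stats, and return a text report."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    profiler.dump_stats(stats_file)  # plays the role of the "file" setting

    buffer = io.StringIO()
    pstats.Stats(stats_file, stream=buffer).sort_stats("cumulative").print_stats(10)
    return buffer.getvalue()  # text that could be written to the "report" file


with tempfile.TemporaryDirectory() as tmp:
    report = profile_call(
        lambda: sorted(range(100_000), reverse=True),
        os.path.join(tmp, "run.prof"),
    )
print(report)
```

The raw stats file can be reloaded later with pstats.Stats, while the text report is meant for direct reading.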

Test Options

name: name of column that will be tested.

For CSV files, the name must match a column in the header row; otherwise the entire test is rejected.

range: specifies a numeric range

range is specified as an array '[min,max]'
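A numeric range check can be sketched as follows. The sketch assumes that both bounds are inclusive and that non-numeric content fails the test; neither point is stated for range in this README, and the function name is hypothetical.

```python
def in_range(value, bounds):
    """Return True if value parses as a number within [min, max] (assumed inclusive)."""
    low, high = bounds
    try:
        number = float(value)
    except ValueError:
        return False  # non-numeric content cannot satisfy a numeric range
    return low <= number <= high


print([v for v in ["2", "57", "100", "abc"] if not in_range(v, [2, 99])])
# prints: ['100', 'abc']
```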

date_range: specifies a date range test. True is returned if the date in question is between the min and max values, inclusive.

range is specified as '[min,max]'
- both min and max are specified in ISO 8601 format; the ordinal date format is not supported.
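The inclusive comparison described above can be sketched with the standard datetime module. Using date.fromisoformat as the parser is an assumption made for this example, as is the choice to treat unparseable values as failures.

```python
from datetime import date


def in_date_range(value, bounds):
    """Return True if value is an ISO 8601 date with min <= value <= max."""
    low, high = (date.fromisoformat(bound) for bound in bounds)
    try:
        candidate = date.fromisoformat(value)
    except ValueError:
        return False  # assumed behavior: unparseable dates fail the test
    return low <= candidate <= high


bounds = ["2024-01-01", "2024-12-31"]
print([in_date_range(v, bounds) for v in ["2024-01-01", "2024-12-31", "2025-01-01"]])
# prints: [True, True, False]
```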

regex: a regular expression compatible with the Python re library.
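As an illustration, the sketch below compiles a pattern once and reuses it for every value in a column. Whether Yeager calls search, match, or fullmatch is not documented here, so the use of search is an assumption; with patterns anchored by ^ and $ the choice rarely matters.

```python
import re

# Pattern taken from the example file.yaml in the Appendix.
two_digits = re.compile(r"^[0-9]{1,2}$")


def regex_findings(pattern, values):
    """Return the values that do not match the compiled pattern."""
    return [value for value in values if pattern.search(value) is None]


print(regex_findings(two_digits, ["7", "42", "123", "4a"]))
# prints: ['123', '4a']
```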

required: specifies if content must be present

True: Zero length strings are flagged as a finding
False: Zero length strings are allowed
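A sketch of the required check, using a hypothetical helper name. Following the wording above, only zero length strings are flagged, so a value consisting of spaces still counts as present.

```python
def missing_required(values, required):
    """Return 1-based positions of zero length values when required is True."""
    if not required:
        return []
    return [position for position, value in enumerate(values, start=1) if value == ""]


print(missing_required(["a", "", " ", ""], True))
# prints: [2, 4]
```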

width: specifies column size properties

width is specified as an array '[min,max]'
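A width check compares the length of each value with the two bounds. This sketch assumes both bounds are inclusive, which the text above does not state, and the function name is hypothetical.

```python
def width_ok(value, bounds):
    """Return True if len(value) lies within [min, max] (assumed inclusive)."""
    low, high = bounds
    return low <= len(value) <= high


print([v for v in ["", "AB", "ABCD"] if not width_ok(v, [1, 3])])
# prints: ['', 'ABCD']
```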

type: specifies a datatype check. The default datatype is string. Valid types:

string - no validation
date - uses dateutil.parser.parse to check for date datatype conformance
number - uses a regex to verify the number datatype
money - uses a regex to verify the money format
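The type check can be pictured as a lookup from type name to validator. In the sketch below the number pattern is the one listed under Common expressions, while the money pattern is purely hypothetical because the accepted money format is not documented here. The date type is left out to keep the example free of third-party imports.

```python
import re

# Number pattern as listed under "Common expressions".
NUMBER = re.compile(r"^[+-]?([0-9]+([.][0-9]*)?|[.][0-9]+)$")
# Hypothetical money pattern: optional sign and "$", thousands separators, two decimals.
MONEY = re.compile(r"^[+-]?\$?[0-9]{1,3}(,[0-9]{3})*([.][0-9]{2})?$")

CHECKS = {
    "string": lambda value: True,  # no validation
    "number": lambda value: NUMBER.search(value) is not None,
    "money": lambda value: MONEY.search(value) is not None,
}


def type_ok(value, datatype="string"):
    """Return True if value conforms to the named datatype."""
    return CHECKS[datatype](value)


print([
    type_ok("-3.14", "number"),
    type_ok("1e5", "number"),
    type_ok("$1,234.56", "money"),
    type_ok("anything"),
])
# prints: [True, False, True, True]
```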

(back to top)

Appendix

Example file.yaml

input_filename: "file_to_test.csv"
input_filetype: "csv"
dump_throttle: 0
dump_header: True
dump_config: True
column_delimiter: ","
number_of_columns: 10
findings_filename: "file_to_test.csv.findings"

option:
  - test: 
      name: column_to_test
      regex: "^[0-9]{1,2}$"
      range: [2,99]
  .
  .
  .

The test above checks that column_to_test holds a number between 2 and 99.
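To show how the regex and range options in the example combine, the sketch below expresses the option section as a Python dictionary and applies it to a small CSV sample. The function run_tests, the sample data, and the findings format are all assumptions made for illustration; they are not Yeager's actual code or output.

```python
import csv
import io
import re

# Python equivalent of the "option" section in the example file.yaml.
config = {
    "option": [
        {"test": {"name": "column_to_test", "regex": "^[0-9]{1,2}$", "range": [2, 99]}}
    ]
}

data = "column_to_test,other\n7,a\n1,b\n123,c\n"


def run_tests(config, text, delimiter=","):
    """Return (row_number, column, value, failed_check) tuples."""
    findings = []
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    for row_number, row in enumerate(reader, start=2):  # row 1 is the header
        for option in config["option"]:
            test = option["test"]
            value = row[test["name"]]
            if "regex" in test and re.search(test["regex"], value) is None:
                findings.append((row_number, test["name"], value, "regex"))
            # In this example the regex guarantees digits, so float() cannot fail.
            elif "range" in test and not test["range"][0] <= float(value) <= test["range"][1]:
                findings.append((row_number, test["name"], value, "range"))
    return findings


print(run_tests(config, data))
# prints: [(3, 'column_to_test', '1', 'range'), (4, 'column_to_test', '123', 'regex')]
```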

Common expressions

Number : '^[+-]?([0-9]+([.][0-9]*)?|[.][0-9]+)$'
BinaryChoice : "^Choice 1$|^Choice 2$"
Integer: "^(0|[1-9][0-9]*)$"
Two digit code, Not required: "^[0-9]{0,2}$"
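The expressions above can be tried out with the standard re module before they are placed in a configuration file. This is a small sketch; the dictionary keys and sample values are chosen for illustration only.

```python
import re

COMMON = {
    "Number": r"^[+-]?([0-9]+([.][0-9]*)?|[.][0-9]+)$",
    "BinaryChoice": r"^Choice 1$|^Choice 2$",
    "Integer": r"^(0|[1-9][0-9]*)$",
    "TwoDigitOptional": r"^[0-9]{0,2}$",
}


def check(name, value):
    """Return True if value matches the named common expression."""
    return re.search(COMMON[name], value) is not None


print(check("Number", "-1.5"), check("Number", "1,000"))
# prints: True False
print(check("Integer", "42"), check("Integer", "007"))
# prints: True False
print(check("TwoDigitOptional", ""), check("TwoDigitOptional", "123"))
# prints: True False
```

Note that the Integer expression rejects leading zeros, and the two digit expression accepts an empty string, which is why it suits columns that are not required.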

(back to top)

Contributing

Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement". Don't forget to give the project a star! Thanks again!

Fork the Project
Create your Feature Branch (git checkout -b feature/AmazingFeature)
Commit your Changes (git commit -m 'Add some AmazingFeature')
Push to the Branch (git push origin feature/AmazingFeature)
Open a Pull Request

(back to top)

License

Distributed under the GNU GENERAL PUBLIC LICENSE. See LICENSE.txt for more information.

(back to top)

Acknowledgments

Best README Template: https://github.com/othneildrew/Best-README-Template

(back to top)
