Table of Contents
  1. About The Project
  2. Getting Started
  3. Usage
  4. Configuration
  5. Contributing
  6. License
  7. Acknowledgments

About The Project

Yeager is a framework for performing quality tests on data files.


Getting Started

Prerequisites

This code has been developed and tested with Python 3.11.

Dependency List

  • Python version 3.11 or later

Installation

  1. Navigate to https://github.com/seanmurphy1661/yeager
  2. Download and extract yeager-main.zip


Usage

Autoflight

Analyzes a sample file and builds a YAML configuration file that can be used to test production files.

autoflight.py filename 
              [-c|--config configfile] 
              [-d|--delimiter delimiter] 
              [-o|--overwrite] 
              [-s|--sample sample_size]
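The synopsis above maps directly onto Python's argparse module. The following is a minimal sketch of a parser for that command line, not autoflight.py's actual source: the default delimiter of ",", the destination names configfile and sample_size, and parsing sample_size as an integer are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical reconstruction of the autoflight.py command line;
    # defaults and destination names are assumptions, not project code.
    parser = argparse.ArgumentParser(prog="autoflight.py")
    parser.add_argument("filename", help="sample file to analyze")
    parser.add_argument("-c", "--config", dest="configfile",
                        help="name of the YAML configuration file to write")
    parser.add_argument("-d", "--delimiter", default=",",
                        help="column delimiter used in the sample file")
    parser.add_argument("-o", "--overwrite", action="store_true",
                        help="overwrite an existing configuration file")
    parser.add_argument("-s", "--sample", dest="sample_size", type=int,
                        help="number of rows to sample")
    return parser


if __name__ == "__main__":
    # Example: pipe-delimited sample, overwrite allowed, 500-row sample.
    args = build_parser().parse_args(
        ["data.csv", "-c", "data.yaml", "-d", "|", "-o", "-s", "500"]
    )
    print(args)
```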

Yeager

Verifies files against a set of tests defined in the configfile.

yeager.py configfile


Configuration

The configuration file, aka configfile, specifies the tests to be performed against the file named in input_filename. As shown below, it also includes directives that control processing.

autoflight.py creates a fully functional configuration file from a sample file. This is particularly helpful for files with many columns.

File settings

input_filename: specifies the file to be tested

input_filetype: specifies the type of file

  • "csv"

column_delimiter: used to separate columns in a row

  • any valid character

number_of_columns: the expected number of columns in each row; used to verify row size
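A minimal sketch of the row-size check, using Python's csv module with column_delimiter as the delimiter. It assumes every row, header included, is compared against number_of_columns; the function name check_row_sizes is hypothetical. Note that csv.reader requires the delimiter to be a single character.

```python
import csv
import io


def check_row_sizes(lines, column_delimiter=",", number_of_columns=10):
    """Return (row_number, actual_size) for each row whose size differs
    from number_of_columns. Row numbers are 1-based and count the header.
    Hypothetical helper, not Yeager's own code."""
    findings = []
    # csv.reader raises TypeError if the delimiter is not one character.
    reader = csv.reader(lines, delimiter=column_delimiter)
    for row_number, row in enumerate(reader, start=1):
        if len(row) != number_of_columns:
            findings.append((row_number, len(row)))
    return findings


# Three-column, pipe-delimited sample whose last row is one column short.
sample = io.StringIO("a|b|c\n1|2|3\n4|5\n")
print(check_row_sizes(sample, column_delimiter="|", number_of_columns=3))
```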

Control settings

dump_throttle: specifies the number of rows to check

  • Use 0 to disable

dump_header: controls printing of the header row

dump_config: controls printing of the configuration read from file.yaml

Reporting

findings_filename: the name of the file that findings are written to; any valid file name

Stats

The stats section is used to enable cProfile statistics.

enabled: True|False

file: specifies the name of the cProfile output file

report: specifies the name of the report
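The three stats keys correspond to standard library calls: cProfile collects the profile, Profile.dump_stats writes the raw output to file, and pstats formats a readable report. The sketch below assumes that mapping; run_checks is a hypothetical stand-in for the real test run, and sorting by cumulative time and printing the top 10 entries are arbitrary choices.

```python
import cProfile
import pstats

# Hypothetical stats settings mirroring the enabled/file/report keys.
stats = {"enabled": True, "file": "yeager.prof", "report": "yeager_report.txt"}


def run_checks():
    # Stand-in workload for the real test run.
    return sum(i * i for i in range(100_000))


if stats["enabled"]:
    profiler = cProfile.Profile()
    profiler.enable()
    run_checks()
    profiler.disable()
    # Raw cProfile data, which pstats can reload later.
    profiler.dump_stats(stats["file"])
    # Human-readable report written to a text file.
    with open(stats["report"], "w") as report:
        (pstats.Stats(stats["file"], stream=report)
         .sort_stats("cumulative")
         .print_stats(10))
else:
    run_checks()
```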

Test Options

name: name of column that will be tested.

  • for CSV files, the name must match a column in the header row, or the entire test is rejected
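A sketch of the header-matching rule, assuming matching is exact and case-sensitive (the README does not say). The function accepted_tests and the dictionary form of each test are hypothetical.

```python
def accepted_tests(header, tests):
    """Split tests into (accepted, rejected) by whether their name
    appears in the header row. Exact, case-sensitive matching is an
    assumption."""
    columns = set(header)
    accepted = [t for t in tests if t["name"] in columns]
    rejected = [t for t in tests if t["name"] not in columns]
    return accepted, rejected


header = ["id", "amount", "posted"]
tests = [
    {"name": "amount", "range": [0, 500]},
    {"name": "Amount", "required": True},  # differs only in case
]
ok, bad = accepted_tests(header, tests)
print([t["name"] for t in ok], [t["name"] for t in bad])
```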

range: specifies a numeric range

  • range is specified as an array '[min,max]'
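A sketch of the range test. The README does not say whether the bounds are inclusive or how non-numeric values are treated; this version assumes inclusive bounds, parses the value with float, and treats text that does not parse as a failure. in_range is a hypothetical name.

```python
def in_range(value: str, bounds: list) -> bool:
    """True if value parses as a number within [min, max], inclusive."""
    low, high = bounds
    try:
        number = float(value)
    except ValueError:
        return False  # non-numeric text cannot satisfy a numeric range
    # "nan" parses but compares False with everything, so it fails here.
    return low <= number <= high
```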

date_range: specifies a date range test. True is returned if the date in question is between the min and max values, inclusive.

  • range is specified as '[min,max]'
    • both min and max are specified in ISO 8601 format, except the ordinal date format.
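Python 3.11's date.fromisoformat accepts most ISO 8601 date forms but not ordinal dates, which lines up with the restriction above. The sketch below uses it for an inclusive date range check; that Yeager uses this function, and that min and max arrive as strings, are assumptions. in_date_range is a hypothetical name.

```python
from datetime import date


def in_date_range(value: str, bounds: list) -> bool:
    """True if value is an ISO 8601 date within [min, max], inclusive."""
    low, high = (date.fromisoformat(b) for b in bounds)
    try:
        day = date.fromisoformat(value)
    except ValueError:
        # Unparseable text, an impossible date, or an unsupported form.
        return False
    return low <= day <= high
```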

regex: a regular expression compatible with the Python re library.
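Because the patterns in this README carry their own ^ and $ anchors, a sketch can apply them with re.search. Whether Yeager calls re.search, re.match, or re.fullmatch is an assumption, and matches is a hypothetical name.

```python
import re


def matches(value: str, pattern: str) -> bool:
    """True if value satisfies the column's regex test."""
    # re.search honours the pattern's own ^ and $ anchors; an unanchored
    # pattern would accept a match anywhere in the value.
    return re.search(pattern, value) is not None
```

One subtlety: $ also matches just before a trailing newline, so "12\n" passes "^[0-9]{1,2}$" under re.search; re.fullmatch with the anchors dropped is the stricter alternative.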

required: specifies if content must be present

  • True: Zero length strings are flagged as a finding
  • False: Zero length strings are allowed

width: specifies the allowed length of the column value

  • width is specified as an array '[min,max]'
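A sketch of the width test, assuming width is measured with len() on the value (a count of characters, not bytes) and both bounds are inclusive; check_width is a hypothetical name.

```python
def check_width(value: str, bounds: list) -> bool:
    """True if the value's length is within [min, max], inclusive."""
    low, high = bounds
    # len() counts Unicode code points, so "é" has width 1 here.
    return low <= len(value) <= high
```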

type: specifies a datatype check. The default datatype is string. Valid types:

  • string - no validation
  • date - use dateutil.parser.parse to check for date datatype conformance
  • number - use regex to verify number data type
  • money - use regex to verify money format
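A sketch of the type check for the string and number types. It reuses the Number pattern from the Common expressions list in the Appendix; whether Yeager's number type uses exactly that pattern is an assumption. The date type (which relies on the third-party dateutil package) and the money type (whose pattern the README does not give) are left out, and check_type is a hypothetical name.

```python
import re

# "Number" pattern from the Common expressions list; assumed, not
# confirmed, to be what the number type uses.
NUMBER = re.compile(r"^[+-]?([0-9]+([.][0-9]*)?|[.][0-9]+)$")


def check_type(value: str, datatype: str = "string") -> bool:
    if datatype == "string":
        return True  # strings are not validated
    if datatype == "number":
        return NUMBER.match(value) is not None
    # date and money are outside the scope of this sketch.
    raise NotImplementedError(f"type {datatype!r} is not covered here")
```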


Appendix

Example file.yaml

input_filename: "file_to_test.csv"
input_filetype: "csv"
dump_throttle: 0
dump_header: True
dump_config: True
column_delimiter: ","
number_of_columns: 10
findings_filename: "file_to_test.csv.findings"

option:
  - test: 
      name: column_to_test
      regex: "^[0-9]{1,2}$"
      range: [2,99]
  .
  .
  .

This example tests that column_to_test contains a one- or two-digit number between 2 and 99.
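The regex and range tests in this example constrain each other: the regex alone would accept 0 and 1, and the range alone would accept 2.5. The sketch below applies both to a few values; the in-memory test dictionary and findings_for are hypothetical, and the range is assumed inclusive.

```python
import re

# Hypothetical in-memory form of the column_to_test entry above.
test = {"name": "column_to_test", "regex": "^[0-9]{1,2}$", "range": [2, 99]}


def findings_for(value: str, test: dict) -> list:
    """Return the names of the checks that value fails."""
    failed = []
    if re.search(test["regex"], value) is None:
        failed.append("regex")
    low, high = test["range"]
    try:
        if not low <= float(value) <= high:
            failed.append("range")
    except ValueError:
        failed.append("range")  # non-numeric text fails the range too
    return failed


for value in ["7", "1", "2.5", "100", "x"]:
    print(value, findings_for(value, test))
```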

Common expressions

  • Number : '^[+-]?([0-9]+([.][0-9]*)?|[.][0-9]+)$'
  • BinaryChoice : "^Choice 1$|^Choice 2$"
  • Integer: "^(0|[1-9][0-9]*)$"
  • Two digit code, Not required: "^[0-9]{0,2}$"
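The expressions above can be checked against sample values before they go into a configuration file. The samples below are illustrative; they highlight that Integer rejects leading zeros such as 007 and that the two digit code accepts an empty string, which is what makes it not required.

```python
import re

# Patterns copied from the list above, each with sample values that
# should match and values that should be rejected.
EXPRESSIONS = {
    "Number": (r"^[+-]?([0-9]+([.][0-9]*)?|[.][0-9]+)$",
               ["42", "-3.5", ".5", "7."], ["1e3", "+", "1,000"]),
    "BinaryChoice": (r"^Choice 1$|^Choice 2$",
                     ["Choice 1", "Choice 2"], ["Choice 3", "choice 1"]),
    "Integer": (r"^(0|[1-9][0-9]*)$", ["0", "17"], ["007", "-4"]),
    "Two digit code, Not required": (r"^[0-9]{0,2}$",
                                     ["", "5", "42"], ["123", "a1"]),
}


def self_check() -> bool:
    for label, (pattern, good, bad) in EXPRESSIONS.items():
        for value in good:
            assert re.search(pattern, value), f"{label} should match {value!r}"
        for value in bad:
            assert not re.search(pattern, value), f"{label} should reject {value!r}"
    return True


self_check()
```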


Contributing

Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement". Don't forget to give the project a star! Thanks again!

  1. Fork the Project
  2. Create your Feature Branch (git checkout -b feature/AmazingFeature)
  3. Commit your Changes (git commit -m 'Add some AmazingFeature')
  4. Push to the Branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request


License

Distributed under the GNU GENERAL PUBLIC LICENSE. See LICENSE.txt for more information.


Acknowledgments
