feat: yaml_check internal CLI utility #22

Status: Open; wants to merge 3 commits into base: main
2 changes: 2 additions & 0 deletions .gitignore
@@ -1,3 +1,5 @@
.vscode
__pycache__
*.rock
.venv/
yaml_checker/yaml_checker.egg-info
111 changes: 111 additions & 0 deletions yaml_checker/README.md
@@ -0,0 +1,111 @@
# YAML Checker
Member

suggestion: let's stick with clayaml since this is an internal tool anyway.


An internal CLI util for formatting and validating YAML files. This project
relies on the Pydantic and ruamel.yaml libraries.
Comment on lines +3 to +4
Member

I guess the second line is not necessary, since there is a requirements.txt which lists those.


**Installation**
```bash
pip install -e yaml_checker
```

**Usage**
```
usage: yaml_checker [-h] [-v] [-w] [--config CONFIG] [files ...]
Comment on lines +12 to +13
Member

Would it be possible to add the following line here?

An internal CLI util for formatting and validating YAML files.


positional arguments:
  files            Additional files to process (optional).

options:
  -h, --help       show this help message and exit
  -v, --verbose    Enable verbose output.
  -w, --write      Write yaml output to disk.
  --config CONFIG  CheckYAML subclass to load
Comment on lines +16 to +22
Member

nit: let's be consistent with the trailing dots. Usually we don't have trailing dots in help.

Member

How about the following? Or, something like that.

Suggested change
- --config CONFIG CheckYAML subclass to load
+ --config CONFIG Use pre-defined YAML configurations (e.g. Chisel)

```

**Example**

```bash
# Let's cat a demonstration file for comparison.
$ cat yaml_checker/demo/slice.yaml
Comment on lines +28 to +29
Member

Let's have this as a bash block and the following yaml as a yaml block for better syntax highlighting.

# yaml_checker --config=Chisel demo/slice.yaml

package: grep

essential:
  - grep_copyright

# hello: world

slices:
  bins:
    essential:
      - libpcre2-8-0_libs # tests

      # another test
      - libc6_libs
    contents:
      /usr/bin/grep:

  deprecated:
    # These are shell scripts requiring a symlink from /usr/bin/dash to
    # /usr/bin/sh.
    # See: https://manpages.ubuntu.com/manpages/noble/en/man1/grep.1.html
    essential:
      - dash_bins
      - grep_bins
    contents:
      # we need this leading comment
      /usr/bin/rgrep: # this should be last

      /usr/bin/fgrep:

      # careful with this path ...
      /usr/bin/egrep: # it is my favorite
  copyright:
    contents:
      /usr/share/doc/grep/copyright:
# Note: Missing new line at EOF

# Now we can run the yaml_checker to format the same file.
# Note how comments are preserved during sorting of lists and
# dict type objects. If you want to test the validator,
# uncomment the hello field.
Comment on lines +67 to +72
Member

Let's end the previous code block before line 67, have this text as normal text in the markdown and start another code block after line 72.

$ yaml_checker --config=Chisel yaml_checker/demo/slice.yaml
Member

Like the previous comment, let's have this as a bash block and the rest as a yaml block for better syntax highlighting.

# yaml_checker --config=Chisel demo/slice.yaml

package: grep

essential:
  - grep_copyright

# hello: world

slices:
  bins:
    essential:
      - libc6_libs
      - libpcre2-8-0_libs # tests

      # another test
    contents:
      /usr/bin/grep:

  deprecated:
    # These are shell scripts requiring a symlink from /usr/bin/dash to
    # /usr/bin/sh.
    # See: https://manpages.ubuntu.com/manpages/noble/en/man1/grep.1.html
    essential:
      - dash_bins
      - grep_bins
    contents:
      # we need this leading comment

      # careful with this path ...
      /usr/bin/egrep: # it is my favorite
      /usr/bin/fgrep:
      /usr/bin/rgrep: # this should be last
  copyright:
    contents:
      /usr/share/doc/grep/copyright:

```
37 changes: 37 additions & 0 deletions yaml_checker/demo/slice.yaml
@@ -0,0 +1,37 @@
# yaml_checker --config=Chisel demo/slice.yaml

package: grep

essential:
  - grep_copyright

# hello: world

slices:
  bins:
    essential:
      - libpcre2-8-0_libs # tests

      # another test
      - libc6_libs
    contents:
      /usr/bin/grep:

  deprecated:
    # These are shell scripts requiring a symlink from /usr/bin/dash to
    # /usr/bin/sh.
    # See: https://manpages.ubuntu.com/manpages/noble/en/man1/grep.1.html
    essential:
      - dash_bins
      - grep_bins
    contents:
      # we need this leading comment
      /usr/bin/rgrep: # this should be last

      /usr/bin/fgrep:

      # careful with this path ...
      /usr/bin/egrep: # it is my favorite
  copyright:
    contents:
      /usr/share/doc/grep/copyright:
2 changes: 2 additions & 0 deletions yaml_checker/requirements.txt
@@ -0,0 +1,2 @@
pydantic==2.8.2
ruamel.yaml==0.18.6
Member

nit: Let's add a line break at the end here.

23 changes: 23 additions & 0 deletions yaml_checker/setup.py
@@ -0,0 +1,23 @@
from pathlib import Path

from setuptools import find_packages, setup


def read_text(filename):
    filepath = Path(__file__).parent / filename
    return filepath.read_text()


setup(
    name="yaml_checker",
    version="0.1.0",
    long_description=read_text("README.md"),
    packages=find_packages(),
    install_requires=read_text("requirements.txt"),
    entry_points={
        "console_scripts": [
            "yaml_checker=yaml_checker.__main__:main",
            "clayaml=yaml_checker.__main__:main",
        ],
    },
)
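
A possible refinement for `install_requires`, offered only as a sketch: the setup script passes the raw text of `requirements.txt`, and an explicit list of specifiers would make the handling of blank lines and comment lines visible. The helper name `parse_requirements` is hypothetical and not part of this PR.

```python
def parse_requirements(text):
    """Return requirement specifiers, skipping blank lines and comments."""
    requirements = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            requirements.append(line)
    return requirements


# Sample text mirroring the two pins in requirements.txt, plus a comment line.
sample = "# runtime deps\npydantic==2.8.2\n\nruamel.yaml==0.18.6"
parsed = parse_requirements(sample)
```

In `setup.py` this could be used as `install_requires=parse_requirements(read_text("requirements.txt"))`, assuming the file only contains plain specifiers and no pip options such as `-r`.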
Empty file.
54 changes: 54 additions & 0 deletions yaml_checker/yaml_checker/__main__.py
@@ -0,0 +1,54 @@
import argparse
import logging
from pathlib import Path

from .config.base import YAMLCheckConfigBase

# TODO: display all available configs in help
parser = argparse.ArgumentParser()

parser.add_argument(
    "-v", "--verbose", action="store_true", help="Enable verbose output."
)

parser.add_argument(
    "-w", "--write", action="store_true", help="Write yaml output to disk."
)

parser.add_argument(
    "--config",
    type=str,
    default="YAMLCheckConfigBase",
    help="CheckYAML subclass to load",
)

parser.add_argument(
    "files", type=Path, nargs="*", help="Additional files to process (optional)."
)


def main():
    args = parser.parse_args()

    log_level = logging.DEBUG if args.verbose else logging.INFO
    logging.basicConfig(level=log_level)

    check_yaml_config = YAMLCheckConfigBase.configs[args.config]

    yaml = check_yaml_config()

    for file in args.files:
        data = yaml.load(file.read_text())
        data = yaml.apply_rules(data)
        yaml.validate_model(data)

        output = yaml.dump(data)

        if args.write:
            file.write_text(output)
        else:
            print(output)


if __name__ == "__main__":
    main()
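
Regarding the TODO about displaying the available configs in the help output: one possible approach is to feed the registry keys to argparse as `choices`. The sketch below is stand-alone and uses only the standard library. `ConfigRegistry`, `DemoBase`, `DemoChisel` and `build_parser` are hypothetical names standing in for the PR's metaclass, `YAMLCheckConfigBase` and its subclasses; it is not the PR's implementation.

```python
import argparse


class ConfigRegistry(type):
    """Metaclass that records every class it creates, keyed by class name."""

    def __init__(cls, name, bases, namespace):
        super().__init__(name, bases, namespace)
        cls.configs.setdefault(name, cls)


class DemoBase(metaclass=ConfigRegistry):
    configs = {}  # shared registry, filled in as subclasses are defined


class DemoChisel(DemoBase):
    """Hypothetical subclass standing in for a real config."""


def build_parser(configs):
    """Build a parser whose --config option only accepts registered names."""
    parser = argparse.ArgumentParser(prog="yaml_checker")
    parser.add_argument(
        "--config",
        default="DemoBase",
        choices=sorted(configs),
        help="config to load (default: %(default)s)",
    )
    return parser


parser = build_parser(DemoBase.configs)
args = parser.parse_args(["--config", "DemoChisel"])
selected = DemoBase.configs[args.config]
```

With `choices`, argparse lists the registered names in `--help` and rejects an unknown name with a usage error, whereas a direct dictionary lookup on the registry would raise a `KeyError`.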
14 changes: 14 additions & 0 deletions yaml_checker/yaml_checker/config/__init__.py
@@ -0,0 +1,14 @@
from importlib import import_module
from pathlib import Path

submodule_root = Path(__file__).parent
package_name = __name__

# import all submodules so our configs registry is populated
for submodule in submodule_root.glob("*.py"):
    submodule_name = submodule.stem

    if submodule_name.startswith("_"):
        continue

    import_module(f"{__name__}.{submodule_name}")
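
To see the import loop above in isolation, the following sketch builds a throwaway package in a temporary directory and imports it. The package name `demo_configs` and its module names are hypothetical; the loop inside `INIT_SOURCE` is a condensed version of the one in this file.

```python
import sys
import tempfile
from importlib import import_module
from pathlib import Path

# Source for the throwaway package's __init__.py: import every public submodule.
INIT_SOURCE = '''
from importlib import import_module
from pathlib import Path

for submodule in Path(__file__).parent.glob("*.py"):
    if submodule.stem.startswith("_"):
        continue
    import_module(f"{__name__}.{submodule.stem}")
'''

with tempfile.TemporaryDirectory() as tmp:
    pkg = Path(tmp) / "demo_configs"  # hypothetical package name
    pkg.mkdir()
    (pkg / "__init__.py").write_text(INIT_SOURCE)
    (pkg / "chisel.py").write_text("NAME = 'chisel'\n")
    (pkg / "_helpers.py").write_text("NAME = 'helpers'\n")

    sys.path.insert(0, tmp)
    try:
        import_module("demo_configs")
    finally:
        sys.path.remove(tmp)

# Only the public submodule was imported as a side effect of the package import.
loaded = sorted(name for name in sys.modules if name.startswith("demo_configs."))
```

Modules whose names start with an underscore, including `__init__.py` itself, are skipped, so only `demo_configs.chisel` ends up in `sys.modules`.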
99 changes: 99 additions & 0 deletions yaml_checker/yaml_checker/config/base.py
@@ -0,0 +1,99 @@
import fnmatch
import logging
from io import StringIO
from pathlib import Path
from typing import Any

from pydantic import BaseModel
from ruamel.yaml import YAML


class YAMLCheckConfigReg(type):
    def __init__(cls, *args, **kwargs):
        """Track all subclass configurations of YAMLCheckConfigBase for CLI"""
        super().__init__(*args, **kwargs)
        name = cls.__name__
        if name not in cls.configs:
            cls.configs[name] = cls


class YAMLCheckConfigBase(metaclass=YAMLCheckConfigReg):
    configs = {}  # Store configs for access from CLI
    rules = {}  # map glob strings to class method names

    class Model(BaseModel):
        """Pydantic BaseModel to provide validation"""

        class Config:
            extra = "allow"

    class Config:
        """ruamel.yaml configuration set before loading."""

        preserve_quotes = True
        width = 80
        map_indent = 2
        sequence_indent = 4
        sequence_dash_offset = 2

    def __init__(self):
        """YAMLCheck Base Config"""
        self.yaml = YAML()

        # load Config into yaml
        for attr in dir(self.Config):
            if attr.startswith("__"):
                continue

            attr_val = getattr(self.Config, attr)

            if hasattr(self.yaml, attr):
                setattr(self.yaml, attr, attr_val)
            else:
                raise AttributeError(f"Invalid ruamel.yaml attribute: {attr}")

    def load(self, yaml_str: str):
        """Load YAML data from string"""
        data = self.yaml.load(yaml_str)

        return data

    def dump(self, data: Any):
        """Dump data to YAML string"""
        with StringIO() as sio:
            self.yaml.dump(data, sio)
            sio.seek(0)

            return sio.read()

    def validate_model(self, data: Any):
        """Validate data against the model"""
        if issubclass(self.Model, BaseModel):
            _ = self.Model(**data)

    def _apply_rules(self, path: Path, data: Any):
        """Recursively apply rules starting from the outermost elements."""
        logging.debug(f"Walking path {path}.")

        # recurse over dicts and lists
        if isinstance(data, dict):
            for key, value in data.items():
                data[key] = self._apply_rules(path / str(key), value)

        elif isinstance(data, list):
            for index, item in enumerate(data):
                data[index] = self._apply_rules(path / str(item), item)

        # scan for applicable rules at each directory
        # TODO: selection of rules here does not scale well and should be improved
        for key, value in self.rules.items():
            if fnmatch.fnmatch(path, key):
                logging.debug(f'Applying rule "{value}" at {path}')
                rule = getattr(self, value)
                data = rule(path, data)

        return data

    def apply_rules(self, data: Any):
        """Walk all objects in data and apply rules where applicable."""
        return self._apply_rules(Path("/"), data)
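
For readers following the rule mechanism, here is a stripped-down, standard-library-only sketch of how a `rules` mapping might drive the recursive walk. `DemoConfig` and `sort_list` are hypothetical names, the rule pattern is an assumption chosen to match the demo file's layout, and plain `sorted()` is used in place of any comment-preserving logic, so this is an illustration of the matching step rather than the PR's Chisel config.

```python
import fnmatch
from pathlib import PurePosixPath


class DemoConfig:
    # glob pattern -> name of the method to call on matching paths
    rules = {"/slices/*/essential": "sort_list"}

    def sort_list(self, path, data):
        """Hypothetical rule: return the list in sorted order."""
        return sorted(data)

    def _apply_rules(self, path, data):
        # Visit children first so rules see already-processed values.
        if isinstance(data, dict):
            for key, value in data.items():
                data[key] = self._apply_rules(path / str(key), value)
        elif isinstance(data, list):
            for index, item in enumerate(data):
                data[index] = self._apply_rules(path / str(item), item)

        # Apply every rule whose glob pattern matches the current path.
        for pattern, method_name in self.rules.items():
            if fnmatch.fnmatch(str(path), pattern):
                data = getattr(self, method_name)(path, data)
        return data

    def apply_rules(self, data):
        return self._apply_rules(PurePosixPath("/"), data)


doc = {
    "package": "grep",
    "slices": {"bins": {"essential": ["libpcre2-8-0_libs", "libc6_libs"]}},
}
result = DemoConfig().apply_rules(doc)

# pathlib detail worth knowing: an absolute component resets the joined path.
reset = PurePosixPath("/slices/bins/contents") / "/usr/bin/grep"
```

The last line shows a pathlib property that matters when keys are themselves absolute paths, as under `contents` in the demo file: joining with `/` discards the accumulated prefix, so a pattern would see `/usr/bin/grep` rather than a path nested under `/slices`.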