Pydargs allows configuring a (Pydantic) dataclass through command line arguments.

rogiervandergeer/pydargs

pydargs

Easily configure a CLI application using a (Pydantic) dataclass.

Usage

Pydargs instantiates a dataclass from command line arguments; that dataclass then serves as the (configuration) input of your entrypoint. For example, in example.py:

from dataclasses import dataclass
from pydargs import parse


@dataclass
class Config:
    number: int
    some_string: str = "abc"


def main(config: Config) -> None:
    """Your main functionality"""
    print(f"> Hello {config.number} + {config.some_string}")


if __name__ == "__main__":
    config = parse(Config)
    main(config)

Here the Config dataclass serves as input (configuration) of the main function. Pydargs facilitates instantiating the config instance, allowing the user to use command line arguments to set or override the values of its fields:

$ python example.py --number 1
> Hello 1 + abc
$ python example.py --number 2 --some-string def
> Hello 2 + def
$ python example.py --help
usage: example.py [-h] --number NUMBER [--some-string SOME_STRING]

options:
  -h, --help            show this help message and exit
  --number NUMBER
  --some-string SOME_STRING
                        (default: abc)

This saves you from having to maintain boilerplate code such as

from argparse import ArgumentParser
from dataclasses import dataclass


@dataclass
class Config:
    number: int
    some_string: str = "abc"


def main(config: Config) -> None:
    """Your main functionality"""
    print(f"> Hello {config.number} + {config.some_string}")


if __name__ == "__main__":
    parser = ArgumentParser()
    parser.add_argument("--number", type=int, required=True)
    parser.add_argument("--some-string", dest="some_string", default="abc")
    namespace = parser.parse_args()
    config = Config(number=namespace.number, some_string=namespace.some_string)
    main(config)

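Aside from that, pydargs supports a range of field types, overriding defaults from a file, nested dataclasses, and subparsers, as described in the sections that follow.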

Installation

Pydargs can be installed with your favourite package manager. For example:

pip install pydargs

ArgumentParser arguments

It's possible to pass additional arguments to the underlying argparse.ArgumentParser instance by providing them as keyword arguments to the parse function. For example:

config = parse(Config, prog="myprogram", allow_abbrev=False)

will disable abbreviations for long options and set the program name to myprogram in help messages. For an extensive list of accepted arguments, see the argparse docs.
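
To make the effect of these two keyword arguments concrete, here is a minimal sketch that uses argparse directly rather than pydargs. The --number option is only an illustration; the assumption is that parse forwards prog and allow_abbrev unchanged, as described above.

```python
from argparse import ArgumentParser

# Plain argparse, for illustration only: the same keyword arguments
# that the text above passes through parse().
parser = ArgumentParser(prog="myprogram", allow_abbrev=False)
parser.add_argument("--number", type=int)

# The full option name is accepted as usual.
args = parser.parse_args(["--number", "3"])

# The usage line starts with the configured program name.
usage = parser.format_usage()

# With allow_abbrev=False, an abbreviation such as --num is rejected.
try:
    parser.parse_args(["--num", "3"])
    abbreviation_accepted = True
except SystemExit:
    abbreviation_accepted = False
```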

Supported Field Types

The dataclass can have fields of the base types: int, float, str, bool, as well as:

  • Literals comprised of those types.
  • Enums, although these are not recommended as they do not play nice in the help messages. Only the enum name is accepted as a valid input, not the value.
  • Bytes, with an optional encoding metadata field: a_value: bytes = field(metadata=dict(encoding="ascii")), which defaults to utf-8.
  • Date and datetime, with an optional date_format metadata field: your_date: date = field(metadata=dict(date_format="%m-%d-%Y")). When no format is provided, dates in ISO 8601 format are accepted.
  • Lists of those types, either denoted as e.g. list[int] or Sequence[int]. Multiple arguments to a numbers: list[int] field can be provided as --numbers 1 2 3. A list field without a default requires at least one value to be provided. If a default is provided, it is replaced completely by any values given on the command line.
  • Optional types, denoted as e.g. typing.Optional[int] or int | None (for Python 3.10 and above). Any argument passed is assumed to be of the provided type and can never be None.
  • Unions of types, denoted as e.g. typing.Union[int, str] or int | str. Each argument will be parsed into the first type that returns a valid result. Note that this means that str | int will always result in a value of type str.
  • Any other type that can be instantiated from a string, such as Path.
  • Dataclasses that, in turn, contain fields of supported types. See Nested Dataclasses.
  • A union of multiple dataclasses, that in turn contain fields of supported types, which will be parsed in Subparsers.
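
The "first type that returns a valid result" rule for unions can be sketched as follows. This is not pydargs' own implementation: parse_first_match is a hypothetical helper, and it assumes that a type signals rejection by raising ValueError.

```python
from typing import Any, Callable, Sequence


def parse_first_match(raw: str, candidates: Sequence[Callable[[str], Any]]) -> Any:
    """Hypothetical helper: return the result of the first type that accepts `raw`."""
    for candidate in candidates:
        try:
            return candidate(raw)
        except ValueError:
            continue  # This type rejected the input; try the next one.
    raise ValueError(f"{raw!r} matches none of the candidate types")


# int | str: numeric input becomes an int, anything else falls through to str.
number = parse_first_match("42", [int, str])
text = parse_first_match("abc", [int, str])

# str | int: str accepts every input, so int is never tried.
always_str = parse_first_match("42", [str, int])
```

The order of the types in the union therefore matters: put the most restrictive type first.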

Overriding defaults from a file

Pydargs can also consume values from a JSON- or YAML-formatted file. To enable this, pass add_config_file_argument=True to the parse function, which will add a --config-file command line argument. If provided, the values from this file will override the defaults of the dataclass fields. Any command line arguments passed will override the defaults provided in the file.

For example, with the following contents in defaults.json:

{
  "a": 1,
  "b": "abc"
}

then running this code

from dataclasses import dataclass
from pydargs import parse


@dataclass
class Config:
  a: int = 2
  b: str = "def"


if __name__ == "__main__":
  config = parse(Config, add_config_file_argument=True)

with the following arguments

entrypoint --config-file defaults.json --b xyz

would result in Config(a=1, b="xyz").

Note that:

  • Only defaults can be overridden. Any dataclass fields without a default must always be provided on the command line.
  • Any extra keys present inside the file but not matching a field in the dataclass will be ignored, and if any are present a warning will be raised.
  • In order to load defaults from a YAML-formatted file, PyYAML has to be installed. To install it with pydargs, run pip install pydargs[pyyaml].
  • The parsed dataclass may not have a field named config_file.
  • The defaults provided in the file are not type-cast by pydargs, and hence only JSON-native types are supported.
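
The precedence described in this section (dataclass defaults, then file values, then command line arguments) can be sketched with the standard library alone. This is an illustration, not pydargs' implementation: parse_with_file_defaults is a hypothetical helper, it takes the file contents as a string instead of reading a file, and it silently drops unknown keys where pydargs is documented to warn.

```python
import json
from argparse import ArgumentParser
from dataclasses import dataclass, fields


@dataclass
class Config:
    a: int = 2
    b: str = "def"


def parse_with_file_defaults(argv: list[str], file_contents: str) -> Config:
    """Hypothetical helper: dataclass defaults < file values < CLI arguments."""
    parser = ArgumentParser()
    for f in fields(Config):
        # Start from the defaults declared on the dataclass.
        parser.add_argument(f"--{f.name}", type=f.type, default=f.default)
    known = {f.name for f in fields(Config)}
    file_values = json.loads(file_contents)
    # Values from the file replace the dataclass defaults.
    parser.set_defaults(**{k: v for k, v in file_values.items() if k in known})
    # Anything given on the command line wins over both.
    return Config(**vars(parser.parse_args(argv)))


config = parse_with_file_defaults(["--b", "xyz"], '{"a": 1, "b": "abc"}')
```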

Metadata

Additional options can be provided to the dataclass field metadata.

The following metadata fields are supported:

positional

Set positional=True to create a positional argument instead of an option.

from dataclasses import dataclass, field

@dataclass
class Config:
  argument: str = field(metadata=dict(positional=True))

as_flags

Set as_flags=True for a boolean field:

from dataclasses import dataclass, field

@dataclass
class Config:
  verbose: bool = field(default=False, metadata=dict(as_flags=True))

which would create the arguments --verbose and --no-verbose to set the value of verbose to True or False respectively, instead of a single option that requires a value like --verbose True.
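
For comparison, the standard library offers the same pair of flags through argparse.BooleanOptionalAction (Python 3.9 and above). The sketch below uses argparse directly; whether pydargs relies on this action internally is not stated here and should be treated as an assumption.

```python
from argparse import ArgumentParser, BooleanOptionalAction

parser = ArgumentParser()
# Registers both --verbose and --no-verbose for a single destination.
parser.add_argument("--verbose", action=BooleanOptionalAction, default=False)

enabled = parser.parse_args(["--verbose"]).verbose
disabled = parser.parse_args(["--no-verbose"]).verbose
unset = parser.parse_args([]).verbose  # Falls back to the default.
```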

parser

Provide a custom type converter that parses the argument into the desired type. For example:

from dataclasses import dataclass, field
from json import loads

@dataclass
class Config:
  list_of_numbers: list[int] = field(metadata=dict(parser=loads))

This would parse --list-of-numbers '[1, 2, 3]' into the list [1, 2, 3] (the quotes keep the shell from splitting the value into separate arguments). Note that the error message returned for invalid input lacks detail. Also, no validation is performed to verify that the returned type matches the field type. In the above example, --list-of-numbers '{"a": "b"}' would result in list_of_numbers being the dictionary {"a": "b"} without any kind of warning.
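
Because the returned value is not validated, one option is to wrap loads in a function that checks the result itself and to pass that function as the parser. The helper name int_list is hypothetical, and how pydargs reports the raised ValueError is not covered here.

```python
from json import loads


def int_list(raw: str) -> list[int]:
    """Hypothetical parser: accept only a JSON list of integers."""
    value = loads(raw)
    # type(v) is int also rejects JSON booleans, which are a subclass of int.
    if not isinstance(value, list) or not all(type(v) is int for v in value):
        raise ValueError(f"expected a JSON list of integers, got {raw!r}")
    return value


numbers = int_list("[1, 2, 3]")

try:
    int_list('{"a": "b"}')
    dict_rejected = False
except ValueError:
    dict_rejected = True
```

Such a function could then be used as field(metadata=dict(parser=int_list)).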

short_option

Provide a short option for a field, which can be used as an alternative to the long option. For example,

from dataclasses import dataclass, field

@dataclass
class Config:
  a_field_with_a_long_name: int = field(metadata=dict(short_option="-a"))

would allow using -a 42 as an alternative to --a-field-with-a-long-name 42.

Ignoring fields

Fields can be ignored by adding the ignore_arg metadata field:

from dataclasses import dataclass, field

@dataclass
class Config:
    number: int
    ignored: str = field(metadata=dict(ignore_arg=True))

When indicated, this field is not added to the parser and cannot be overridden with an argument.

Fields excluded from the __init__()

Fields not included in the __init__() (i.e. with init=False, as described in the dataclasses documentation) will be ignored by pydargs and cannot be overridden with an argument.

from dataclasses import dataclass, field

@dataclass
class Config:
    number: int
    ignored: str = field(init=False)

This could be useful in combination with a __post_init__() method to set the value of the field.
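
A minimal sketch of that combination, using only the standard library. The field name doubled and the derivation are hypothetical and serve only to show the pattern.

```python
from dataclasses import dataclass, field


@dataclass
class Config:
    number: int
    # Not part of __init__, so it is never exposed as an argument.
    doubled: int = field(init=False)

    def __post_init__(self) -> None:
        # Derive the excluded field from the fields that were provided.
        self.doubled = self.number * 2


config = Config(number=3)
```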

help

Provide a brief description of the field, used in the help messages generated by argparse. For example, calling your_program -h with the dataclass below,

from dataclasses import dataclass, field

@dataclass
class Config:
  an_integer: int = field(metadata=dict(help="any integer you like"))

would result in a message like:

usage: your_program [-h] --an-integer AN_INTEGER

options:
  -h, --help               show this help message and exit
  --an-integer AN_INTEGER  any integer you like

metavar

Override the displayed name of an argument in the help messages generated by argparse, as documented in the argparse docs.

For example, with the following dataclass,

from dataclasses import dataclass, field

@dataclass
class Config:
  an_integer: int = field(metadata=dict(metavar="INT"))

calling your_program -h would result in a message like:

usage: your_program [-h] --an-integer INT

options:
  -h, --help        show this help message and exit
  --an-integer INT

Nested Dataclasses

Dataclasses may be nested; the type of a dataclass field may be another dataclass type:

from dataclasses import dataclass

@dataclass
class Config:
  field_a: int
  field_b: str = "abc"


@dataclass
class Base:
  config: Config
  verbose: bool = False

Argument names of fields of the nested dataclass are prefixed with the field name of the nested dataclass in the base dataclass. Calling pydargs.parse(Base, ["-h"]) will result in something like:

usage: your_program.py [-h] --config-field-a CONFIG_FIELD_A
                            [--config-field-b CONFIG_FIELD_B]
                            [--verbose VERBOSE]

options:
  -h, --help            show this help message and exit
  --verbose VERBOSE     (default: False)

config:
  --config-field-a CONFIG_FIELD_A
  --config-field-b CONFIG_FIELD_B
                        (default: abc)

Please be aware of the following:

  • The default (factory) of fields with a dataclass type is ignored by pydargs, which may yield unexpected results. E.g., in the example above, config: Config = field(default_factory=lambda: Config(field_b="def")) will not result in a default of "def" for field_b when parsed by pydargs. Instead, set field_b: str = "def" in the definition of Config. If you must add a default, for example for instantiating your dataclass elsewhere, do config: Config = field(default_factory=Config), assuming that all fields in Config have a default.
  • Nested dataclasses cannot be positional (although fields of the nested dataclass can be).
  • Argument names must not collide. In the example above, the Base class should not contain a field named config_field_a.
  • When reading defaults from a file, the data inside the file may be nested like the dataclasses, or flat with prefixes. E.g. {"config": {"field_b": "xyz"}} has the same effect as {"config_field_b": "xyz"}. Pydargs will raise an exception if keys in the two formats collide.
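
The equivalence between nested and prefixed keys, and the collision check, can be sketched as below. flatten is a hypothetical helper, not part of pydargs, and it assumes that every dictionary value corresponds to a nested dataclass.

```python
from typing import Any


def flatten(data: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """Hypothetical helper: turn nested keys into prefixed flat keys."""
    flat: dict[str, Any] = {}
    for key, value in data.items():
        name = f"{prefix}{key}"
        # Recurse into nested dictionaries, prefixing with the parent key.
        items = flatten(value, f"{name}_") if isinstance(value, dict) else {name: value}
        for flat_key, flat_value in items.items():
            if flat_key in flat:
                # The same field was given in both nested and flat form.
                raise ValueError(f"key collision on {flat_key!r}")
            flat[flat_key] = flat_value
    return flat


nested = flatten({"config": {"field_b": "xyz"}})
already_flat = flatten({"config_field_b": "xyz"})
```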

Subparsers

Dataclasses can contain a field with a union-of-dataclasses type, e.g.:

from dataclasses import dataclass, field
from typing import Union


@dataclass
class Command1:
  field_a: int
  field_b: str = "abc"


@dataclass
class Command2:
  field_c: str = field(metadata=dict(positional=True))


@dataclass
class Base:
  command: Union[Command1, Command2]
  verbose: bool = False

This will result in subcommands, which allow calling your entrypoint as entrypoint --verbose True Command1 --field-a 12.

Calling pydargs.parse(Base, ["-h"]) will result in something like:

usage: your_program.py [-h] [--verbose VERBOSE] {Command1,command1,Command2,command2} ...

options:
  -h, --help            show this help message and exit
  --verbose VERBOSE     (default: False)

action:
  {Command1,command1,Command2,command2}

Note that:

  • Lower-case command names are also accepted.
  • A dataclass cannot contain more than one subcommand field.
  • Subcommands can be nested and mixed with nested dataclasses.
  • Positional fields defined after a subcommand field cannot be parsed.
  • Subparsers handle all arguments that come after the command; so all global arguments must come before the command. In the above example this means that entrypoint --verbose Command2 string is valid but entrypoint Command2 string --verbose is not.
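
The ordering rule in the last point follows from how argparse subparsers work, which the sketch below shows with argparse directly. It is a simplification, not what pydargs generates: --verbose is declared as a plain store_true flag here, and only Command2 is registered.

```python
from argparse import ArgumentParser

parser = ArgumentParser(prog="entrypoint")
# Simplification: a store_true flag instead of an option that takes a value.
parser.add_argument("--verbose", action="store_true")
subparsers = parser.add_subparsers(dest="command", required=True)
command2 = subparsers.add_parser("Command2")
command2.add_argument("field_c")

# Global arguments before the command are handled by the main parser.
ok = parser.parse_args(["--verbose", "Command2", "string"])

# Everything after the command goes to the subparser, which does not
# know --verbose, so argparse exits with an error.
try:
    parser.parse_args(["Command2", "string", "--verbose"])
    rejected = False
except SystemExit:
    rejected = True
```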