DEP 1: Namespaces #14

Draft
wants to merge 8 commits into
base: main
Changes from 5 commits
Empty file modified docs/make.bat
100644 → 100755
Empty file.
147 changes: 147 additions & 0 deletions docs/source/dep-01-namespaces.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,147 @@
(dep_namespaces)=

# DEP-01: Namespaces

```{eval-rst}
+------------+------------------------------------------------------------------+
| Author | `Hans-Martin von Gaudecker <https://github.com/hmgaudecker>`_ |
| | `Janos Gabler <https://github.com/janosg>`_ |
+------------+------------------------------------------------------------------+
| Status | Draft |
+------------+------------------------------------------------------------------+
| Type | Standards Track |
+------------+------------------------------------------------------------------+
| Created | 2023-01-13 |
+------------+------------------------------------------------------------------+
| Resolution | |
+------------+------------------------------------------------------------------+
```

## Abstract

## Backwards compatibility

All changes are fully backwards compatible.

## Motivation

dags constructs a graph based on function names, which need to be valid Python
identifiers. In larger projects that distribute functions across many submodules, this
leads to very long or cryptic identifiers. Even in smaller projects, this may
happen if there are multiple similar submodules. For example, in a library depicting a
taxes and transfers system, two different transfer programs will share many similar
concepts (eligibility criteria, benefits, ...). Similarly, when processing a panel
dataset with dimensions year × variables, one often has to work at a granular
level for some variables in some years, but for all years at the same time for other
variables. In both cases, disambiguation via string manipulation on the user side is
repetitive and error-prone, and the code quickly becomes unmanageable.

Programming languages such as Python address this problem with namespaces, which allow
for short identifiers whose scope is provided by the broader context. The equivalent for
dags will be a nested dictionary of functions based on
[pybaum](https://github.com/OpenSourceEconomics/pybaum), with the scope provided by a
branch of the tree.

Before going into detail, an example and some terminology are useful.


## Example: Taxes and transfers system

Say we have the following structure:

```
taxes_transfers
|-- __init__.py
| |-- monthly_earnings(hours, hourly_wage)
| |-- [call of concatenate_functions]
|-- pensions.py
| |-- eligible(benefits_last_period, applied_this_period)
| |-- benefit(eligible, aime, conversion_factor, monthly_earnings, unemployment_insurance__earnings_limit)
|-- unemployment_insurance.py
| |-- monthly_earnings(hours, hourly_wage, earnings_limit)
| |-- eligible(benefits_last_period, applied_this_period, hours, monthly_earnings, hours_limit, earnings_limit)
| |-- benefit(eligible, monthly_earnings, fraction)
```

### Content of pensions.py
```py
def eligible(
    benefits_last_period,
    applied_this_period,
):
    return True if benefits_last_period else applied_this_period


def benefit(
    eligible,
    aime,
    conversion_factor,
    monthly_earnings,
    unemployment_insurance__earnings_limit,
):
    cond = eligible and monthly_earnings < unemployment_insurance__earnings_limit
    return aime * conversion_factor if cond else 0


# TODO: repeat the example system from the test file, which starts with:
# import pensions
# import unemployment_insurance
```

## Terminology

- **flat**
- **nested**
- **short**
- **registry**

## The new `functions` argument

### Possible Containers

- should we only allow (arbitrarily?) nested dictionaries or everything that works in pybaum?
- do we need an extensible registry as in pybaum?
- should it be possible to provide a module directly instead of a dictionary of functions?
- do we allow unlabeled data structures as long as the functions have a `__name__` attribute?

It is probably not much extra work to make everything fully flexible, but the flattening rules will be very different from pybaum's: instead of flattening everything into lists, we need to flatten into dictionaries. We could still re-use pybaum for that, just with different registries.
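
As a minimal sketch of what flattening into dictionaries could look like, the following uses plain Python rather than pybaum. The helper name `flatten_functions` and the `__` separator as a keyword argument are assumptions for illustration, not part of dags or pybaum.

```python
def flatten_functions(tree, sep="__", prefix=""):
    """Flatten a nested dict of functions into a flat dict keyed by long names.

    Hypothetical helper: nested keys are joined with `sep`, leaves are kept as-is.
    """
    flat = {}
    for name, value in tree.items():
        long_name = f"{prefix}{sep}{name}" if prefix else name
        if isinstance(value, dict):
            # Descend into the namespace and carry the long name as prefix.
            flat.update(flatten_functions(value, sep=sep, prefix=long_name))
        else:
            flat[long_name] = value
    return flat


def monthly_earnings(hours, hourly_wage):
    return hours * 4.3 * hourly_wage


def eligible(benefits_last_period, applied_this_period):
    return True if benefits_last_period else applied_this_period


functions = {
    "pensions": {"eligible": eligible},
    "monthly_earnings": monthly_earnings,
}
flat = flatten_functions(functions)
print(sorted(flat))
```

Only dictionaries are treated as containers here; supporting other containers would require a registry along the lines discussed above.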

### Argument names

- When can we use short and long names for arguments?
- Are there ever ambiguous cases and, if so, how are they handled? How does this relate to Python's resolution of scopes?
- What should not work (e.g. because it would be error-prone)?

**Use more abstract examples than tax and transfer here!**


```
abstract
|-- __init__.py
| |-- a(b, c)
| |-- [call of concatenate_functions]
|-- x.py
| |-- f(a, b)
| |-- g(f, c)
|-- y.py
| |-- a(b, c)
| |-- m(x__f, a, b)
```
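
One possible resolution rule for the tree above, sketched under assumptions that this DEP has not settled: an argument containing `__` is treated as an absolute long name; a short name is looked up among the functions of the current namespace first and then in the enclosing ones; a name that matches no function is taken to be an input living in the function's own namespace. The helper `resolve_argument` is hypothetical.

```python
def resolve_argument(namespace, arg, function_names, sep="__"):
    """Map an argument name to a long name (assumed rule, see text above).

    `namespace` is the long name of the enclosing namespace ("" for top level).
    """
    if sep in arg:
        # Assumption: names with a separator are absolute long names.
        return arg
    parts = namespace.split(sep) if namespace else []
    # Look in the innermost namespace first, then walk up to the top level.
    for depth in range(len(parts), -1, -1):
        candidate = sep.join([*parts[:depth], arg])
        if candidate in function_names:
            return candidate
    # Assumption: anything that is not a function is an input in the own namespace.
    return sep.join([*parts, arg])


# Long names of all functions in the abstract example.
function_names = {"a", "x__f", "x__g", "y__a", "y__m"}

print(resolve_argument("x", "a", function_names))     # top-level function a
print(resolve_argument("y", "a", function_names))     # y's own function a
print(resolve_argument("y", "x__f", function_names))  # absolute long name
print(resolve_argument("x", "b", function_names))     # input in namespace x
```

Under this rule, `y.m` cannot reach the top-level `a` because `y.a` shadows it, which is exactly the kind of case the questions above need to settle.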



## Inputs of the concatenated function

- Is it necessary to have different input modes (e.g. flat and nested) or is nested always what we want?
- input of `x__c` should inherit behavior of .


## Output of the concatenated function

- Is it necessary to have different output formats (e.g. flat and nested) or should this be done outside of dags?
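
If conversion between formats is left to the user or to a thin layer around dags, turning a flat result into a nested one is a short operation. The sketch below assumes `__` as separator and a hypothetical helper name `unflatten`; the numbers are made up for illustration.

```python
def unflatten(flat, sep="__"):
    """Convert a flat dict with long names into a nested dict (hypothetical helper)."""
    nested = {}
    for long_name, value in flat.items():
        *path, leaf = long_name.split(sep)
        node = nested
        for key in path:
            # Create intermediate namespaces on demand.
            node = node.setdefault(key, {})
        node[leaf] = value
    return nested


flat_result = {
    "pensions__benefit": 1200.0,
    "unemployment_insurance__benefit": 0.0,
}
nested_result = unflatten(flat_result)
print(nested_result)
```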

## Implementation Ideas

- How aware should the dag be of the nesting? It is probably possible to flatten everything before a dag is built but then we lose information that could be useful for visualization.
- It is generally easy to flatten the function container. The hard part is converting the function arguments into long names that are globally unique in the dag!
- **What can we learn from pytask? Essentially pytask has this already!**
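
Once a mapping from short to long argument names is known, exposing the long names could be done with a thin wrapper. The following is a sketch using only the standard library; `rename_arguments` is a hypothetical helper, not an existing dags function, and the wrapper accepts keyword arguments only.

```python
import functools
import inspect


def rename_arguments(func, mapping):
    """Return a wrapper around `func` whose arguments use the names in `mapping`.

    `mapping` maps original (short) names to new (long) names. Hypothetical helper.
    """
    old_signature = inspect.signature(func)
    new_parameters = [
        parameter.replace(name=mapping.get(name, name))
        for name, parameter in old_signature.parameters.items()
    ]
    reverse = {new: old for old, new in mapping.items()}

    @functools.wraps(func)
    def wrapper(**kwargs):
        # Translate long names back to the short names the function expects.
        return func(**{reverse.get(key, key): value for key, value in kwargs.items()})

    # Expose the long names so that signature inspection sees them.
    wrapper.__signature__ = old_signature.replace(parameters=new_parameters)
    return wrapper


def benefit(eligible, aime):
    return aime if eligible else 0


long_benefit = rename_arguments(
    benefit, {"eligible": "pensions__eligible", "aime": "pensions__aime"}
)
print(list(inspect.signature(long_benefit).parameters))
```

A graph built from the wrapped functions would then only ever see globally unique names, at the cost of losing the nesting information mentioned in the first bullet.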
3 changes: 3 additions & 0 deletions environment.yml
Original file line number Diff line number Diff line change
@@ -30,3 +30,6 @@ dependencies:
  # Development
  - jupyterlab
  - nbsphinx

  - pip:
      - -e .
16 changes: 16 additions & 0 deletions tests/pensions.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,16 @@
def eligible(
    benefits_last_period,
    applied_this_period,
):
    return True if benefits_last_period else applied_this_period


def benefit(
    eligible,
    aime,
    conversion_factor,
    monthly_earnings,
    unemployment_insurance__earnings_limit,
):
    cond = eligible and monthly_earnings < unemployment_insurance__earnings_limit
    return aime * conversion_factor if cond else 0
150 changes: 150 additions & 0 deletions tests/test_trees.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,150 @@
import inspect
from functools import partial

import pensions
import unemployment_insurance
from dags.dag import concatenate_functions
from dags.dag import create_dag
from dags.dag import get_ancestors


# Define a function in the "global" namespace
def monthly_earnings(hours, hourly_wage):
    return hours * 4.3 * hourly_wage


# Note there is the same function in unemployment_insurance.py,
# leading to a potential ambiguity of what is meant with "monthly_earnings"
# inside of the "unemployment_insurance" namespace (namespace in the dags sense).
# In particular, there would be no way to access the "global" function.
#
# To handle this case in different manners, add a keyword argument:
#
# concatenate_functions(
# ...
# name_clashes="**raise**|ignore|warn",
# )
#
# Default is to raise an error.


# Typical way would be to define functions in a nested dictionary creating namespaces.
functions = {
    "pensions": {
        "eligible": pensions.eligible,
        "benefit": pensions.benefit,
    },
    "unemployment_insurance": {
        "eligible": unemployment_insurance.eligible,
        "benefit": unemployment_insurance.benefit,
    },
    "monthly_earnings": monthly_earnings,
}


# Typical way to define targets would be in a nested dictionary.
targets = {"pensions": ["benefit"], "unemployment_insurance": ["benefit"]}

# Alternatively, use flat structure.
targets_flat = {"pensions__benefit", "unemployment_insurance__benefit"}

# Expected behavior of concatenate_functions with input_mode="tree"
# (meaning that a nested dict of inputs is expected by the complete system)
# and output_mode="tree" is to return *targets*
#
# concatenate_functions(
# functions=functions,
# targets=targets,
# input_mode="tree",
# output_mode="tree"
# )
#
def complete_system(
    pensions: dict,  # str: float: aime, conversion_factor, benefits_last_period, applied_this_period
    unemployment_insurance: dict,  # benefits_last_period, applied_this_period, hours_limit, earnings_limit,
    hours: float,
    hourly_wage: float,
):
    ...


# Expected behavior of concatenate_functions with input_mode="flat"
# (meaning that long name-style input is expected by the complete system)
# and output_mode="flat" is to return *targets_flat*
#
# concatenate_functions(
# functions=functions,
# targets=targets_flat,
# input_mode="flat",
# output_mode="flat"
# )


def complete_system(
    pensions__aime,
    pensions__conversion_factor,
    pensions__benefits_last_period,
    pensions__applied_this_period,
    unemployment_insurance__benefits_last_period,
    unemployment_insurance__applied_this_period,
    unemployment_insurance__hours_limit,
    unemployment_insurance__earnings_limit,
    hours,
    hourly_wage,
):
    ...


functions = {
    "pensions": {
        "eligible": pensions.eligible,
        "benefit": pensions.benefit,
    },
    "unemployment_insurance": {
        "eligible": unemployment_insurance.eligible,
        "benefit": unemployment_insurance.benefit,
    },
    "monthly_earnings": monthly_earnings,
}


# Specifying functions in a flat manner.
#
# (difference to current implementation is that we can still use
# the short names inside of modules)
#
# On the one hand, not sure whether we want to support this; added for completeness.
# On the other hand, it's just a matter of calling tree_unflatten (I think).

functions_flat = {
    "pensions__eligible": pensions.eligible,
    "pensions__benefit": pensions.benefit,
    "unemployment_insurance__eligible": unemployment_insurance.eligible,
    "unemployment_insurance__benefit": unemployment_insurance.benefit,
    "monthly_earnings": monthly_earnings,
}


# concatenate_functions(
# functions=functions_flat,
# targets=targets_flat,
# functions_mode="flat",
# input_mode="flat",
# output_mode="flat"
# )


def complete_system(
    pensions__aime,
    pensions__conversion_factor,
    pensions__benefits_last_period,
    pensions__applied_this_period,
    unemployment_insurance__benefits_last_period,
    unemployment_insurance__applied_this_period,
    unemployment_insurance__hours_limit,
    unemployment_insurance__earnings_limit,
    unemployment_insurance__baseline_earnings,
    hours,
    hourly_wage,
):
    ...