OpenSourceEconomics · hmgaudecker · Jan 13, 2023 · Jan 13, 2023 · Jan 15, 2023 · Jan 18, 2023
diff --git a/docs/make.bat b/docs/make.bat
diff --git a/docs/source/dep-01-namespaces.md b/docs/source/dep-01-namespaces.md
@@ -0,0 +1,147 @@
+(dep_namespaces)=
+
+# DEP-01: Namespaces
+
+```{eval-rst}
++------------+------------------------------------------------------------------+
+| Author     | `Hans-Martin von Gaudecker <https://github.com/hmgaudecker>`_    |
+|            | `Janos Gabler <https://github.com/janosg>`_                      |
++------------+------------------------------------------------------------------+
+| Status     | Draft                                                            |
++------------+------------------------------------------------------------------+
+| Type       | Standards Track                                                  |
++------------+------------------------------------------------------------------+
+| Created    | 2023-01-13                                                       |
++------------+------------------------------------------------------------------+
+| Resolution |                                                                  |
++------------+------------------------------------------------------------------+
+```
+
+## Abstract
+
+## Backwards compatibility
+
+All changes are fully backwards compatible.
+
+## Motivation
+
+dags constructs a graph based on function names, which need to be valid Python
+identifiers. In larger projects that distribute functions across many submodules, this
+leads to very long and/or cryptic identifiers. Even in smaller projects, this may happen
+if there are multiple similar submodules. E.g., in a library modeling a taxes and
+transfers system, two different transfer programs will share many concepts (eligibility
+criteria, benefits, ...). Similarly, when processing a panel dataset with dimensions
+year × variables, one often has to compute some variables separately for individual
+years, but others for all years at the same time. In both cases, disambiguation via
+string manipulations on the user side is repetitive and error-prone. As a result, code
+quickly becomes unmanageable.
+
+Programming languages such as Python address this problem with namespaces, which allow
+for short identifiers whose scope is provided by the broader context. The equivalent for
+dags will be a nested dictionary of functions based on
+[pybaum](https://github.com/OpenSourceEconomics/pybaum), with the scope provided by a
+branch of the tree.
+
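+A minimal sketch of how such a nested dictionary could map onto the flat, `__`-separated
+long names that dags currently requires. It uses plain recursion instead of pybaum, and
+`flatten_to_long_names` is a hypothetical helper, not part of dags or pybaum.
+
+```python
+from __future__ import annotations
+
+from typing import Any, Callable
+
+
+def flatten_to_long_names(
+    tree: dict[str, Any], prefix: str = "", sep: str = "__"
+) -> dict[str, Callable]:
+    """Map a nested dict of functions to a flat dict keyed by long names."""
+    flat: dict[str, Callable] = {}
+    for key, value in tree.items():
+        long_name = f"{prefix}{sep}{key}" if prefix else key
+        if isinstance(value, dict):
+            # A sub-dictionary is a namespace: recurse with the extended prefix.
+            flat.update(flatten_to_long_names(value, long_name, sep))
+        else:
+            flat[long_name] = value
+    return flat
+
+
+def eligible(benefits_last_period, applied_this_period):
+    return True if benefits_last_period else applied_this_period
+
+
+def monthly_earnings(hours, hourly_wage):
+    return hours * 4.3 * hourly_wage
+
+
+functions = {"pensions": {"eligible": eligible}, "monthly_earnings": monthly_earnings}
+print(sorted(flatten_to_long_names(functions)))
+# ['monthly_earnings', 'pensions__eligible']
+```
+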
+Before going into detail, an example and some terminology are useful.
+
+
+## Example: Taxes and transfers system
+
+Say we have the following structure:
+
+```
+taxes_transfers
+|-- __init__.py
+| |-- monthly_earnings(hours, hourly_wage)
+| |-- [call of concatenate_functions]
+|-- pensions.py
+| |-- eligible(benefits_last_period, applied_this_period)
+| |-- benefit(eligible, aime, conversion_factor, monthly_earnings, unemployment_insurance__earnings_limit)
+|-- unemployment_insurance.py
+| |-- monthly_earnings(hours, hourly_wage, earnings_limit)
+| |-- eligible(benefits_last_period, applied_this_period, hours, monthly_earnings, hours_limit, earnings_limit)
+| |-- benefit(eligible, monthly_earnings, fraction)
+```
+
+### Content of `pensions.py`
+
+```py
+def eligible(
+    benefits_last_period,
+    applied_this_period,
+):
+    return True if benefits_last_period else applied_this_period
+
+
+def benefit(
+    eligible,
+    aime,
+    conversion_factor,
+    monthly_earnings,
+    unemployment_insurance__earnings_limit,
+):
+    cond = eligible and monthly_earnings < unemployment_insurance__earnings_limit
+    return aime * conversion_factor if cond else 0
+```
+
+- TODO: import `pensions` and `unemployment_insurance` and repeat the example system
+  from the test file.
+
+## Terminology
+
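+- **flat**: a single-level dictionary keyed by long names, which join the namespaces and
+  the short name with `__`, e.g. `"pensions__benefit"`.
+- **nested**: a nested dictionary in which each level is a namespace, e.g.
+  `{"pensions": {"benefit": ...}}`.
+- **short**: the name of a function or argument within its namespace, e.g. `benefit`
+  inside `pensions`.
+- **registry**: as in pybaum, the specification of which container types can be
+  flattened and how.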
+
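+Once a separator is fixed, converting between the flat and the nested representation is
+mechanical. A minimal sketch of the direction from flat to nested, assuming `__` as the
+separator; `unflatten_long_names` is a hypothetical helper that also rejects a name that
+would be both a function and a namespace.
+
+```python
+from __future__ import annotations
+
+from typing import Any
+
+
+def unflatten_long_names(flat: dict[str, Any], sep: str = "__") -> dict[str, Any]:
+    """Turn {"pensions__benefit": f} into {"pensions": {"benefit": f}}."""
+    nested: dict[str, Any] = {}
+    for long_name, value in flat.items():
+        *namespaces, short_name = long_name.split(sep)
+        node = nested
+        for namespace in namespaces:
+            # Create intermediate namespaces on demand.
+            node = node.setdefault(namespace, {})
+            if not isinstance(node, dict):
+                raise ValueError(f"{long_name!r} uses a function name as a namespace")
+        if short_name in node:
+            raise ValueError(f"{long_name!r} clashes with an existing function or namespace")
+        node[short_name] = value
+    return nested
+
+
+targets_flat = ["pensions__benefit", "unemployment_insurance__benefit", "monthly_earnings"]
+print(unflatten_long_names({name: None for name in targets_flat}))
+# {'pensions': {'benefit': None}, 'unemployment_insurance': {'benefit': None}, 'monthly_earnings': None}
+```
+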
+## The new `functions` argument
+
+### Possible containers
+
+- Should we only allow nested dictionaries (of arbitrary depth?) or everything that works
+  in pybaum?
+- Do we need an extensible registry as in pybaum?
+- Should it be possible to provide a module directly instead of a dictionary of functions?
+- Do we allow unlabeled data structures (e.g., lists) as long as the functions have a
+  `__name__` attribute?
+
+Supporting all of these is probably not much extra work, but the flattening rules will
+differ considerably from pybaum's: instead of flattening everything into lists, we need
+to flatten into dictionaries keyed by long names. We could still re-use pybaum for that,
+just with different registries.
+
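+One possible reading of "flatten into dictionaries, just with different registries": a
+registry maps each container type to a function that yields `(short_name, child)` pairs,
+and flattening joins the names with `__`. This sketch covers three of the options listed
+above (nested dictionaries, modules, and unlabeled lists or tuples whose functions carry a
+`__name__`). The registry layout and all names are assumptions for illustration; this is
+not pybaum's API.
+
+```python
+from __future__ import annotations
+
+import inspect
+import types
+from typing import Any, Callable, Iterable
+
+# Hypothetical registry: container type -> function yielding (short_name, child) pairs.
+REGISTRY: dict[type, Callable[[Any], Iterable[tuple[str, Any]]]] = {
+    dict: lambda d: d.items(),
+    # Unlabeled containers fall back on each function's __name__.
+    list: lambda seq: ((f.__name__, f) for f in seq),
+    tuple: lambda seq: ((f.__name__, f) for f in seq),
+    # Modules contribute the functions defined in them, not the ones they import.
+    types.ModuleType: lambda mod: (
+        (name, f)
+        for name, f in inspect.getmembers(mod, inspect.isfunction)
+        if f.__module__ == mod.__name__
+    ),
+}
+
+
+def flatten_functions(tree: Any, prefix: str = "", sep: str = "__") -> dict[str, Callable]:
+    """Flatten any registered container of functions into {long_name: function}."""
+    for container_type, children in REGISTRY.items():
+        if isinstance(tree, container_type):
+            flat: dict[str, Callable] = {}
+            for name, child in children(tree):
+                long_name = f"{prefix}{sep}{name}" if prefix else name
+                sub = flatten_functions(child, long_name, sep)
+                duplicates = flat.keys() & sub.keys()
+                if duplicates:
+                    raise ValueError(f"duplicate long names: {sorted(duplicates)}")
+                flat.update(sub)
+            return flat
+    if callable(tree):
+        return {prefix: tree}
+    raise TypeError(f"cannot flatten an object of type {type(tree).__name__}")
+
+
+# Usage: x given as a list of functions, y as a module.
+def f(a, b):
+    return a + b
+
+
+def g(f, c):
+    return f * c
+
+
+y = types.ModuleType("y")  # stands in for an imported y.py
+exec("def a(b, c):\n    return b * c", y.__dict__)
+
+print(sorted(flatten_functions({"x": [f, g], "y": y})))
+# ['x__f', 'x__g', 'y__a']
+```
+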
+### Argument names
+
+- When can we use short names and when long names for arguments?
+- Are there ever ambiguous cases? If so, how are they handled, and how does this relate
+  to Python's resolution of scopes?
+- What should not work (e.g., because it would be error-prone)?
+
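+One concrete ambiguous case is the one flagged in `tests/test_trees.py`: a function
+defined in a namespace with the same short name as a function in an enclosing namespace
+(e.g. `monthly_earnings` and `unemployment_insurance__monthly_earnings`). A minimal sketch
+of detecting such clashes on the flat long names and handling them according to the
+`name_clashes="raise"|"warn"|"ignore"` keyword proposed there; both helpers are
+hypothetical and not part of dags.
+
+```python
+from __future__ import annotations
+
+import warnings
+
+
+def find_name_clashes(long_names: set[str], sep: str = "__") -> list[tuple[str, str]]:
+    """Return (inner, outer) pairs where `inner` shadows `outer` from an enclosing namespace."""
+    clashes = []
+    for name in sorted(long_names):
+        *namespaces, short_name = name.split(sep)
+        # Check every enclosing namespace, from the direct parent up to the top level.
+        for depth in range(len(namespaces) - 1, -1, -1):
+            outer = sep.join([*namespaces[:depth], short_name])
+            if outer in long_names:
+                clashes.append((name, outer))
+    return clashes
+
+
+def check_name_clashes(long_names: set[str], name_clashes: str = "raise") -> None:
+    """Raise, warn, or do nothing if some function shadows another one."""
+    if name_clashes not in {"raise", "warn", "ignore"}:
+        raise ValueError(f"invalid value for name_clashes: {name_clashes!r}")
+    clashes = find_name_clashes(long_names)
+    if not clashes or name_clashes == "ignore":
+        return
+    message = "; ".join(f"{inner} shadows {outer}" for inner, outer in clashes)
+    if name_clashes == "raise":
+        raise ValueError(message)
+    warnings.warn(message)
+
+
+names = {
+    "monthly_earnings",
+    "pensions__benefit",
+    "unemployment_insurance__monthly_earnings",
+}
+print(find_name_clashes(names))
+# [('unemployment_insurance__monthly_earnings', 'monthly_earnings')]
+```
+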
+**Use more abstract examples than tax and transfer here!**
+
+
+```
+abstract
+|-- __init__.py
+| |-- a(b, c)
+| |-- [call of concatenate_functions]
+|-- x.py
+| |-- f(a, b)
+| |-- g(f, c)
+|-- y.py
+| |-- a(b, c)
+| |-- m(x__f, a, b)
+```
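+
+For the abstract example, one candidate rule mirrors Python's scope lookup: a short
+argument name is searched in the function's own namespace first, then in each enclosing
+namespace up to the top level, while arguments that are already long names are taken
+literally. A minimal sketch, assuming that unresolved names are inputs (how inputs are
+named is left open here); `resolve` is hypothetical.
+
+```python
+from __future__ import annotations
+
+SEP = "__"
+
+# Long names of all functions in the abstract example.
+FUNCTIONS = {"a", "x__f", "x__g", "y__a", "y__m"}
+
+
+def resolve(arg: str, namespace: str, functions: set[str] = FUNCTIONS) -> str | None:
+    """Return the long name of the function `arg` refers to, or None for an input."""
+    if SEP in arg:
+        # Explicit long names, such as x__f in y.m, are taken literally.
+        return arg if arg in functions else None
+    parts = namespace.split(SEP) if namespace else []
+    # Innermost namespace first, then each enclosing one, then the top level.
+    for depth in range(len(parts), -1, -1):
+        candidate = SEP.join([*parts[:depth], arg])
+        if candidate in functions:
+            return candidate
+    return None
+
+
+print(resolve("a", "x"))     # a (the top-level function)
+print(resolve("a", "y"))     # y__a (shadows the top-level a)
+print(resolve("f", "y"))     # None: f is only visible inside x, so y must write x__f
+print(resolve("x__f", "y"))  # x__f
+```
+
+Under this rule, `a` in `y.m` silently refers to `y__a` rather than the top-level `a`,
+which is the kind of shadowing the proposed `name_clashes` option is meant to surface.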
+
+
+
+## Inputs of the concatenated function
+
+- Is it necessary to have different input modes (e.g., flat and nested), or is nested always what we want?
+- input of `x__c` should inherit behavior of .
+
+
+## Output of the concatenated function
+
+- Is it necessary to have different output formats (e.g., flat and nested), or should this be done outside of dags?
+
+## Implementation Ideas
+
+- How aware should the dag be of the nesting? It is probably possible to flatten everything before the dag is built, but then we lose information that could be useful for visualization.
+- It is generally easy to flatten the function container. The hard part is converting the function arguments into long names that are globally unique in the dag!
+- **What can we learn from pytask? Essentially pytask has this already!**
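+
+The renaming step can be sketched as wrapping each function so that its signature lists
+globally unique long names, assuming dags reads a function's dependencies from its
+signature (via `inspect.signature`). The mapping from short to long names, e.g. produced
+by a resolution rule, is taken as given. `with_long_arg_names` is a hypothetical helper
+and only handles ordinary named parameters (no positional-only parameters, `*args`, or
+`**kwargs`).
+
+```python
+from __future__ import annotations
+
+import functools
+import inspect
+from typing import Callable
+
+
+def with_long_arg_names(func: Callable, short_to_long: dict[str, str]) -> Callable:
+    """Wrap `func` so that its signature uses long argument names."""
+    old_sig = inspect.signature(func)
+    new_sig = old_sig.replace(
+        parameters=[
+            p.replace(name=short_to_long.get(p.name, p.name))
+            for p in old_sig.parameters.values()
+        ]
+    )
+    long_to_short = {short_to_long.get(name, name): name for name in old_sig.parameters}
+
+    @functools.wraps(func)
+    def wrapper(*args, **kwargs):
+        # Bind against the long names, then call func with its original short names.
+        bound = new_sig.bind(*args, **kwargs)
+        return func(**{long_to_short[k]: v for k, v in bound.arguments.items()})
+
+    # inspect.signature() prefers __signature__ over the wrapped function's signature.
+    wrapper.__signature__ = new_sig
+    return wrapper
+
+
+def eligible(benefits_last_period, applied_this_period):
+    return True if benefits_last_period else applied_this_period
+
+
+renamed = with_long_arg_names(
+    eligible,
+    {
+        "benefits_last_period": "pensions__benefits_last_period",
+        "applied_this_period": "pensions__applied_this_period",
+    },
+)
+print(inspect.signature(renamed))
+# (pensions__benefits_last_period, pensions__applied_this_period)
+print(renamed(pensions__benefits_last_period=False, pensions__applied_this_period=True))
+# True
+```
+
+If two short names were mapped to the same long name, `Signature.replace` would raise a
+`ValueError` for the duplicate parameter, so this step also enforces global uniqueness.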
diff --git a/environment.yml b/environment.yml
@@ -30,3 +30,6 @@ dependencies:
  # Development
  - jupyterlab
  - nbsphinx
+
+  - pip:
+    - -e .
diff --git a/tests/pensions.py b/tests/pensions.py
@@ -0,0 +1,16 @@
+def eligible(
+    benefits_last_period,
+    applied_this_period,
+):
+    return True if benefits_last_period else applied_this_period
+
+
+def benefit(
+    eligible,
+    aime,
+    conversion_factor,
+    monthly_earnings,
+    unemployment_insurance__earnings_limit,
+):
+    cond = eligible and monthly_earnings < unemployment_insurance__earnings_limit
+    return aime * conversion_factor if cond else 0
diff --git a/tests/test_trees.py b/tests/test_trees.py
@@ -0,0 +1,150 @@
+import inspect
+from functools import partial
+
+import pensions
+import unemployment_insurance
+from dags.dag import concatenate_functions
+from dags.dag import create_dag
+from dags.dag import get_ancestors
+
+
+# Define a function in the "global" namespace
+def monthly_earnings(hours, hourly_wage):
+    return hours * 4.3 * hourly_wage
+
+
+# Note that there is a function of the same name in unemployment_insurance.py,
+# leading to a potential ambiguity about what is meant by "monthly_earnings"
+# inside the "unemployment_insurance" namespace (namespace in the dags sense).
+# In particular, there would be no way to access the "global" function.
+#
+# To let users choose how this case is handled, add a keyword argument:
+#
+# concatenate_functions(
+# ...
+# name_clashes="**raise**|ignore|warn",
+# )
+#
+# Default is to raise an error.
+
+
+# Typical way would be to define functions in a nested dictionary creating namespaces.
+functions = {
+    "pensions": {
+        "eligible": pensions.eligible,
+        "benefit": pensions.benefit,
+    },
+    "unemployment_insurance": {
+        "eligible": unemployment_insurance.eligible,
+        "benefit": unemployment_insurance.benefit,
+    },
+    "monthly_earnings": monthly_earnings,
+}
+
+
+# Typical way to define targets would be in a nested dictionary.
+targets = {"pensions": ["benefit"], "unemployment_insurance": ["benefit"]}
+
+# Alternatively, use flat structure.
+targets_flat = {"pensions__benefit", "unemployment_insurance__benefit"}
+
+# With input_mode="tree" (i.e., the complete system expects a nested dict of inputs)
+# and output_mode="tree", concatenate_functions is expected to return a function with
+# the following signature, which computes *targets*:
+#
+# concatenate_functions(
+#     functions=functions,
+#     targets=targets,
+#     input_mode="tree",
+#     output_mode="tree",
+# )
+#
+def complete_system(
+    pensions: dict,  # {str: float} with aime, conversion_factor, benefits_last_period, applied_this_period
+    unemployment_insurance: dict,  # {str: float} with benefits_last_period, applied_this_period, hours_limit, earnings_limit
+    hours: float,
+    hourly_wage: float,
+):
+    ...
+
+
+# With input_mode="flat" (i.e., the complete system expects inputs with long names)
+# and output_mode="flat", concatenate_functions is expected to return a function with
+# the following signature, which computes *targets_flat*:
+#
+# concatenate_functions(
+#     functions=functions,
+#     targets=targets_flat,
+#     input_mode="flat",
+#     output_mode="flat",
+# )
+
+
+def complete_system(
+    pensions__aime,
+    pensions__conversion_factor,
+    pensions__benefits_last_period,
+    pensions__applied_this_period,
+    unemployment_insurance__benefits_last_period,
+    unemployment_insurance__applied_this_period,
+    unemployment_insurance__hours_limit,
+    unemployment_insurance__earnings_limit,
+    hours,
+    hourly_wage,
+):
+    ...
+
+
+functions = {
+    "pensions": {
+        "eligible": pensions.eligible,
+        "benefit": pensions.benefit,
+    },
+    "unemployment_insurance": {
+        "eligible": unemployment_insurance.eligible,
+        "benefit": unemployment_insurance.benefit,
+    },
+    "monthly_earnings": monthly_earnings,
+}
+
+
+# Specifying functions in a flat manner.
+#
+# (The difference to the current implementation is that we can still use
+# short names inside of modules.)
+#
+# On the one hand, not sure whether we want to support this; added for completeness.
+# On the other hand, it's just a matter of calling tree_unflatten (I think).
+
+functions_flat = {
+    "pensions__eligible": pensions.eligible,
+    "pensions__benefit": pensions.benefit,
+    "unemployment_insurance__eligible": unemployment_insurance.eligible,
+    "unemployment_insurance__benefit": unemployment_insurance.benefit,
+    "monthly_earnings": monthly_earnings,
+}
+
+
+# concatenate_functions(
+# functions=functions_flat,
+# targets=targets_flat,
+# functions_mode="flat",
+# input_mode="flat",
+# output_mode="flat"
+# )
+
+
+def complete_system(
+    pensions__aime,
+    pensions__conversion_factor,
+    pensions__benefits_last_period,
+    pensions__applied_this_period,
+    unemployment_insurance__benefits_last_period,
+    unemployment_insurance__applied_this_period,
+    unemployment_insurance__hours_limit,
+    unemployment_insurance__earnings_limit,
+    unemployment_insurance__baseline_earnings,
+    hours,
+    hourly_wage,
+):
+    ...