Meta refactor #74

Open · wants to merge 19 commits into base: main

2 changes: 1 addition & 1 deletion .ruff.toml
@@ -12,7 +12,7 @@ lint.select = [
"PIE", # flake8-pie
"PL", # pylint
"PT", # flake8-pytest-style
# "PTH", # flake8-use-pathlib
"PTH", # flake8-use-pathlib
"RET", # flake8-return
"RUF", # Ruff-specific
"SIM", # flake8-simplify
112 changes: 0 additions & 112 deletions README.md
@@ -3,115 +3,3 @@
Implementation of an automatic data processing flow for L200
data, based on
[Snakemake](https://snakemake.readthedocs.io/).


## Configuration

Data processing resources are configured via a single site-dependent (and
possibly user-dependent) configuration file, referred to as `config.json` in the
following. You may choose an arbitrary name, though.

Use the included [templates/config.json](templates/config.json) as a template
and adjust the data base paths as necessary. Note that, when running Snakemake,
the default path to the config file is `./config.json`.
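
A config file in a non-default location can be passed explicitly on the command
line. As a minimal sketch (the target name below is just a placeholder):
```shell
$ snakemake --configfile=/path/to/your/config.json <some-target>
```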


## Key-Lists

Data generation is based on key-lists, which are flat text files
(extension ".keylist") containing one entry of the form
`{experiment}-{period}-{run}-{datatype}-{timestamp}` per line.

Key-lists can be auto-generated based on the available DAQ files
using Snakemake targets of the form

* `all-{experiment}.keylist`
* `all-{experiment}-{period}.keylist`
* `all-{experiment}-{period}-{run}.keylist`
* `all-{experiment}-{period}-{run}-{datatype}.keylist`

which will generate the list of available file keys for all L200 files, for a
specific period, for a specific period and run, and so on.

For example:
```shell
$ snakemake all-l200-myper.keylist
```
will generate a key-list with all files regarding period `myper`.
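
The resulting key-list is a plain text file. Its contents might look like the
following (the keys shown here are hypothetical and only illustrate the
`{experiment}-{period}-{run}-{datatype}-{timestamp}` format):
```shell
$ cat all-l200-myper.keylist
l200-myper-r001-cal-20230101T000000Z
l200-myper-r001-phy-20230101T010000Z
l200-myper-r002-phy-20230102T000000Z
```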


## File-Lists

File-lists are flat files listing output files that should be generated,
with one file per line. A file-list will typically be generated for a given
data tier from a key-list, using the Snakemake targets of the form
`{label}-{tier}.filelist` (generated from `{label}.keylist`).

For file lists based on auto-generated key-lists like
`all-{experiment}-{period}-{tier}.filelist`, the corresponding key-list
(`all-{experiment}-{period}.keylist` in this case) will be created
automatically, if it doesn't exist.

Example:
```shell
$ snakemake all-mydet-mymeas-tier2.filelist
```

File-lists may of course also be derived from custom keylists, generated
manually or by other means, e.g. `my-dataset-raw.filelist` will be
generated from `my-dataset.keylist`.
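
To illustrate, a file-list simply names the output files to be produced, one
per line. As a rough sketch (the paths and file names below are placeholders;
the real ones depend on the configured tier directories):
```shell
$ cat my-dataset-raw.filelist
<raw-tier-dir>/l200-myper-r001-phy-20230101T010000Z-tier_raw.lh5
<raw-tier-dir>/l200-myper-r002-phy-20230102T000000Z-tier_raw.lh5
```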


## Main output generation

Usually, the main output will be determined by a file-list, i.e. by a key-list
and a data tier. The special output target `{label}-{tier}.gen` is used to
generate all files listed in `{label}-{tier}.filelist`. After the files are
created, the empty file `{label}-{tier}.gen` is created to mark the successful
data production.

Snakemake targets like `all-{experiment}-{period}-{tier}.gen` may be used
to automatically generate key-lists and file-lists (if not already present)
and produce all possible output for the given data tier, based on available
tier0 files which match the target.

Example:
```shell
$ snakemake all-mydet-mymeas-tier2.gen
```
Targets like `my-dataset-raw.gen` (derived from a key-list
`my-dataset.keylist`) are of course allowed as well.
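
For instance, to produce such a custom dataset end-to-end:
```shell
$ snakemake my-dataset-raw.gen
```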


## Monitoring

Snakemake supports monitoring by connecting to a
[panoptes](https://github.com/panoptes-organization/panoptes) server.

Run (e.g.)
```shell
$ panoptes --port 5000
```
in the background to run a panoptes server instance, which comes with a
GUI that can be accessed with a web browser on the specified port.

Then use the Snakemake option `--wms-monitor` to instruct Snakemake to push
progress information to the panoptes server:
```shell
snakemake --wms-monitor http://127.0.0.1:5000 [...]
```

## Using software containers

This dataflow doesn't use Snakemake's internal Singularity support, but
instead supports Singularity containers via
[`venv`](https://github.com/oschulz/singularity-venv) environments
for greater control.

To use this, the path to `venv` and the name of the environment must be set
in `config.json`.

This is only relevant when running Snakemake *outside* of the software
container, e.g. when using a batch system (see below). If Snakemake
and the whole workflow are run inside of a container instance, no
container-related settings in `config.json` are required.
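
As a rough sketch, the container-related entries in `config.json` might look
like the following (the key names here are illustrative placeholders; the
authoritative ones are defined in `templates/config.json`):
```shell
$ cat config.json    # excerpt; key names are placeholders
{
  "execenv": {
    "venv_path": "/path/to/singularity-venv",
    "venv_name": "legend-software"
  }
}
```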
112 changes: 45 additions & 67 deletions Snakefile
@@ -10,7 +10,7 @@ This includes:
- the same for partition level tiers
"""

import pathlib
from pathlib import Path
import os
import json
import sys
@@ -20,8 +20,8 @@ from collections import OrderedDict
import logging

import scripts.util as ds
from scripts.util.pars_loading import pars_catalog
from scripts.util.patterns import get_pattern_tier_raw
from scripts.util.pars_loading import ParsCatalog
from scripts.util.patterns import get_pattern_tier
from scripts.util.utils import (
subst_vars_in_snakemake_config,
runcmd,
@@ -31,6 +31,7 @@ from scripts.util.utils import (
metadata_path,
tmp_log_path,
pars_path,
det_status_path,
)

# Set with `snakemake --configfile=/path/to/your/config.json`
@@ -43,8 +44,9 @@ setup = config["setups"]["l200"]
configs = config_path(setup)
chan_maps = chan_map_path(setup)
meta = metadata_path(setup)
det_status = det_status_path(setup)
swenv = runcmd(setup)
part = ds.dataset_file(setup, os.path.join(configs, "partitions.json"))
part = ds.CalGrouping(setup, Path(det_status) / "cal_partitions.yaml")
basedir = workflow.basedir


@@ -66,38 +68,13 @@ include: "rules/psp.smk"
include: "rules/hit.smk"
include: "rules/pht.smk"
include: "rules/pht_fast.smk"
include: "rules/ann.smk"
include: "rules/evt.smk"
include: "rules/skm.smk"
include: "rules/blinding_calibration.smk"
include: "rules/qc_phy.smk"


# Log parameter catalogs in validity.jsonl files
hit_par_cat_file = os.path.join(pars_path(setup), "hit", "validity.jsonl")
if os.path.isfile(hit_par_cat_file):
os.remove(os.path.join(pars_path(setup), "hit", "validity.jsonl"))
pathlib.Path(os.path.dirname(hit_par_cat_file)).mkdir(parents=True, exist_ok=True)
ds.pars_key_resolve.write_to_jsonl(hit_par_catalog, hit_par_cat_file)

pht_par_cat_file = os.path.join(pars_path(setup), "pht", "validity.jsonl")
if os.path.isfile(pht_par_cat_file):
os.remove(os.path.join(pars_path(setup), "pht", "validity.jsonl"))
pathlib.Path(os.path.dirname(pht_par_cat_file)).mkdir(parents=True, exist_ok=True)
ds.pars_key_resolve.write_to_jsonl(pht_par_catalog, pht_par_cat_file)

dsp_par_cat_file = os.path.join(pars_path(setup), "dsp", "validity.jsonl")
if os.path.isfile(dsp_par_cat_file):
os.remove(dsp_par_cat_file)
pathlib.Path(os.path.dirname(dsp_par_cat_file)).mkdir(parents=True, exist_ok=True)
ds.pars_key_resolve.write_to_jsonl(dsp_par_catalog, dsp_par_cat_file)

psp_par_cat_file = os.path.join(pars_path(setup), "psp", "validity.jsonl")
if os.path.isfile(psp_par_cat_file):
os.remove(psp_par_cat_file)
pathlib.Path(os.path.dirname(psp_par_cat_file)).mkdir(parents=True, exist_ok=True)
ds.pars_key_resolve.write_to_jsonl(psp_par_catalog, psp_par_cat_file)


localrules:
gen_filelist,
autogen_output,
@@ -111,36 +88,36 @@ onstart:
shell('{swenv} python3 -B -c "import ' + pkg + '"')

# Log parameter catalogs in validity.jsonl files
hit_par_cat_file = os.path.join(pars_path(setup), "hit", "validity.jsonl")
if os.path.isfile(hit_par_cat_file):
os.remove(os.path.join(pars_path(setup), "hit", "validity.jsonl"))
pathlib.Path(os.path.dirname(hit_par_cat_file)).mkdir(parents=True, exist_ok=True)
ds.pars_key_resolve.write_to_jsonl(hit_par_catalog, hit_par_cat_file)

pht_par_cat_file = os.path.join(pars_path(setup), "pht", "validity.jsonl")
if os.path.isfile(pht_par_cat_file):
os.remove(os.path.join(pars_path(setup), "pht", "validity.jsonl"))
pathlib.Path(os.path.dirname(pht_par_cat_file)).mkdir(parents=True, exist_ok=True)
ds.pars_key_resolve.write_to_jsonl(pht_par_catalog, pht_par_cat_file)

dsp_par_cat_file = os.path.join(pars_path(setup), "dsp", "validity.jsonl")
if os.path.isfile(dsp_par_cat_file):
os.remove(dsp_par_cat_file)
pathlib.Path(os.path.dirname(dsp_par_cat_file)).mkdir(parents=True, exist_ok=True)
ds.pars_key_resolve.write_to_jsonl(dsp_par_catalog, dsp_par_cat_file)

psp_par_cat_file = os.path.join(pars_path(setup), "psp", "validity.jsonl")
if os.path.isfile(psp_par_cat_file):
os.remove(psp_par_cat_file)
pathlib.Path(os.path.dirname(psp_par_cat_file)).mkdir(parents=True, exist_ok=True)
ds.pars_key_resolve.write_to_jsonl(psp_par_catalog, psp_par_cat_file)
hit_par_cat_file = Path(pars_path(setup)) / "hit" / "validity.yaml"
if hit_par_cat_file.is_file():
hit_par_cat_file.unlink()
Path(hit_par_cat_file).parent.mkdir(parents=True, exist_ok=True)
ds.ParsKeyResolve.write_to_yaml(hit_par_catalog, hit_par_cat_file)

pht_par_cat_file = Path(pars_path(setup)) / "pht" / "validity.yaml"
if pht_par_cat_file.is_file():
pht_par_cat_file.unlink()
Path(pht_par_cat_file).parent.mkdir(parents=True, exist_ok=True)
ds.ParsKeyResolve.write_to_yaml(pht_par_catalog, pht_par_cat_file)

dsp_par_cat_file = Path(pars_path(setup)) / "dsp" / "validity.yaml"
if dsp_par_cat_file.is_file():
dsp_par_cat_file.unlink()
Path(dsp_par_cat_file).parent.mkdir(parents=True, exist_ok=True)
ds.ParsKeyResolve.write_to_yaml(dsp_par_catalog, dsp_par_cat_file)

psp_par_cat_file = Path(pars_path(setup)) / "psp" / "validity.yaml"
if psp_par_cat_file.is_file():
psp_par_cat_file.unlink()
Path(psp_par_cat_file).parent.mkdir(parents=True, exist_ok=True)
ds.ParsKeyResolve.write_to_yaml(psp_par_catalog, psp_par_cat_file)


onsuccess:
from snakemake.report import auto_report

rep_dir = f"{log_path(setup)}/report-{datetime.strftime(datetime.utcnow(), '%Y%m%dT%H%M%SZ')}"
pathlib.Path(rep_dir).mkdir(parents=True, exist_ok=True)
Path(rep_dir).mkdir(parents=True, exist_ok=True)
# auto_report(workflow.persistence.dag, f"{rep_dir}/report.html")

with open(os.path.join(rep_dir, "dag.txt"), "w") as f:
@@ -157,15 +134,15 @@ onsuccess:
if os.path.isfile(file):
os.remove(file)

# remove filelists
files = glob.glob(os.path.join(filelist_path(setup), "*"))
for file in files:
if os.path.isfile(file):
os.remove(file)
if os.path.exists(filelist_path(setup)):
os.rmdir(filelist_path(setup))
# # remove filelists
# files = glob.glob(os.path.join(filelist_path(setup), "*"))
# for file in files:
# if os.path.isfile(file):
# os.remove(file)
# if os.path.exists(filelist_path(setup)):
# os.rmdir(filelist_path(setup))

# remove logs
# remove logs
files = glob.glob(os.path.join(tmp_log_path(setup), "*", "*.log"))
for file in files:
if os.path.isfile(file):
@@ -190,16 +167,17 @@ rule gen_filelist:
lambda wildcards: get_filelist(
wildcards,
setup,
get_pattern_tier_raw(setup),
ignore_keys_file=os.path.join(configs, "ignore_keys.keylist"),
analysis_runs_file=os.path.join(configs, "analysis_runs.json"),
get_pattern_tier(setup, "raw", check_in_cycle=False),
ignore_keys_file=Path(det_status) / "ignored_daq_cycles.yaml",
analysis_runs_file=Path(det_status) / "runlists.yaml",
),
output:
os.path.join(filelist_path(setup), "{label}-{tier}.filelist"),
temp(Path(filelist_path(setup)) / "{label}-{tier}.filelist"),
run:
if len(input) == 0:
print(
"WARNING: No files found for the given pattern\nmake sure pattern follows the format: all-{experiment}-{period}-{run}-{datatype}-{timestamp}-{tier}.gen"
f"WARNING: No files found for the given pattern:{wildcards.label}",
"\nmake sure pattern follows the format: all-{experiment}-{period}-{run}-{datatype}-{timestamp}-{tier}.gen",
)
with open(output[0], "w") as f:
for fn in input:
21 changes: 21 additions & 0 deletions docs/Makefile
@@ -0,0 +1,21 @@
SHELL := /bin/bash
SOURCEDIR = source
BUILDDIR = build

all: apidoc
sphinx-build -M html "$(SOURCEDIR)" "$(BUILDDIR)" -W --keep-going

apidoc: clean-apidoc
sphinx-apidoc \
--private \
--module-first \
--force \
--output-dir "$(SOURCEDIR)/api" \
../scripts \
../rules

clean-apidoc:
rm -rf "$(SOURCEDIR)/api"

clean: clean-apidoc
rm -rf "$(BUILDDIR)"
15 changes: 15 additions & 0 deletions docs/source/developer.rst
@@ -0,0 +1,15 @@
Developers Guide
================

Snakemake is configured around a series of rules which specify how to generate a file (or set of files) from a set of input files.
These rules are defined in the ``Snakefile`` and in the files in the ``rules`` directory.
In general, a series of rules runs on some calibration data and generates
a final ``par_{tier}.yaml`` file, which can then be used by the ``tier`` rule to generate all the files in that tier.
Most rules come in two versions: the basic version, which uses a single run,
and the partition version, which groups many runs together.
This grouping is defined in the ``cal_grouping.yaml`` file in the `legend-datasets <https://github.com/legend-exp/legend-datasets>`_ repository.

Each rule specifies its inputs and outputs, along with how to generate the outputs, which can be
a shell command or a call to a Python function. These scripts are stored in the ``scripts`` directory.
Additional parameters can also be defined.
Full details can be found in the `Snakemake documentation <https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html>`_.
41 changes: 41 additions & 0 deletions docs/source/index.rst
@@ -0,0 +1,41 @@
Welcome to legend-dataflow's documentation!
===========================================

*legend-dataflow* is a Python package based on `Snakemake <https://snakemake.readthedocs.io/en/stable/index.html>`_
for running the data production of LEGEND.
It is designed to calibrate and optimise hundreds of channels in parallel before
bringing them all together to process the data. It takes as input the metadata
in the `legend-metadata <https://github.com/legend-exp/legend-metadata>`_ repository.

Getting started
---------------

It is recommended to install and use the package through the `legend-prodenv <https://github.com/legend-exp/legend-prodenv>`_.

Next steps
----------

.. toctree::
:maxdepth: 1

Package API reference <api/modules>

.. toctree::
:maxdepth: 1

tutorials

.. toctree::
:maxdepth: 1
:caption: Related projects

LEGEND Data Objects <https://legend-pydataobj.readthedocs.io>
Decoding Digitizer Data <https://legend-daq2lh5.readthedocs.io>
Digital Signal Processing <https://dspeed.readthedocs.io>
Pygama <https://pygama.readthedocs.io>

.. toctree::
:maxdepth: 1
:caption: Development

Source Code <https://github.com/legend-exp/legend-dataflow>