start documentation for rules
azuline committed Nov 2, 2023
1 parent ca09df6 commit c3676d3
Showing 4 changed files with 196 additions and 25 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -152,7 +152,7 @@ by the two interfaces is:
- Flag and unflag release "new"-ness
- White/black-list entries in the artist, genre, and label views
- Edit release metadata as a text file
- Define and store rules for bulk updating metadata
- Run and store rules for bulk updating metadata
- Import metadata and cover art from third-party sources
- Extract embedded cover art to an external file
- Extract "phony" single releases from EPs/Albums/etc
55 changes: 40 additions & 15 deletions docs/ARCHITECTURE.md
@@ -68,21 +68,6 @@ filenames or tags remaining static; the source directory can be freely
modified. The only constraint is that each release must be a directory in the
root of the `music_source_dir`.

# Read Cache Update

The read cache update is optimized to minimize the number of disk accesses, as
it's a hot path and quite expensive if implemented poorly.

The read cache update first pulls all relevant cached data from SQLite. Stored
on each track is the mtime during the previous cache update. The cache update
checks whether any files have changed via `readdir` and `stat` calls, and only
reads the file if the `mtime` has changed. Throughout the update, we take note
of the changes to apply. At the end of the update, we make a few fat SQL
queries to batch the writes.

The update process is also parallelizable, so we shard workloads across
multiple processes.

# Virtual Filesystem

We use the `llfuse` library for the virtual filesystem. `llfuse` is a fairly
@@ -137,6 +122,46 @@ So in the playlists case, we pretend that the written file exists for 2 seconds
after we wrote it. This way, `cp --preserve=mode` can replicate its attributes
onto a ghost file and happily exit without errors.

# Rules Engine

The rules engine has two subsystems: the DSL and the rule executor. In this
section, we'll go into rule executor performance.

The naive implementation has us searching the SQLite database with `LIKE`
queries, which is not performant. Regex is even less so, which is why we do not
support regex matchers (but we support regex actions!).

In Rosé, we instead index all tags into SQLite's Full Text Search (FTS). Every
character in a tag is treated as a word, which grants us substring search, at
the expense of a twice-as-large read cache. Note that this is _not_ the typical
way an FTS engine is used: words are typically words, not individual characters.

However, FTS is not designed to care about ordering, so a search query like
`Rose` would match the substring `esoR`. This "feature" of FTS leads to false
positives. So we introduce an additional fully-accurate Python filtering step
on the results of the FTS query. The Python filtering isn't performant enough
to run on all results, but it is sufficiently efficient to run on the subset of
tracks returned from the FTS query.

In a very brief testing period, the FTS implementation was around a hundred
times faster than the naive `LIKE` query. Queries that took multiple seconds
now complete in tens of milliseconds.
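
The two-stage lookup described in this section can be sketched roughly as
follows. This is a minimal illustration, not Rosé's actual schema or code: the
`tags_fts` table, its columns, and all function names are hypothetical, and it
assumes a SQLite build with FTS5 enabled. Only alphanumeric characters are
handled, and empty patterns are not.

```python
import sqlite3


def to_char_words(text: str) -> str:
    """Turn every character into its own whitespace-separated FTS 'word'."""
    return " ".join(text)


def fts_query(pattern: str) -> str:
    """Require every alphanumeric character of the pattern, in any order."""
    return " AND ".join(f'"{ch}"' for ch in pattern if ch.isalnum())


def fts_candidates(conn: sqlite3.Connection, pattern: str) -> list[str]:
    """Stage 1: fast, order-insensitive lookup that may return false positives."""
    rows = conn.execute(
        "SELECT value FROM tags_fts WHERE tags_fts MATCH ?",
        (fts_query(pattern),),
    )
    return [value for (value,) in rows]


def search(conn: sqlite3.Connection, pattern: str) -> list[str]:
    """Stage 2: exact substring filter in Python over the small candidate set."""
    return [v for v in fts_candidates(conn, pattern) if pattern in v]


# Hypothetical table: `value` keeps the original tag, `chars` is what FTS indexes.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE tags_fts USING fts5(value UNINDEXED, chars)")
for tag in ["Rose", "esoR", "Chuu"]:
    conn.execute(
        "INSERT INTO tags_fts (value, chars) VALUES (?, ?)",
        (tag, to_char_words(tag)),
    )
```

Here `fts_candidates(conn, "Rose")` returns both `Rose` and `esoR`, which is the
false positive described above, while `search(conn, "Rose")` keeps only `Rose`.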

# Read Cache Update

The read cache update is optimized to minimize the number of disk accesses, as
it's a hot path and quite expensive if implemented poorly.

The read cache update first pulls all relevant cached data from SQLite. Stored
on each track is the mtime during the previous cache update. The cache update
checks whether any files have changed via `readdir` and `stat` calls, and only
reads the file if the `mtime` has changed. Throughout the update, we take note
of the changes to apply. At the end of the update, we make a few fat SQL
queries to batch the writes.

The update process is also parallelizable, so we shard workloads across
multiple processes.
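
As a rough sketch of the mtime check and batched write, assuming a hypothetical
`tracks` table with `path` and `mtime` columns (the real cache schema is not
shown in this document) and leaving out the tag parsing and process sharding:

```python
import os
import sqlite3
import tempfile


def load_cached_mtimes(conn: sqlite3.Connection) -> dict[str, float]:
    """Pull the previously stored mtimes from the cache in a single query."""
    return dict(conn.execute("SELECT path, mtime FROM tracks"))


def find_changed_files(
    release_dir: str, cached: dict[str, float]
) -> list[tuple[str, float]]:
    """Return (path, mtime) for files whose mtime differs from the cache."""
    changes = []
    with os.scandir(release_dir) as entries:  # directory listing
        for entry in entries:
            if not entry.is_file():
                continue
            mtime = entry.stat().st_mtime  # stat only; file contents are not read
            if cached.get(entry.path) != mtime:
                # Only at this point would the expensive tag read happen.
                changes.append((entry.path, mtime))
    return changes


def flush_changes(conn: sqlite3.Connection, changes: list[tuple[str, float]]) -> None:
    """Batch all noted changes into one write at the end of the update."""
    conn.executemany(
        "INSERT OR REPLACE INTO tracks (path, mtime) VALUES (?, ?)", changes
    )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tracks (path TEXT PRIMARY KEY, mtime REAL)")

with tempfile.TemporaryDirectory() as release_dir:
    open(os.path.join(release_dir, "01. Howl.opus"), "w").close()
    first = find_changed_files(release_dir, load_cached_mtimes(conn))
    flush_changes(conn, first)
    second = find_changed_files(release_dir, load_cached_mtimes(conn))
```

On the first pass the new file is reported as changed; on the second pass the
cached mtime matches and nothing needs to be read.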

# Logging

Logs are written to stderr. Logs are also written to disk: to
161 changes: 153 additions & 8 deletions docs/METADATA_MANAGEMENT.md
@@ -37,8 +37,6 @@ Rosé manages the following tags:
- Artists
- Track Number
- Disc Number
- Rosé ID
- Rosé Release ID

Rosé does not care about any other tags and does not do anything with them.

@@ -190,15 +188,157 @@ artists = [

# Rules Engine

_In Development_
Rosé's rule engine allows you to update metadata in bulk across your library.
The rule engine supports two methods of execution:

1. Running ad hoc rules in the command line.
2. Storing rules in the configuration to run repeatedly.

## Example

I have two artists in Rosé: `CHUU` and `Chuu`. They're actually the same
artist, but capitalized differently. To normalize them, I execute the following
ad hoc rule:

```bash
$ rose metadata run-rule 'trackartist,albumartist:CHUU' 'replace:Chuu'
CHUU - 2023. Howl/01. Howl.opus
trackartist[main]: ['CHUU'] -> ['Chuu']
albumartist[main]: ['CHUU'] -> ['Chuu']
CHUU - 2023. Howl/02. Underwater.opus
trackartist[main]: ['CHUU'] -> ['Chuu']
albumartist[main]: ['CHUU'] -> ['Chuu']
CHUU - 2023. Howl/03. My Palace.opus
trackartist[main]: ['CHUU'] -> ['Chuu']
albumartist[main]: ['CHUU'] -> ['Chuu']
CHUU - 2023. Howl/04. Aliens.opus
trackartist[main]: ['CHUU'] -> ['Chuu']
albumartist[main]: ['CHUU'] -> ['Chuu']
CHUU - 2023. Howl/05. Hitchhiker.opus
trackartist[main]: ['CHUU'] -> ['Chuu']
albumartist[main]: ['CHUU'] -> ['Chuu']

Write changes to 5 tracks? [Y/n] y

[01:10:58] INFO: Writing tag changes for rule matcher=trackartist,albumartist:CHUU action=matched:CHUU::replace:Chuu
[01:10:58] INFO: Writing tag changes to CHUU - 2023. Howl/01. Howl.opus
[01:10:58] INFO: Writing tag changes to CHUU - 2023. Howl/02. Underwater.opus
[01:10:58] INFO: Writing tag changes to CHUU - 2023. Howl/03. My Palace.opus
[01:10:58] INFO: Writing tag changes to CHUU - 2023. Howl/04. Aliens.opus
[01:10:58] INFO: Writing tag changes to CHUU - 2023. Howl/05. Hitchhiker.opus

Applied tag changes to 5 tracks!
```

And we now have a single Chuu!

## Rule Syntax
```bash
$ rose tracks print ...
TODO
```

Rules are specified in a DSL.
And I also want to set all of Chuu's releases to the `K-Pop` genre:

Examples and human description TBD.
```bash
$ rose metadata run-rule 'trackartist,albumartist:Chuu' 'genre::replace-all:K-Pop'
CHUU - 2023. Howl/01. Howl.opus
genre: [] -> ['K-Pop']
CHUU - 2023. Howl/02. Underwater.opus
genre: [] -> ['K-Pop']
CHUU - 2023. Howl/03. My Palace.opus
genre: [] -> ['K-Pop']
CHUU - 2023. Howl/04. Aliens.opus
genre: [] -> ['K-Pop']
CHUU - 2023. Howl/05. Hitchhiker.opus
genre: [] -> ['K-Pop']
LOOΠΔ - 2017. Chuu/01. Heart Attack.opus
genre: ['Kpop'] -> ['K-Pop']
LOOΠΔ - 2017. Chuu/02. Girl's Talk.opus
genre: ['Kpop'] -> ['K-Pop']
Write changes to 7 tracks? [Y/n] y
[01:14:57] INFO: Writing tag changes for rule matcher=trackartist,albumartist:Chuu action=genre::replace-all:K-Pop
[01:14:57] INFO: Writing tag changes to CHUU - 2023. Howl/01. Howl.opus
[01:14:57] INFO: Writing tag changes to CHUU - 2023. Howl/02. Underwater.opus
[01:14:57] INFO: Writing tag changes to CHUU - 2023. Howl/03. My Palace.opus
[01:14:57] INFO: Writing tag changes to CHUU - 2023. Howl/04. Aliens.opus
[01:14:57] INFO: Writing tag changes to CHUU - 2023. Howl/05. Hitchhiker.opus
[01:14:57] INFO: Writing tag changes to LOOΠΔ - 2017. Chuu/01. Heart Attack.opus
[01:14:57] INFO: Writing tag changes to LOOΠΔ - 2017. Chuu/02. Girl's Talk.opus

Applied tag changes to 7 tracks!
```

The rule syntax is defined by the following grammar:
Now that I've written these rules, I can also store them in Rosé's configuration in
order to apply them on all releases I add in the future. I do this by appending
the following to my configuration file:

```toml
[[stored_metadata_rules]]
matcher = "trackartist,albumartist:CHUU"
actions = ["replace:Chuu"]
[[stored_metadata_rules]]
matcher = "trackartist,albumartist:Chuu"
actions = ["genre::replace-all:K-Pop"]
```

And with the `rose metadata run-stored-rules` command, I can rerun these rules,
along with any other stored rules, whenever I want.

## Mechanics

The rules engine operates in two steps:

1. Find all tracks matching a _matcher_.
2. Apply _actions_ to the matched tracks.

### Matchers

Matchers are `(tags, pattern)` tuples for selecting tracks. Tracks are selected
if the `pattern` matches one or more of the track's values for the given
`tags`.

Pattern matching is executed as a substring match. For example, the patterns
`Chuu`, `Chu`, `hu`, and `huu` all match `Chuu`. Regex is not supported for
pattern matching for performance reasons.

The `^` and `$` characters enable strict prefix and strict suffix matching,
respectively. For example, the pattern `^Chu` matches `Chuu`, but not `AChuu`.
And the pattern `Chu$` matches `Chu`, but not `Chuu`.
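
The matching semantics above can be sketched in a few lines of Python. This is
an illustration only, not Rosé's implementation: `pattern_matches` and
`track_matches` are hypothetical names, a track is modeled as a plain dict from
tag name to a list of values, and escaping of literal `^` and `$` is ignored.

```python
def pattern_matches(pattern: str, value: str) -> bool:
    """Substring match, with ^ and $ as strict prefix and suffix anchors."""
    strict_prefix = pattern.startswith("^")
    strict_suffix = pattern.endswith("$")
    needle = pattern[1:] if strict_prefix else pattern
    needle = needle[:-1] if strict_suffix else needle
    if strict_prefix and strict_suffix:
        return value == needle
    if strict_prefix:
        return value.startswith(needle)
    if strict_suffix:
        return value.endswith(needle)
    return needle in value


def track_matches(track: dict[str, list[str]], tags: list[str], pattern: str) -> bool:
    """A track is selected if any value of any listed tag matches the pattern."""
    return any(
        pattern_matches(pattern, value)
        for tag in tags
        for value in track.get(tag, [])
    )


track = {"trackartist": ["CHUU"], "albumartist": ["CHUU"], "genre": []}
selected = track_matches(track, ["trackartist", "albumartist"], "CHUU")
```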

### Actions

Actions are `(tags, pattern, all, kind, *args)` tuples for modifying the
metadata of a track.

Given a track, if the `pattern` matches the `tags`, by the same logic as the
matchers, the action is applied.

There are four kinds of actions: `replace`, `sed`, `split`, and `delete`. Each
action has its own set of additional arguments.

- `replace`:

For multi-valued tags, `all`...

By default, the `tags` and `pattern` are equivalent to those of the `matcher`.

### Track-Based Paradigm

Each action is applied to the track _as a whole_. Rosé does not
inherently restrict the action solely to the matched tag. What does this mean?

Examples TODO

## Rule Language

Rosé provides a Domain Specific Language (DSL) for defining rules. Rosé's
language has two types of expressions: _matchers_ and _actions_.

TODO

The formal syntax is defined by the following grammar:

```
<matcher> ::= <tags> ':' <pattern>
@@ -215,11 +355,16 @@ The rule syntax is defined by the following grammar:
<optional-all> ::= '' | '-all'
```
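
As a small illustration of the `<matcher>` production, the sketch below splits
a matcher string into its tags and pattern. `parse_matcher` is a hypothetical
helper, not part of Rosé. It assumes tags are comma-separated, as in the
examples earlier in this document, and does not handle any escaping rules the
full grammar may define.

```python
def parse_matcher(text: str) -> tuple[list[str], str]:
    """Split '<tags>:<pattern>' at the first colon into (tags, pattern)."""
    tags, sep, pattern = text.partition(":")
    if not sep or not tags:
        raise ValueError(f"expected '<tags>:<pattern>', got {text!r}")
    # Assumption: multiple tags are separated by commas.
    return tags.split(","), pattern


tags, pattern = parse_matcher("trackartist,albumartist:CHUU")
```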

## Dry Runs

TODO

# Metadata Import & Cover Art Downloading

_In Development_

Sources: Discogs, MusicBrainz, Tidal, Deezer, Apple, Junodownload, Beatport, and fanart.tv
Sources: Discogs, MusicBrainz, Tidal, Deezer, Apple, Junodownload, Beatport,
fanart.tv, and RYM.

# Appendix A. Tag Field Mappings

3 changes: 2 additions & 1 deletion rose/rules.py
@@ -271,11 +271,12 @@ def execute_metadata_rule(

# === Step 5: Flush writes to disk ===

logger.info(f"Writing tag changes for rule {rule}")
changed_release_ids: set[str] = set()
for tags, changes in actionable_audiotags:
if tags.release_id:
changed_release_ids.add(tags.release_id)
logger.info(f"Writing tag changes {tags.path} (rule {rule}).")
logger.info(f"Writing tag changes to {tags.path}")
pathtext = str(tags.path).lstrip(str(c.music_source_dir) + "/")
logger.debug(
f"{pathtext} changes: {' //// '.join([str(x)+' -> '+str(y) for _, x, y in changes])}"