initial part
azuline committed Nov 2, 2023
1 parent 373a1a0 commit 0dd130a
Showing 6 changed files with 131 additions and 63 deletions.
15 changes: 10 additions & 5 deletions docs/ARCHITECTURE.md
@@ -147,19 +147,24 @@ In the very brief testing period, the FTS implementation was around a hundred
times faster than the naive `LIKE` query. Queries that took multiple seconds
now completed in tens of milliseconds.

# Read Cache Update
# Read Cache

The read cache update is optimized to minimize the number of disk accesses, as
it's a hot path and quite expensive if implemented poorly.
The read cache fully encapsulates the SQLite database. Other modules do not
read directly from the SQLite database; they use the higher-level read
functions from the cache module. (Though we cheap out on tests, which do test
against the database directly.)

The read cache update first pulls all relevant cached data from SQLite. Stored
The read cache's update procedure is optimized to minimize the number of disk
accesses, as it's a hot path and quite expensive if implemented poorly.

The update procedure first pulls all relevant cached data from SQLite. Stored
on each track is the mtime during the previous cache update. The cache update
checks whether any files have changed via `readdir` and `stat` calls, and only
reads the file if the `mtime` has changed. Throughout the update, we take note
of the changes to apply. At the end of the update, we make a few fat SQL
queries to batch the writes.

The update process is also parallelizable, so we shard workloads across
The update procedure is also parallelizable, so we shard workloads across
multiple processes.
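
As a rough illustration of the change-detection step described above, the sketch below compares each file's current `mtime` against a previously cached value and returns only the files that would need to be re-read. The function name `find_changed_files` and the `cached_mtimes` mapping are hypothetical stand-ins, not Rosé's actual cache API. The batched SQL writes and the sharding across processes are omitted.

```python
import os
from pathlib import Path


def find_changed_files(directory: Path, cached_mtimes: dict[str, float]) -> list[Path]:
    """Return files whose mtime differs from the cached value, or that are not cached yet.

    Hypothetical helper for illustration only; `cached_mtimes` maps a file path
    to the mtime recorded during the previous cache update.
    """
    changed: list[Path] = []
    # os.scandir lists the directory (readdir); entry.stat() fetches the mtime (stat).
    with os.scandir(directory) as entries:
        for entry in entries:
            if not entry.is_file():
                continue
            mtime = entry.stat().st_mtime
            # Only files with a new or different mtime need to be opened and re-read.
            if cached_mtimes.get(entry.path) != mtime:
                changed.append(Path(entry.path))
    return sorted(changed)
```

Files whose `mtime` is unchanged are never opened, so the per-file cost is a single `stat` call.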

# Logging
122 changes: 91 additions & 31 deletions docs/METADATA_MANAGEMENT.md
@@ -96,19 +96,32 @@ artists = [
# Rules Engine

Rosé's rule engine allows you to update metadata in bulk across your library.
The rule engine supports two methods of execution:
Rules can be run ad hoc from the command line, or stored in the configuration
file to be repeatedly run over the entire library.

1. Running ad hoc rules in the command line.
2. Storing rules in the configuration to run repeatedly.
Rules consist of a _matcher_, which matches against tracks in your library, and
one or more _actions_, which modify the metadata of the matched tracks.

## Example
The 5 available actions let you _replace_ values, apply a regex substitution
(_sed_), _split_ one value into multiple values, _delete_ values, and _add_ new
values.

I have two artists in Rosé: `CHUU` and `Chuu`. They're actually the same
artist, but capitalized differently. To normalize them, I execute the following
ad hoc rule:
## Demo

Before diving into the mechanics and language of the rules engine, let's begin
with a quick demo of how the rule engine works.

Let's say that I am a LOOΠΔ fan (I mean, who isn't?). In my library, I have
two of Chuu's releases, but the first is tagged as `CHUU`, and the second as
`Chuu`. I want to normalize them to both be `Chuu`. The following rule
expresses this change:

```bash
$ rose metadata run-rule 'trackartist,albumartist:CHUU' 'replace:Chuu'
# The first argument to run-rule is the matcher. We match artist tags with the
# value CHUU. The second argument is our action. We replace the matched artist
# tags with Chuu.
$ rose metadata run-rule 'trackartist,albumartist:^CHUU$' 'replace:Chuu'

CHUU - 2023. Howl/01. Howl.opus
trackartist[main]: ['CHUU'] -> ['Chuu']
albumartist[main]: ['CHUU'] -> ['Chuu']
@@ -137,17 +150,15 @@ Write changes to 5 tracks? [Y/n] y
Applied tag changes to 5 tracks!
```

And we now have a single Chuu!

```bash
$ rose tracks print ...
TODO
```
And we now have a single Chuu artist in our library!

And I also want to set all of Chuu's releases to the `K-Pop` genre:
Let's go through one more example. I want all of Chuu's releases to have the
K-Pop genre. The following rule expresses that: for all releases with the
albumartist `Chuu`, add the `K-Pop` genre tag.

```bash
$ rose metadata run-rule 'trackartist,albumartist:Chuu' 'genre::replace-all:K-Pop'
$ rose metadata run-rule 'albumartist:^Chuu$' 'genre::add:K-Pop'

CHUU - 2023. Howl/01. Howl.opus
genre: [] -> ['K-Pop']
CHUU - 2023. Howl/02. Underwater.opus
@@ -159,9 +170,9 @@ CHUU - 2023. Howl/04. Aliens.opus
CHUU - 2023. Howl/05. Hitchhiker.opus
genre: [] -> ['K-Pop']
LOOΠΔ - 2017. Chuu/01. Heart Attack.opus
genre: ['Kpop'] -> ['K-Pop']
genre: ['Kpop'] -> ['Kpop', 'K-Pop']
LOOΠΔ - 2017. Chuu/02. Girl's Talk.opus
genre: ['Kpop'] -> ['K-Pop']
genre: ['Kpop'] -> ['Kpop', 'K-Pop']
Write changes to 7 tracks? [Y/n] y
@@ -177,21 +188,66 @@ Write changes to 7 tracks? [Y/n] y
Applied tag changes to 7 tracks!
```

Now that I've written these rules, I can also store them in Rosé's configuration in
order to apply them on all releases I add in the future. I do this by appending
the following to my configuration file:
Success! However, notice that one of Chuu's releases has the genre tag `Kpop`.
Let's convert that `Kpop` tag to `K-Pop`, across the board.

```bash
$ rose metadata run-rule 'genre:^Kpop$' 'replace:K-Pop'
G‐Dragon - 2012. ONE OF A KIND/01. One Of A Kind.opus
genre: ['Kpop'] -> ['K-Pop']
G‐Dragon - 2012. ONE OF A KIND/02. 크레용 (Crayon).opus
genre: ['Kpop'] -> ['K-Pop']
G‐Dragon - 2012. ONE OF A KIND/03. 결국.opus
genre: ['Kpop'] -> ['K-Pop']
G‐Dragon - 2012. ONE OF A KIND/04. 그 XX.opus
genre: ['Kpop'] -> ['K-Pop']
G‐Dragon - 2012. ONE OF A KIND/05. Missing You.opus
genre: ['Kpop'] -> ['K-Pop']
G‐Dragon - 2012. ONE OF A KIND/06. Today.opus
genre: ['Kpop'] -> ['K-Pop']
G‐Dragon - 2012. ONE OF A KIND/07. 불 붙여봐라.opus
genre: ['Kpop'] -> ['K-Pop']
LOOΠΔ - 2017. Chuu/01. Heart Attack.opus
genre: ['Kpop', 'K-Pop'] -> ['K-Pop']
LOOΠΔ - 2017. Chuu/02. Girl's Talk.opus
genre: ['Kpop', 'K-Pop'] -> ['K-Pop']
Write changes to 9 tracks? [Y/n] y
[14:47:26] INFO: Writing tag changes for rule matcher=genre:Kpop action=matched:Kpop::replace:K-Pop
[14:47:26] INFO: Wrote tag changes to G‐Dragon - 2012. ONE OF A KIND/01. One Of A Kind.opus
[14:47:26] INFO: Wrote tag changes to G‐Dragon - 2012. ONE OF A KIND/02. 크레용 (Crayon).opus
[14:47:26] INFO: Wrote tag changes to G‐Dragon - 2012. ONE OF A KIND/03. 결국.opus
[14:47:26] INFO: Wrote tag changes to G‐Dragon - 2012. ONE OF A KIND/04. 그 XX.opus
[14:47:26] INFO: Wrote tag changes to G‐Dragon - 2012. ONE OF A KIND/05. Missing You.opus
[14:47:26] INFO: Wrote tag changes to G‐Dragon - 2012. ONE OF A KIND/06. Today.opus
[14:47:26] INFO: Wrote tag changes to G‐Dragon - 2012. ONE OF A KIND/07. 불 붙여봐라.opus
[14:47:26] INFO: Wrote tag changes to LOOΠΔ - 2017. Chuu/01. Heart Attack.opus
[14:47:26] INFO: Wrote tag changes to LOOΠΔ - 2017. Chuu/02. Girl's Talk.opus

Applied tag changes to 9 tracks!
```

And we also normalized a G-Dragon release's genre tag on the way! I'd like to
run these rules again in the future, so that all new music I add to my library
is normalized according to these rules. To do so, I add the following to my
configuration file:

```toml
[[stored_metadata_rules]]
matcher = "trackartist,albumartist:CHUU"
matcher = "trackartist,albumartist:^CHUU$"
actions = ["replace:Chuu"]
[[stored_metadata_rules]]
matcher = "trackartist,albumartist:Chuu"
actions = ["genre::replace-all:K-Pop"]
matcher = "albumartist:^Chuu$"
actions = ["genre::add:K-Pop"]
[[stored_metadata_rules]]
matcher = "genre:^Kpop$"
actions = ["replace:K-Pop"]
```

And with the `rose metadata run-stored-rules` command, I can run these rules,
as well as the others, repeatedly again in the future.
Now, the `rose metadata run-stored-rules` command will run the above three
rules, along with any other rules I have in my configuration file, on the
entire library.

## Mechanics

@@ -214,22 +270,26 @@ The `^` and `$` characters enable strict prefix and strict suffix matching,
respectively. So for example, the pattern `^Chu` matches `Chuu`, but not `AChuu`.
And the pattern `Chu$` matches `Chu`, but not `Chuu`.
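
The anchoring rules above can be sketched in a few lines of Python. This is a hypothetical re-implementation for illustration, not the project's own `matches_pattern` function. It assumes that an unanchored pattern matches as a case-sensitive substring, and it ignores any escaping rules the real matcher may have.

```python
def pattern_matches(pattern: str, value: str) -> bool:
    """Hypothetical sketch of matcher semantics; not Rosé's actual implementation."""
    strict_start = pattern.startswith("^")
    strict_end = pattern.endswith("$")
    # Strip the anchors to get the literal text to look for.
    needle = pattern[1:] if strict_start else pattern
    if strict_end:
        needle = needle[:-1]
    if strict_start and strict_end:
        return value == needle  # both anchors: exact match
    if strict_start:
        return value.startswith(needle)  # strict prefix
    if strict_end:
        return value.endswith(needle)  # strict suffix
    return needle in value  # assumed default: substring match


print(pattern_matches("^Chu", "Chuu"))    # True
print(pattern_matches("^Chu", "AChuu"))   # False
print(pattern_matches("Chu$", "Chuu"))    # False
print(pattern_matches("^CHUU$", "Chuu"))  # False, matching is case sensitive
```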

Case sensitive

### Actions

Actions are `(tags, pattern, all, kind, *args)` tuples for modifying the
metadata of a track.

The `tags` and `pattern`, usually by default, equivalent the `matcher`. TODO

Given a track, if the `pattern` matches the `tags`, by the same logic as the
matchers, the action is applied.

There are four kinds of actions: `replace`, `sed`, `split`, and `delete`. Each
action has its own set of additional arguments.
There are five kinds of actions: `replace`, `sed`, `split`, `add`, and
`delete`. Each action has its own set of additional arguments.

- `replace`:
- `replace`: Replace a tag.

For multi-valued tags, `all`...
### Multi-Value Tags

By default, the `tags` and `pattern` of an action are equivalent to those of the `matcher`.
For multi-valued tags, `all`...
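
To make the multi-value behavior concrete, here is a minimal sketch of how `add` and `replace` could act on a multi-valued tag such as `genre`, with results deduplicated in order. The helpers `add_value` and `replace_matching` are hypothetical names, and exact matching (as with `^Kpop$`) is assumed. The sketch mirrors the demo output above rather than the rules engine's real code.

```python
def uniq(xs: list[str]) -> list[str]:
    """Deduplicate while preserving first-seen order."""
    seen: set[str] = set()
    rv: list[str] = []
    for x in xs:
        if x not in seen:
            rv.append(x)
            seen.add(x)
    return rv


def add_value(values: list[str], new_value: str) -> list[str]:
    """Hypothetical `add` action: append a value unless it is already present."""
    return uniq([*values, new_value])


def replace_matching(values: list[str], old: str, new: str) -> list[str]:
    """Hypothetical `replace` action: swap exact matches, then deduplicate."""
    return uniq([new if v == old else v for v in values])


genres = add_value(["Kpop"], "K-Pop")
print(genres)  # ['Kpop', 'K-Pop']
print(replace_matching(genres, "Kpop", "K-Pop"))  # ['K-Pop']
```

Deduplicating after each action is what lets `['Kpop', 'K-Pop']` collapse to a single `K-Pop` once `Kpop` is replaced.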

### Track-Based Paradigm

20 changes: 5 additions & 15 deletions rose/cache.py
@@ -49,7 +49,7 @@

from rose.artiststr import format_artist_string
from rose.audiotags import SUPPORTED_AUDIO_EXTENSIONS, AudioTags
from rose.common import VERSION
from rose.common import VERSION, uniq
from rose.config import Config

logger = logging.getLogger(__name__)
@@ -805,20 +805,20 @@ def _update_cache_for_releases_executor(

    if set(tags.genre) != set(release.genres):
        logger.debug(f"Release genre change detected for {source_path}, updating")
        release.genres = _uniq(tags.genre)
        release.genres = uniq(tags.genre)
        release_dirty = True

    if set(tags.label) != set(release.labels):
        logger.debug(f"Release label change detected for {source_path}, updating")
        release.labels = _uniq(tags.label)
        release.labels = uniq(tags.label)
        release_dirty = True

    release_artists = []
    for role, names in asdict(tags.album_artists).items():
        # Multiple artists may resolve to the same alias (e.g. LOONA members...), so
        # collect them in a deduplicated way, and then record the deduplicated aliases.
        aliases: set[str] = set()
        for name in _uniq(names):
        for name in uniq(names):
            release_artists.append(CachedArtist(name=name, role=role))
            aliases.update(c.artist_aliases_parents_map.get(name, []))
        for name in aliases:
@@ -926,7 +926,7 @@ def _update_cache_for_releases_executor(
    # Multiple artists may resolve to the same alias (e.g. LOONA members...), so collect
    # them in a deduplicated way, and then record the deduplicated aliases.
    aliases = set()
    for name in _uniq(names):
    for name in uniq(names):
        track.artists.append(CachedArtist(name=name, role=role))
        aliases.update(c.artist_aliases_parents_map.get(name, []))
    for name in aliases:
@@ -2234,16 +2234,6 @@ def _flatten(xxs: list[list[T]]) -> list[T]:
    return xs


def _uniq(xs: list[T]) -> list[T]:
    rv: list[T] = []
    seen: set[T] = set()
    for x in xs:
        if x not in seen:
            rv.append(x)
            seen.add(x)
    return rv


def _unpack(*xxs: str, delimiter: str = r" \\ ") -> Iterator[tuple[str, ...]]:
    """
    Unpack an arbitrary number of strings, each of which is a " \\ "-delimited list in actuality,
13 changes: 13 additions & 0 deletions rose/common.py
@@ -5,10 +5,13 @@

import uuid
from pathlib import Path
from typing import TypeVar

with (Path(__file__).parent / ".version").open("r") as fp:
    VERSION = fp.read().strip()

T = TypeVar("T")


class RoseError(Exception):
    pass
@@ -24,3 +27,13 @@ def valid_uuid(x: str) -> bool:
        return True
    except ValueError:
        return False


def uniq(xs: list[T]) -> list[T]:
    rv: list[T] = []
    seen: set[T] = set()
    for x in xs:
        if x not in seen:
            rv.append(x)
            seen.add(x)
    return rv
16 changes: 8 additions & 8 deletions rose/rules.py
@@ -16,7 +16,7 @@
    get_release_source_paths_from_ids,
    update_cache_for_releases,
)
from rose.common import RoseError
from rose.common import RoseError, uniq
from rose.config import Config
from rose.rule_parser import (
    AddAction,
@@ -236,7 +236,7 @@ def execute_metadata_rule(
    todisplay: list[tuple[str, list[Changes]]] = []
    maxpathwidth = 0
    for tags, changes in actionable_audiotags:
        pathtext = str(tags.path).lstrip(str(c.music_source_dir) + "/")
        pathtext = str(tags.path).removeprefix(str(c.music_source_dir) + "/")
        if len(pathtext) >= 120:
            pathtext = pathtext[:59] + ".." + pathtext[-59:]
        maxpathwidth = max(maxpathwidth, len(pathtext))
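
The switch from `lstrip` to `removeprefix` in this hunk matters because `str.lstrip` treats its argument as a set of characters to strip, not as a literal prefix. The sketch below shows the difference on a made-up path. The directory `/music` and the helper name `relative_display_path` are hypothetical, and `str.removeprefix` requires Python 3.9 or newer.

```python
def relative_display_path(path: str, source_dir: str) -> str:
    """Drop the literal source-directory prefix from a path (Python 3.9+)."""
    return path.removeprefix(source_dir + "/")


path = "/music/musicals/ost.opus"

# lstrip strips any leading run of the characters '/', 'm', 'u', 's', 'i', 'c',
# so it also eats the start of the "musicals" directory name.
print(path.lstrip("/music" + "/"))             # als/ost.opus

# removeprefix removes the exact prefix once and leaves the rest untouched.
print(relative_display_path(path, "/music"))   # musicals/ost.opus
```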
@@ -288,13 +288,13 @@ def execute_metadata_rule(
    for tags, changes in actionable_audiotags:
        if tags.release_id:
            changed_release_ids.add(tags.release_id)
        pathtext = str(tags.path).lstrip(str(c.music_source_dir) + "/")
        pathtext = str(tags.path).removeprefix(str(c.music_source_dir) + "/")
        logger.debug(
            f"Attempting to write {pathtext} changes: "
            f"{' //// '.join([str(x)+' -> '+str(y) for _, x, y in changes])}"
        )
        tags.flush()
        logger.info(f"Wrote tag changes to {tags.path}")
        logger.info(f"Wrote tag changes to {pathtext}")

    click.echo()
    click.echo(f"Applied tag changes to {len(actionable_audiotags)} tracks!")
@@ -322,7 +322,7 @@ def matches_pattern(pattern: str, value: str | int | None) -> bool:
# Factor out the logic for executing an action on a single-value tag and a multi-value tag.
def execute_single_action(action: MetadataAction, value: str | int | None) -> str | None:
    if action.match_pattern and not matches_pattern(action.match_pattern, value):
        return None
        return str(value)

    bhv = action.behavior
    strvalue = str(value) if value is not None else None
@@ -356,11 +356,11 @@ def execute_multi_value_action(

    # Handle these cases first; we have no need to loop into the body for them.
    if action.all and isinstance(bhv, ReplaceAction):
        return bhv.replacement.split(";")
        return uniq(bhv.replacement.split(";"))
    if action.all and isinstance(bhv, DeleteAction):
        return []
    if isinstance(bhv, AddAction):
        return [*values, bhv.value]
        return uniq([*values, bhv.value])

    # Create a copy of the action with the match_pattern set to None. We'll pass this into the
    # delegated calls in execute_single_action.
@@ -379,4 +379,4 @@ def execute_multi_value_action(
rval.append(newv.strip())
elif newv2 := execute_single_action(action, v):
rval.append(newv2)
return rval
return uniq(rval)
8 changes: 4 additions & 4 deletions rose/rules_test.py
@@ -224,11 +224,11 @@ def test_sed_action(config: Config, source_dir: Path) -> None:
def test_sed_all(config: Config, source_dir: Path) -> None:
    rule = MetadataRule(
        matcher=MetadataMatcher(tags=["genre"], pattern="P"),
        actions=[MetadataAction(behavior=SedAction(src=re.compile("^.*$"), dst="ip"))],
        actions=[MetadataAction(behavior=SedAction(src=re.compile("^(.*)$"), dst=r"i\1"))],
    )
    execute_metadata_rule(config, rule, confirm_yes=False)
    af = AudioTags.from_file(source_dir / "Test Release 1" / "01.m4a")
    assert af.genre == ["ip", "ip"]
    assert af.genre == ["iK-Pop", "iPop"]
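
The updated expectation in this test follows from how regex backreferences behave. The sketch below assumes the sed action delegates to Python's `re.sub`, which this diff does not show. It reproduces both the old and the new substitution on the two genre values implied by the assertion.

```python
import re

genres = ["K-Pop", "Pop"]

# Old rule: every value is replaced wholesale, so both collapse to the same string.
old = [re.sub(r"^.*$", "ip", g) for g in genres]
print(old)  # ['ip', 'ip']

# New rule: the capture group keeps the original text, so the values stay distinct.
new = [re.sub(r"^(.*)$", r"i\1", g) for g in genres]
print(new)  # ['iK-Pop', 'iPop']
```

The old result holds two identical values. The test was presumably changed because this commit now deduplicates multi-value results with `uniq`, which would collapse them into one.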


def test_split_action(config: Config, source_dir: Path) -> None:
@@ -248,7 +248,7 @@ def test_split_all_action(config: Config, source_dir: Path) -> None:
    )
    execute_metadata_rule(config, rule, confirm_yes=False)
    af = AudioTags.from_file(source_dir / "Test Release 1" / "01.m4a")
    assert af.genre == ["K-", "op", "op"]
    assert af.genre == ["K-", "op"]


def test_add_action(config: Config, source_dir: Path) -> None:
@@ -300,7 +300,7 @@ def test_action_on_different_tag(config: Config, source_dir: Path) -> None:
    )
    execute_metadata_rule(config, rule, confirm_yes=False)
    af = AudioTags.from_file(source_dir / "Test Release 1" / "01.m4a")
    assert af.genre == ["hi", "hi"]
    assert af.genre == ["hi"]


def test_chained_action(config: Config, source_dir: Path) -> None: