Primary Key Should Include `node` Column? #221

jakimfett · 2024-10-16T19:25:31Z

In the Laminar SQLite database, the "node" column is not part of the primary key, which limits the use of multiple Laminar instances.

Remote jobs do not meet this need, because having a local and cloud instance of Laminar is necessary for my particular workflow.

Laminar emphasizes the use of contexts rather than nodes (laminar will exit if the cfg/nodes path exists), but because the primary key is only including the name and number columns, any attempt to combine database entries from multiple instances of Laminar will causes some difficulty, as different instances of Laminar with the same job will cause primary key collisions despite having unique contexts.

Because Laminar does not auto increment the number field (uses SELECT name, MAX), in theory this can be made compatible by including the node column in the primary key.

Thoughts?

(EDIT: Should the node field be refactored to context or similar to finalize the deprecation of node configuration?)

The text was updated successfully, but these errors were encountered:

ohwgiles · 2024-10-21T00:23:12Z

(EDIT: Should the node field be refactored to context or similar to finalize the deprecation of node configuration?)

Yes, that column should really be renamed "context". Nodes are a long-deprecated concept.

I'm not aware of sqlite supporting multiple concurrent writers to a single database, so I'm not sure how you would expect multiple laminar instances to use the same database. This isn't a use case I've considered.

That being said, there should be no problem having separate laminar instances, each with its own sqlite database, and performing queries on the aggregate of these databases, e.g. using ATTACH. Does this satisfy your use case? Or do you explicitly need each instance to have (read) access to the other's database?

jakimfett · 2024-10-22T02:12:34Z

Yes, that column should really be renamed "context".

I'd be interested in attempting this refactor if you think it'd be relatively straightforward.
Should I open a new ticket for that?

...multiple concurrent writers...

Technically Write-Ahead Logging can do this, but only on the same host. However, this was not my intention or use case. Having multiple laminar instances use the same database is unnecessary, and I don't need concurrent access to happen.

Does this satisfy your use case?

My goal is being able to merge build info and artifacts from all my laminar databases without having primary key collisions. My actual use case is several non-public servers performing builds, and a "frontend" where build statistics (and select artifacts) are available to the general public.

Example, given three non-public servers A, B, C plus one public frontend.
Server A is a repurposed desktop.

The artifact is the pijul version control binary.
This server on during the day, but shut off at night.

Server B is a single board computer.

mostly lightweight work like editing text files and running pandoc.
Artifacts are zip / tar files with HTML docs for the functions.sh shell library.
This server is on whenever I need to SSH to Server A or Server B, or when editing a text file in one of the repositories.

Server C is a 2U rack-mounted blade server with excessive amounts of RAM and CPU.

The artifact is the git version control binary.
This server is only on when building a new version of git, which is rarely (less than once a month).

The public frontend server is the smallest cloud-based option available.

Frontend only packages docs and shell library after a manual repository push from Server B, and never compiles C++ or rust artifacts.
This server is on at all times.

All servers self-package the shell library when manually pushed from Server B or when frontend has a new stable version.
All servers have a "selftest" function that validates all local jobs ShellCheck.
All servers use the pijul and git binaries.
Both the shell library and selftest jobs create a primary key conflict when attempting to merge the databases.

In an ideal world, each server could use laminar's after processing to merge their database's delta into Server B (if available) and the frontend database, along with copying the latest stable artifact. This would allow public access to artifacts, build results, logs and statistics from all the build servers via the Frontend laminar instance web interface via a single URL, eg "server.tld/artifacts/build/latest" or similar. Currently, making this work requires some convoluted manual intervention to avoid primary key conflicts for any jobs with identical names, or just uploading to the artifacts directory without having a corresponding build defined in the laminar interface. However, I'd really like to have whats in laminar correspond to whats in the artifacts directory.

Expanding the primary key to include the nodes (or contexts column) seems like it would allow the databases to be merged, but then build numbering would (probably) collide on the frontend server so graphing would look weird and links to build info would be duplicated, I haven't tested contexts enough to know whether the web interface differentiates builds by number when there are multiple contexts.

I don't have good visibility on what a solution might be, just that the step where I'm merging the databases wouldn't complain about primary keys conflicting if the nodes (or contexts column) were part of the primary key. If this seems like too much of an edge case to resolve, I'm comfortable being told it's not within laminar's scope. Hopefully this wall of text clarified what I'm trying to do here.

ohwgiles · 2024-10-22T22:20:27Z

I'd be interested in attempting this refactor if you think it'd be relatively straightforward. Should I open a new ticket for that?

You're welcome to submit a PR for this. No need for an issue as well as the PR.

My goal is being able to merge build info and artifacts from all my laminar databases without having primary key collisions. My actual use case is several non-public servers performing builds, and a "frontend" where build statistics (and select artifacts) are available to the general public.
...

This is most definitely out of laminar's scope. Furthermore, making context part of the primary key wouldn't solve your problem, since the UI is definitely not written to handle runs with the same name and number. context is just a property of a run. I'm happy to consider any other simple, generic changes that would help you achieve your goals, but I don't think this particular one will fly.

It sounds like the simplest solution is just ensuring that names cannot collide when merging, e.g. by adding the server name to the job name.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Primary Key Should Include `node` Column? #221

Primary Key Should Include `node` Column? #221

jakimfett commented Oct 16, 2024 •

edited

Loading

ohwgiles commented Oct 21, 2024

jakimfett commented Oct 22, 2024

ohwgiles commented Oct 22, 2024

Primary Key Should Include node Column? #221

Primary Key Should Include node Column? #221

Comments

jakimfett commented Oct 16, 2024 • edited Loading

ohwgiles commented Oct 21, 2024

jakimfett commented Oct 22, 2024

ohwgiles commented Oct 22, 2024

Primary Key Should Include `node` Column? #221

Primary Key Should Include `node` Column? #221

jakimfett commented Oct 16, 2024 •

edited

Loading