Tables as Database Files as a Potential Solution to the Schema Change Problem #113
-
Hey @WesselAtWork, thank you for making the effort to explain this approach. I like that you're taking a different angle and brainstorming alternatives.

Because Marmot is designed to work with different systems like PocketBase and others, putting the effort on the application developer might not work that well. I worry about that a bit. Some of my own thoughts from a pragmatic perspective: if we can calculate the migration diff automatically, maybe we can keep the effort off the application developer.

As far as I know, the tricky part when you do table migrations in a CRDT system is that you also need to keep the old table schema in the DB, because CRDTs are designed to allow offline changes on ANY of the SQLite DBs. With Marmot, because we are ordered (thanks to NATS), it could be much easier: the migration is calculated on one server and then pushed (via NATS) to all the other SQLite DBs. I am of course not certain, but this makes sense to me as a way to do it.

What about race conditions? If NATS is partitioned (a fancy word for the network between some NATS servers going down for a bit) and the network then reforms, I do not know whether the above race condition would occur. Would be cool if @maxpert could chime in here if he knows.

https://atlasgo.io/getting-started/sqlite-declarative-sql shows how we can calculate the schema change automatically. I don't know if that only works for Ent though — anyone know? https://github.com/ariga/atlas is the code.

This project is perhaps relevant for ideas too: https://superfly.github.io/corrosion/schema.html. Corrosion uses CR-SQLite btw: https://vlcn.io/docs/cr-sqlite/migrations. It uses the CRDT ideas, not the NATS RAFT approach. Some things I can point out with Corrosion that are sort of the same as Marmot:
-
Reading the feature promises of CR-SQLite, it's supposed to deal with diverging writes if a network partition happens. You would probably need to integrate more directly with the library and manage the meta-structures inside Marmot.
Could you elaborate?
-
Tables as Database Files
The approach is simple in that we don't try to deal with schema changes at Marmot's level, but at the boundary between Marmot and the application.
In a theoretical setup, every table has multiple files (versions) that are all replicated by Marmot.
We can use ATTACH to create a database connection that binds all (or some) of the table files together.
This moves the concern to the application, and by extension the application developer.
Inside the application we will have to deal with migrations and/or data fusing.
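As a sketch of the ATTACH mechanics, a single connection can bind several per-table files and even join across them. The file and table names here are purely illustrative, not Marmot conventions:

```python
import os
import sqlite3

# Hypothetical file names; one SQLite file per table, per the proposal above.
for f in ("users.db", "orders.db"):
    if os.path.exists(f):
        os.remove(f)  # start the demo from a clean slate

main = sqlite3.connect("users.db")
main.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")

# Bind a second table file into the same connection under a schema alias.
main.execute("ATTACH DATABASE 'orders.db' AS odb")
main.execute("CREATE TABLE odb.orders (id INTEGER PRIMARY KEY, user_id INTEGER)")

# Queries can now join across the two physical files.
main.execute("INSERT INTO users (name) VALUES ('ada')")
main.execute("INSERT INTO odb.orders (user_id) VALUES (1)")
rows = main.execute(
    "SELECT u.name FROM users u JOIN odb.orders o ON o.user_id = u.id"
).fetchall()
```

Note that SQLite limits how many databases can be attached to one connection (ten by default), which may matter if every table gets its own file.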
Simple Switchover
If your application needs a code change to deal with the new schema anyway, you have to stop all instances of the application from adding to the old table.
To have a new schema for `tableX`, we would create the new file `tableX-16.db`, apply the new schema, and then migrate the data over.
Once the new table is live, all the application instances are activated again and will now operate on the new table file.
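A minimal sketch of that offline migration step, assuming hypothetical file names and a new column as the schema change:

```python
import os
import sqlite3

# Hypothetical: tableX.db holds the old schema, tableX-16.db the new one.
for f in ("tableX.db", "tableX-16.db"):
    if os.path.exists(f):
        os.remove(f)  # start the demo from a clean slate

old = sqlite3.connect("tableX.db")
old.execute("CREATE TABLE tableX (id INTEGER PRIMARY KEY, name TEXT)")
old.execute("INSERT INTO tableX (name) VALUES ('alpha')")
old.commit()

# 1. Create the new table file and apply the new schema (here: an extra column).
new = sqlite3.connect("tableX-16.db")
new.execute("CREATE TABLE tableX (id INTEGER PRIMARY KEY, name TEXT, created_at TEXT)")

# 2. With all writers stopped, copy the old rows across, filling in defaults.
rows = old.execute("SELECT id, name FROM tableX").fetchall()
new.executemany(
    "INSERT INTO tableX (id, name, created_at) VALUES (?, ?, datetime('now'))", rows
)
new.commit()

count = new.execute("SELECT COUNT(*) FROM tableX").fetchone()[0]
```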
Gradual Switchover
If you need every instance of the application to always be online, both databases (tables) have to be active and live at some point in time.
You need to deal with reading from an old version at the same time as you are writing to the new one; as a bonus, this gives your application the ability to do time-based partitions.
This increases the programming burden, as you need to deal with more than one version of the representation of your data.
Once you have the process well defined for your application, you could deal with any change at any time.
If you already have your application partitioned into a READ component and a WRITE component, this is much easier: the writing component just needs to start writing into the new schema the moment it becomes active, and the reading component deals with fusing the data across files.
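The read-side fusing can be sketched with ATTACH plus UNION ALL. The version suffixes and the added column are assumptions for illustration, not Marmot conventions:

```python
import os
import sqlite3

for f in ("tableX_v1.db", "tableX_v2.db"):
    if os.path.exists(f):
        os.remove(f)  # start the demo from a clean slate

# Old-version file: still readable, no longer written to.
v1 = sqlite3.connect("tableX_v1.db")
v1.execute("CREATE TABLE tableX (id INTEGER PRIMARY KEY, name TEXT)")
v1.execute("INSERT INTO tableX (name) VALUES ('old-row')")
v1.commit()
v1.close()

# New-version file: all new writes land here (schema gained a 'flag' column).
v2 = sqlite3.connect("tableX_v2.db")
v2.execute("CREATE TABLE tableX (id INTEGER PRIMARY KEY, name TEXT, flag INTEGER DEFAULT 0)")
v2.execute("INSERT INTO tableX (name) VALUES ('new-row')")
v2.commit()

# The read path fuses both versions behind one query, filling a default
# for the column that the old schema lacks.
v2.execute("ATTACH DATABASE 'tableX_v1.db' AS v1")
fused = v2.execute(
    "SELECT name, 0 AS flag FROM v1.tableX "
    "UNION ALL SELECT name, flag FROM main.tableX"
).fetchall()
```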
You could also version every table file by the application version, e.g. `tableX_v1.0.1.db`, to make the distinction between old data and new data easy inside the application.
Implementation requirement from Marmot
IINS: easily dealing with multiple db files at the same time.
Currently we CAN do this model by running multiple instances of Marmot, each connected to a different file, but that will get unmanageable very quickly.
The ideal would be a single process watching a folder and registering any new database files as they arrive.
This way the migration would be controlled completely by the application, which would create the new files in the directory and use them.
All marmot needs to do, is get it replicated as fast as possible.
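A minimal polling sketch of that folder-watching idea. This is illustrative only, not how Marmot is implemented today; the `scan` helper is a hypothetical name:

```python
import glob
import os
import sqlite3
import tempfile

# Files already known to the (hypothetical) replicator.
registered = set()

def scan(folder):
    """Return table files that appeared since the last scan."""
    new_files = []
    for path in sorted(glob.glob(os.path.join(folder, "*.db"))):
        if path not in registered:
            registered.add(path)   # here Marmot would start replicating the file
            new_files.append(path)
    return new_files

folder = tempfile.mkdtemp()

# The application creates a new table file in the watched folder.
conn = sqlite3.connect(os.path.join(folder, "tableX.db"))
conn.execute("CREATE TABLE tableX (id INTEGER PRIMARY KEY)")
conn.commit()
conn.close()

first = scan(folder)   # picks up tableX.db
second = scan(folder)  # nothing new since the last scan
```

A real implementation would likely use filesystem notifications (inotify and friends) rather than polling, but the registration logic is the same.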
Optionally: it could be useful if the Marmot process could signal the application instance it is sidecar-ing for, either by issuing a SIGHUP or by creating a file somewhere the application can watch.
This should cause the application to "reload config" and catch the new table file. (Of course what EXACTLY the application does is left to the implementer)
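On the application side, the SIGHUP half of that contract might look like the following. `reload_tables` and the table file name are hypothetical stand-ins for whatever the implementer chooses:

```python
import signal

# State the application would rebuild on reload.
current_tables = []

def reload_tables(signum, frame):
    # A real handler would re-read the folder / config table and rebuild
    # its database connections; appending a name stands in for that work.
    current_tables.append("tableX-16.db")

# Register the handler so a sidecar's SIGHUP triggers a config reload.
signal.signal(signal.SIGHUP, reload_tables)

# Simulate Marmot signalling this process after replicating a new file.
signal.raise_signal(signal.SIGHUP)
```

(SIGHUP is POSIX-only; on other platforms the file-based notification variant would be needed.)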
Other problems
Config.
Easily solved by including a configuration table with a fairly static schema that contains the filenames and their associated logical table names.
The application would then "listen" to changes on this table and deal with them appropriately.
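A possible shape for such a configuration table; the column names are assumptions for illustration, not an existing Marmot schema:

```python
import os
import sqlite3

if os.path.exists("config.db"):
    os.remove("config.db")  # start the demo from a clean slate

# Static-schema config table mapping logical table names to physical files.
cfg = sqlite3.connect("config.db")
cfg.execute(
    "CREATE TABLE table_files ("
    " logical_name TEXT NOT NULL,"
    " file_name    TEXT NOT NULL,"
    " active       INTEGER NOT NULL DEFAULT 1,"
    " PRIMARY KEY (logical_name, file_name))"
)
cfg.execute("INSERT INTO table_files VALUES ('tableX', 'tableX-15.db', 0)")
cfg.execute("INSERT INTO table_files VALUES ('tableX', 'tableX-16.db', 1)")
cfg.commit()

# The application polls (or is signalled) and resolves the live file per table.
active = dict(
    cfg.execute("SELECT logical_name, file_name FROM table_files WHERE active = 1")
)
```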
Cleanup
Having a simple way to clean up old db files would be nice, but that should really be an application concern.
Bonus
You can "watch" tables selectively!
In a theoretical setup where every table is self-contained in a separate SQLite database file, all you need to do to "watch" a single table is to include only that table's file in your application environment.
i.e. from Marmot we just need the ability to apply a regex to the SQLite files we want to replicate into our local instance from the main pool.
Again, this pushes the implementation burden onto the developer, who needs to be cognizant of it when implementing their program.
The application developer needs to be aware of what they want from the database and then set the db connection appropriately.
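The filtering itself is trivial; a hypothetical pattern like the following would select only one logical table's files from the pool (the pattern and file names are illustrative, not a Marmot flag):

```python
import re

# Replicate only the files for logical table "tableX" into this instance.
pattern = re.compile(r"^tableX.*\.db$")

pool = ["tableX-15.db", "tableX-16.db", "tableY-3.db", "config.db"]
local = [name for name in pool if pattern.match(name)]
```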