
Filesystem state issues

temas edited this page Jul 11, 2011 · 1 revision

A few of the core systems currently track state, or provide basic transaction support, using files on the local file system. The state is stored on disk as JSON objects or arrays. Given node's asynchronous nature and the overall design, this runs into problems quickly.

The first obvious issue is two parts of the system trying to write to the same file at once. Using fs.writeFileSync seems to solve this, but it creates a new problem. If fs.writeFileSync were only used for appends it would be fine, but we may also be removing data, so for now each write rewrites the entire structure. That means two writers can each write a slightly different version of the structure. As an example, picture services A and B scheduling at the same time.

Previous Queue:
+---+---+---+
| X | Y | Z |
+---+---+---+
+---+                                         +---+---+---+---+
| A | -> schedule callback -> core writes ->  | X | Y | Z | A |
+---+                                         +---+---+---+---+
+---+                                         +---+---+---+---+
| B | -> schedule callback -> core writes ->  | X | Y | Z | B |
+---+                                         +---+---+---+---+
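
The scenario in the diagram can be reproduced in a few lines. This is only a sketch: the file name `queue.json`, the temporary directory, and the two snapshot variables are made up for illustration and are not taken from the actual core code. It assumes each service's callback captured the queue before either write happened.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical state file; the real core's path and format are not shown here.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'queue-demo-'));
const queueFile = path.join(dir, 'queue.json');
fs.writeFileSync(queueFile, JSON.stringify(['X', 'Y', 'Z']));

// Both schedule callbacks snapshot the queue before either one has written.
const snapshotForA = JSON.parse(fs.readFileSync(queueFile, 'utf8'));
const snapshotForB = JSON.parse(fs.readFileSync(queueFile, 'utf8'));

// Each write runs to completion on its own, but it replaces the whole structure.
fs.writeFileSync(queueFile, JSON.stringify(snapshotForA.concat('A')));
fs.writeFileSync(queueFile, JSON.stringify(snapshotForB.concat('B')));

// Only the last writer's view of the queue survives.
const finalQueue = JSON.parse(fs.readFileSync(queueFile, 'utf8'));
console.log(finalQueue); // [ 'X', 'Y', 'Z', 'B' ]
```

Swapping the order of the two writes would keep A and drop B instead.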

Depending on the order in which the fs.writeFileSync calls complete, only one of the scheduled actions is saved and the other is lost completely. But we know how to fix this! We introduce a queue that does all of the writing to disk. That works, but it adds another layer of complexity. Every write now has to read the file synchronously, parse the JSON, merge in the new data, stringify the result, and write it out synchronously. The merge logic may also differ for each implementation. Some pieces (such as eventing) might need more complete transaction tracking, so they will need to write to disk often as different parts happen, and the logic might differ depending on whether a write completes a series of events or records a new event.
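
To make the cost of that approach concrete, here is a minimal sketch of such a write queue. Everything in it is an assumption for illustration: `createStateFile`, `update`, and the merge callbacks are hypothetical names, not existing core APIs. Each caller supplies its own merge function, which is where the per-implementation logic ends up.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper: funnels every change to one JSON file through a queue.
function createStateFile(file) {
  const pending = [];
  let draining = false;

  function drain() {
    draining = true;
    while (pending.length > 0) {
      const merge = pending.shift();
      // Read, parse, merge, stringify, write: all synchronous, every time.
      const current = JSON.parse(fs.readFileSync(file, 'utf8'));
      fs.writeFileSync(file, JSON.stringify(merge(current)));
    }
    draining = false;
  }

  return {
    update(merge) {
      pending.push(merge);
      // If a merge function calls update() itself, the running loop picks it up.
      if (!draining) drain();
    },
  };
}

// Usage with an illustrative temporary file.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'state-demo-'));
const file = path.join(dir, 'queue.json');
fs.writeFileSync(file, JSON.stringify(['X', 'Y', 'Z']));

const state = createStateFile(file);
state.update((queue) => queue.concat('A')); // service A schedules
state.update((queue) => queue.concat('B')); // service B schedules
state.update((queue) => queue.filter((item) => item !== 'X')); // X completes

const result = JSON.parse(fs.readFileSync(file, 'utf8'));
console.log(result); // [ 'Y', 'Z', 'A', 'B' ]
```

Neither update is lost now, but every change blocks the event loop for a full read and rewrite of the file, and each kind of state needs its own merge function.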

While the logic presented above is sound, it feels very complex and possibly fragile in each implementation. It also massively disrupts the async flow of the code. Attempting an implementation and pondering it more has led me to believe that we need to move some of this logic into a database. SQLite is the first that springs to mind as cleanly fitting the structure and async needs of most of these pieces. I don't feel Mongo is the right fit, given the nonabstract nature of each piece and the relatively light-duty use case. I'd like to have some discussion on this before doing anything deeper, though.
