ctide edited this page Aug 1, 2011 · 1 revision

Synclets - lightweight sync routines to simplify connectors

This is a proposal to simplify the common connector pattern of syncing a specific data type. Someone building a connector should not have to worry about anything except how to get the data and what data they are providing to the locker. We should be able to implement things like new data stores or status tracking without ever touching the sync code. Ideally, all connectors that ship with the locker and perform data sync would follow this pattern because it's easier, but it's not strictly required. There can be one to many synclets per connector, and some connectors may be no more than a set of synclets. Some connectors also have push and custom APIs specific to the remote service that fall outside the sync pattern.

'Common' connector sync code

A synclet should only need the auth tokens and the last known state (timestamp, list of IDs, etc.) passed in to start. It pulls the data from the remote service and handles any errors (rate limiting, retries, etc.). Once it has the data, either new items or changes to current ones, it returns that data and exits. The synclet manager generates the events, tracks the sync status and item counts, stores things in mongo and the historical log, and provides the REST API for access to those things.
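A minimal sketch of that contract, assuming a plain function interface: auth tokens and last-known state in, new data and updated state out, with no database access inside the synclet. All names here (`sync`, `fetchFromProvider`, the state fields) are illustrative, not a real API.

```javascript
// Stand-in for the remote service call. A real synclet would hit the
// provider's HTTP API here and handle rate limits and retries itself.
function fetchFromProvider(authToken, since) {
  const all = [
    { id: 1, updatedAt: 100, name: 'alice' },
    { id: 2, updatedAt: 250, name: 'bob' },
  ];
  // Return only items changed since the last known sync point.
  return all.filter((item) => item.updatedAt > since);
}

// The synclet itself: pure input -> output. Storage, events, and the
// REST API all stay in the synclet manager.
function sync(state) {
  const items = fetchFromProvider(state.authToken, state.lastSync || 0);
  const lastSync = items.reduce(
    (max, item) => Math.max(max, item.updatedAt),
    state.lastSync || 0
  );
  return { data: items, state: { ...state, lastSync } };
}

const result = sync({ authToken: 'xyz', lastSync: 150 });
```

Because the synclet only ever sees what it is handed and only hands back plain data, swapping the storage layer or adding status tracking never touches this code.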

What the synclet manager needs to implement

  1. Scheduling. It manages all of the scheduling for firing up synclets. Each synclet should have a manifest that defines how often it should be fired up, and either which JavaScript file is required and run by the common code, or the command line used to start it.
  2. Producing events and storing data. The synclet itself only produces a JSON-formatted object containing all of the data it pulled down from the source. The manager stores this data in Mongo/LFS/wherever else, generates events, and pushes those to collections.
  3. Providing status insight to the rest of the locker. Because it controls scheduling and knows when each process finishes, it can relay that information to anything else that cares about which synclets are currently running, which are idle, and when the next scheduled job for each will be.
  4. Providing access to the data stored by each synclet. Rather than implementing this query logic in each synclet, it makes more sense to keep it in the central piece. This also makes it trivial to add more ways of querying the data and have them 'just work' for each connector.
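To make point 1 concrete, here is one possible shape for a per-synclet manifest and the scheduling decision the manager would derive from it. The field names (`frequency`, `run`, etc.) are assumptions for illustration; the real schema is still to be defined.

```javascript
// Hypothetical manifest for a connector with two synclets. `frequency`
// is in seconds; `run` is the JavaScript file the common code requires.
const manifest = {
  provider: 'twitter',
  synclets: [
    { name: 'contacts', frequency: 3600, run: 'contacts.js' },
    { name: 'tweets',   frequency: 600,  run: 'tweets.js' },
  ],
};

// Given when a synclet last finished (ms since epoch), compute when the
// manager should fire it again. This is the whole scheduling contract
// from the synclet's point of view.
function nextRunAt(synclet, lastFinishedAt) {
  return lastFinishedAt + synclet.frequency * 1000;
}

const next = nextRunAt(manifest.synclets[1], 1000000);
```

Keeping the schedule in the manifest, rather than in synclet code, is what lets the manager answer "when is the next job for each synclet?" (point 3) without running anything.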

Rough flow

http://www.websequencediagrams.com/cgi-bin/cdraw?lz=TG9ja2VyLT5NYW5hZ2VyOiBTdGFydHVwCgAKBy0-Q29ubmVjdG9yOiBTeW5jCgAHCS0-UHJvdmlkZXI6IEdpbW1lIERhdGEKAA0IAC0NSGVyZSB5b3UgZ28gYnJvADkMAHEJQ2FsbGJhY2sgd2l0aCBhbGwgdGhlIG5ldyBkYXRhAIEHCgCBIAtvcgBuBwCBJQkAgUoGOiBFbWl0IGV2ZW50cwAmFGNoZWR1bGUgYW5vdGhlciBzeW5jCgoAgXwRAIFRBmRhdGEgZm9yIGMAggMIIFgAYBIAgVQP&s=napkin

Open Questions

How do we store state? Continue using a currentState.json type file that's read in by the library, or should this be managed directly in the connector/connector?

How do we handle deleted items? The library, in this model, doesn't even talk to the database, so how does it know whether friends have been deleted or not? Possibly pass the library a local JSON file with a list of IDs for the things we care about? Or should this be left up to the connector itself to sort out?

Should we use stdout, or post items back to the connector-connector? I'm leaning towards posting instead of stdout, but I'm definitely open to discussion on this.
