# Twitter Flock

Simple & robust workflows to export your flock of followers from Twitter.


## How it works

All automations are built around Twitter OAuth, which gives us higher rate limits and access to private user actions like tweeting and sending DMs.

### BatchJob

The core automation functionality is built around the BatchJob class.

The goal of BatchJob is to ensure that potentially large batches of Twitter API calls are serializable and resumable.

A BatchJob stores all of the state it needs to resume its async batched operation after an error. BatchJob instances can serialize their state so it can be stored in a database or in a JSON file on disk.

Here's an example batch job in action:

```js
// fetches the user ids for all of your followers
const job = BatchJobFactory.createBatchJobTwitterGetFollowers({
  params: {
    // assumes that you already have user oauth credentials
    accessToken: twitterAccessToken,
    accessTokenSecret: twitterAccessTokenSecret,

    // only fetch your first 10 followers for testing purposes
    maxLimit: 10,
    count: 10
  }
})

// process as much of this job as possible until it either completes or errors
await job.run()

// job.status: 'active' | 'done' | 'error'
// job.results: string[]
console.log(job.status, job.results)

// store this job to disk
fs.writeFileSync('out.json', job.serialize())

// ...

// read the job from disk and resume processing
const jobData = fs.readFileSync('out.json', 'utf8')
const resumedJob = BatchJobFactory.deserialize(jobData)

if (resumedJob.status === 'active') {
  await resumedJob.run()
}
```

This example also shows how to serialize and resume a job.

### Workflow

Sequences of BatchJob instances can be chained together to form a Workflow.

Here's an example workflow:

```js
const workflow = new Workflow({
  params: {
    // assumes that you already have user oauth credentials
    accessToken: twitterAccessToken,
    accessTokenSecret: twitterAccessTokenSecret,
    pipeline: [
      {
        type: 'twitter:get-followers',
        label: 'followers',
        params: {
          // only fetch your first 10 followers for testing purposes
          maxLimit: 10,
          count: 10
        }
      },
      {
        type: 'twitter:lookup-users',
        label: 'users',
        connect: {
          // connect the output of the first job to the `userIds` param for this job
          userIds: 'followers'
        },
        transforms: ['sort-users-by-fuzzy-popularity']
      },
      {
        type: 'twitter:send-direct-messages',
        connect: {
          // connect the output of the second job to the `users` param for this job
          users: 'users'
        },
        params: {
          // handlebars template with access to the current twitter user object
          template: `Hey @{{user.screen_name}}, I'm testing an open source Twitter automation tool and you happen to be my one and only lucky test user.\n\nSorry for the spam. https://github.com/saasify-sh/twitter-flock`
        }
      }
    ]
  }
})

await workflow.run()
```

This workflow comprises three jobs:

- `twitter:get-followers` - Fetches the user ids of all of your followers.
- `twitter:lookup-users` - Expands these user ids into user objects.
- `twitter:send-direct-messages` - Sends a template-based direct message to each of these users.

Note that Workflow derives from BatchJob, so workflows are also serializable and resumable. Huzzah!
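The direct-message step above renders a handlebars template for each user. As a rough illustration of the substitution involved (a minimal stand-in for illustration only; the real project uses an actual template engine, and `renderTemplate` is a hypothetical name, not part of the twitter-flock API):

```js
// Minimal stand-in for handlebars-style rendering, for illustration only.
// Replaces {{path.to.value}} placeholders with values looked up in context.
function renderTemplate(template, context) {
  return template.replace(/\{\{\s*([\w.]+)\s*\}\}/g, (_, path) =>
    path.split('.').reduce((obj, key) => (obj ? obj[key] : ''), context)
  )
}

const message = renderTemplate('Hey @{{user.screen_name}}!', {
  user: { screen_name: 'jack' }
})
console.log(message) // Hey @jack!
```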

## Future work

A more robust, scalable version of this project would use a solution like Apache Kafka. KafkaJS looks like a useful higher-level Node.js client.

Kafka would add quite a bit of complexity, but it would also handle a lot of details and be significantly more efficient. In particular, Kafka would give us a proper producer / consumer model, more robust error handling, horizontal scalability, durable state storage and commits, and easy interop with different data sources and sinks.

This project was meant as a quick prototype, however, and our relatively simple BatchJob abstraction works pretty well all things considered.

### Producer / Consumer

One of the disadvantages of the current design is that a BatchJob needs to complete before any dependent jobs can run, whereas we'd really like to model this as a Producer-Consumer problem.
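The difference can be sketched with a generator that yields follower pages one at a time, so a dependent consumer can start work before the producer has finished. All names here are illustrative, not part of the twitter-flock API:

```js
// Hypothetical sketch: stream follower ids page-by-page so a consumer
// can begin processing before the full follower list is fetched.
function* produceFollowerPages(pages) {
  // stand-in for paged Twitter API calls
  for (const page of pages) {
    yield page
  }
}

function consume(pages) {
  const processed = []
  for (const page of produceFollowerPages(pages)) {
    // a dependent job (e.g. lookup-users) could handle each page here,
    // instead of waiting for the entire producer job to complete
    for (const userId of page) {
      processed.push(userId)
    }
  }
  return processed
}

const result = consume([['1', '2'], ['3']])
console.log(result) // ['1', '2', '3']
```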

### DAGs

Another shortcoming of the current design is that Workflows can only express linear sequences of jobs, where the output of one job feeds into the input of the next.

A more extensible design would allow workflows structured as directed acyclic graphs (DAGs) of jobs.
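A DAG-based runner could resolve each job's dependencies before running it, e.g. via a depth-first topological traversal. This is a hypothetical sketch with synchronous jobs, not twitter-flock's actual Workflow API:

```js
// Illustrative DAG runner: each job declares named dependencies, and a
// depth-first traversal runs dependencies before dependents, caching results.
function runDag(jobs) {
  // jobs: { name: { deps: [...names], run: (...depResults) => output } }
  const results = {}
  const visiting = new Set()

  function visit(name) {
    if (name in results) return results[name]
    if (visiting.has(name)) throw new Error(`cycle detected at ${name}`)
    visiting.add(name)
    const inputs = jobs[name].deps.map(visit)
    results[name] = jobs[name].run(...inputs)
    visiting.delete(name)
    return results[name]
  }

  Object.keys(jobs).forEach(visit)
  return results
}

// two independent producer jobs feed a single downstream job,
// which a linear pipeline cannot express
const out = runDag({
  followers: { deps: [], run: () => ['1', '2'] },
  friends: { deps: [], run: () => ['2', '3'] },
  merged: {
    deps: ['followers', 'friends'],
    run: (a, b) => [...new Set([...a, ...b])]
  }
})
console.log(out.merged) // ['1', '2', '3']
```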

## MVP TODO

- resumable batch jobs
- resumable workflows (sequences of batch jobs)
- `twitter:get-followers` batch job
- `twitter:lookup-users` batch job
- `twitter:send-direct-messages` batch job
- test workflow which combines these three batch jobs
- test rate limits
  - `twitter:get-followers` 75k / 15 min
  - `twitter:lookup-users` 90k / 15 min
  - `twitter:send-direct-messages` 1k / day
  - `twitter:send-tweets` 300 / 3h -> 2.4k / day
- large account test
- gracefully handle twitter rate limits
- experiment with extracting public emails
- add default persistent storage
- support committing batch job updates
- user-friendly cli
- add cli support for different output formats
- gracefully handle process exit
- initial set of cli commands
- cli oauth support
- unit tests for snapshotting, serializing, deserializing
- unit tests for workflows
- convert transforms to batchjob
- more dynamic rate limit handling
- support bring-your-own-api-key
- basic docs and demo video
- hosted saasify version

## License

MIT © Saasify

Support my OSS work by following me on twitter.