Tweetypie is the core Tweet service that handles the reading and writing of Tweet data. It is called by the Twitter clients (through GraphQL), as well as various internal Twitter services, to fetch, create, delete, and edit Tweets. Tweetypie calls several backends to hydrate Tweet-related data to return to callers.
The next sections describe the layers involved in the read and create paths for Tweets.
In the read path, Tweetypie fetches the Tweet data from Manhattan or Twemcache, and hydrates data about the Tweet from various other backend services.
- backends: A "backend" is a wrapper around a Thrift service that Tweetypie calls. For example, Talon.scala is the backend for Talon, the URL shortener.
- repository: A "repository" wraps a backend and provides a structured interface for retrieving data from the backend. UrlRepository.scala is the repository for the Talon backend.
- hydrator: Tweetypie doesn't store all the data associated with Tweets. For example, it doesn't store User objects, but it stores screen names in the Tweet text (as mentions). It stores media IDs, but it doesn't store the media metadata. Hydrators take the raw Tweet data from Manhattan or cache and return it with some additional information, along with hydration metadata that says whether the hydration took place. This information is usually fetched using a repository. For example, during the hydration process, the UrlEntityHydrator calls Talon using the UrlRepository and fetches the expanded URLs for the t.co links in the Tweet.
- handler: A handler is a function that handles requests to one of the Tweetypie endpoints. The GetTweetsHandler handles requests to `get_tweets`, one of the endpoints used to fetch Tweets.
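
To make the layering above concrete, here is a minimal sketch of a backend, a repository, and a hydrator for URL expansion. Every type and signature in it (`TalonBackend`, `UrlRepo`, `UrlHydrator`, `Hydrated`) is a simplified, hypothetical stand-in rather than Tweetypie's actual API, and the calls are synchronous for brevity.

```scala
object LayersSketch {
  // Hypothetical, simplified types; none of these signatures come from Tweetypie.
  final case class UrlEntity(shortUrl: String, expandedUrl: Option[String])
  final case class Tweet(id: Long, text: String, urls: Seq[UrlEntity])

  // Backend: thin wrapper around a remote service (a stand-in for Talon).
  trait TalonBackend {
    def expand(shortUrl: String): Option[String]
  }

  // Repository: structured lookup built on top of the backend.
  final class UrlRepo(backend: TalonBackend) {
    def apply(shortUrl: String): Option[String] = backend.expand(shortUrl)
  }

  // Hydrator result: the value plus metadata saying whether hydration changed it.
  final case class Hydrated[A](value: A, modified: Boolean)

  // Hydrator: fills in expanded URLs that the stored Tweet does not carry.
  final class UrlHydrator(repo: UrlRepo) {
    def apply(tweet: Tweet): Hydrated[Tweet] = {
      val urls = tweet.urls.map { u =>
        if (u.expandedUrl.isDefined) u else u.copy(expandedUrl = repo(u.shortUrl))
      }
      Hydrated(tweet.copy(urls = urls), modified = urls != tweet.urls)
    }
  }

  // In-memory fake backend so the sketch runs without any network access.
  val fakeTalon: TalonBackend = new TalonBackend {
    private val table = Map("https://t.co/abc" -> "https://example.com/article")
    def expand(shortUrl: String): Option[String] = table.get(shortUrl)
  }

  val hydrator = new UrlHydrator(new UrlRepo(fakeTalon))
  val raw = Tweet(1L, "read this https://t.co/abc", Seq(UrlEntity("https://t.co/abc", None)))
  val result: Hydrated[Tweet] = hydrator(raw)
}
```

In this sketch the hydrator depends only on the repository, so the backend can be swapped for a fake in tests without touching the hydration logic.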
At a high level, the path a `get_tweets` request takes is as follows.
- The request is handled by GetTweetsHandler.
- GetTweetsHandler uses the TweetResultRepository (defined in LogicalRepositories.scala). The TweetResultRepository has at its core a ManhattanTweetRepository (that fetches the Tweet data from Manhattan), wrapped in a CachingTweetRepository (that applies caching using Twemcache). Finally, the caching repository is wrapped in a hydration layer (provided by TweetHydration.hydrateRepo). Essentially, the TweetResultRepository fetches the Tweet data from cache or Manhattan, and passes it through the hydration pipeline.
- The hydration pipeline is described in TweetHydration.scala, where all the hydrators are combined together.
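
The wrapping described in these steps can be sketched as function composition: a storage repository wrapped in a cache-aside layer, wrapped in a hydration layer. The names below (`storageRepo`, `cachingRepo`, `hydratingRepo`, `getTweets`) are hypothetical, the cache is an in-memory map standing in for Twemcache, and the real repositories are asynchronous; this only illustrates the order of the layers.

```scala
import scala.collection.mutable

object ReadPathSketch {
  // Hypothetical, simplified types; real Tweetypie repositories are asynchronous.
  final case class Tweet(id: Long, text: String, hydrated: Boolean = false)
  type TweetRepo = Long => Option[Tweet]

  // Innermost layer: stand-in for the Manhattan-backed repository.
  // Every lookup is recorded in `reads` so cache hits can be observed.
  def storageRepo(rows: Map[Long, Tweet], reads: mutable.Buffer[Long]): TweetRepo =
    id => { reads += id; rows.get(id) }

  // Middle layer: cache-aside wrapper standing in for the Twemcache layer.
  def cachingRepo(cache: mutable.Map[Long, Tweet])(underlying: TweetRepo): TweetRepo =
    id => cache.get(id).orElse {
      val found = underlying(id)
      found.foreach(t => cache.update(id, t))
      found
    }

  // Outer layer: runs every Tweet that comes back through a hydration step.
  def hydratingRepo(hydrate: Tweet => Tweet)(underlying: TweetRepo): TweetRepo =
    id => underlying(id).map(hydrate)

  val storageReads = mutable.ArrayBuffer.empty[Long]
  val cache = mutable.Map.empty[Long, Tweet]

  val tweetResultRepo: TweetRepo =
    hydratingRepo(_.copy(hydrated = true))(
      cachingRepo(cache)(storageRepo(Map(1L -> Tweet(1L, "hello")), storageReads))
    )

  // Handler: maps a batch of requested ids onto repository lookups.
  def getTweets(ids: Seq[Long]): Seq[Option[Tweet]] = ids.map(tweetResultRepo)
}
```

Because hydration sits outside the cache in this sketch, the cache holds the raw Tweet and hydration runs on every read, which matches the ordering described above.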
The write path follows different patterns from the read path, but reuses some of the code.
- store: The store package includes the code for updating backends on write, and the coordination code for describing which backends need to be updated for which endpoints. There are two types of files in this package: stores and store modules. Files that end in Store are stores and define the logic for updating a backend. For example, ManhattanTweetStore writes Tweets to Manhattan. Most of the files that don't end in Store are store modules. They define the logic for handling a write endpoint and describe which stores are called. For example, InsertTweet handles the `post_tweet` endpoint. Modules define which stores they call, and stores define which modules they handle.
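
One way to picture the relationship between stores and store modules is the sketch below. It assumes a hypothetical event type per write endpoint, models each store as a partial function over the events it handles, and models a module as the ordered list of stores it calls. None of these names (`Event`, `Store`, `StoreModule`, `searchIndexStore`) are taken from the store package itself.

```scala
import scala.collection.mutable

object StoreSketch {
  // Hypothetical event types, one per write endpoint.
  sealed trait Event
  final case class InsertTweetEvent(tweetId: Long, text: String) extends Event
  final case class DeleteTweetEvent(tweetId: Long) extends Event

  // Records what each store did, so the effect of a module can be inspected.
  val log = mutable.ArrayBuffer.empty[String]
  private def record(msg: String): Unit = { log += msg; () }

  // A store updates one backend and declares which events it handles.
  trait Store {
    def handle: PartialFunction[Event, Unit]
  }

  // Stand-in for a primary storage store: handles inserts and deletes.
  val storageStore: Store = new Store {
    val handle: PartialFunction[Event, Unit] = {
      case InsertTweetEvent(id, _) => record(s"storage: insert $id")
      case DeleteTweetEvent(id)    => record(s"storage: delete $id")
    }
  }

  // Hypothetical secondary store that only handles inserts.
  val searchIndexStore: Store = new Store {
    val handle: PartialFunction[Event, Unit] = {
      case InsertTweetEvent(id, _) => record(s"search: insert $id")
    }
  }

  // A store module describes one write endpoint: which stores it calls, in order.
  final class StoreModule(stores: Seq[Store]) {
    def apply(event: Event): Unit =
      stores.foreach(store => if (store.handle.isDefinedAt(event)) store.handle(event))
  }

  val insertTweet = new StoreModule(Seq(storageStore, searchIndexStore))
  val deleteTweet = new StoreModule(Seq(storageStore))
}
```

Calling `StoreSketch.insertTweet(StoreSketch.InsertTweetEvent(7L, "hi"))` runs both stores in order, while `StoreSketch.deleteTweet` touches only the storage stand-in.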
The path a `post_tweet` request takes is as follows.
- The request is handled in PostTweet.scala.
- TweetBuilder creates a Tweet from the request, after performing text processing, validation, URL shortening, media processing, duplicate checking, and other steps.
- WritePathHydration.hydrateInsertTweet passes the Tweet through the hydration pipeline to produce the hydrated Tweet that is returned to the caller.
- The Tweet data is written to various stores as described in InsertTweet.scala.
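
The steps above can be sketched as a small pipeline: build and validate, hydrate, write to stores, return. Everything here is a hypothetical simplification. `buildTweet` covers only trimming and a length check, the 280-character limit is an assumed value, the stores are in-memory recorders, and passing the hydrated Tweet to the stores is an assumption of this sketch rather than something the steps above specify.

```scala
import scala.collection.mutable

object WritePathSketch {
  // Hypothetical request and Tweet shapes, far smaller than the real ones.
  final case class PostTweetRequest(userId: Long, text: String)
  final case class Tweet(id: Long, userId: Long, text: String, hydrated: Boolean = false)

  val MaxLength = 280 // assumed limit, for illustration only

  // Build step: create a Tweet from the request after basic validation.
  def buildTweet(req: PostTweetRequest, nextId: Long): Either[String, Tweet] = {
    val text = req.text.trim
    if (text.isEmpty) Left("empty text")
    else if (text.length > MaxLength) Left("text too long")
    else Right(Tweet(nextId, req.userId, text))
  }

  // Hydration step: stand-in for the write-path hydration pipeline.
  def hydrate(tweet: Tweet): Tweet = tweet.copy(hydrated = true)

  // Store step: each stand-in store records the id of the Tweet it was given.
  val written = mutable.ArrayBuffer.empty[(String, Long)]
  val stores: Seq[Tweet => Unit] = Seq[Tweet => Unit](
    t => { written += (("storage", t.id)); () },
    t => { written += (("cache", t.id)); () }
  )

  // Handler: ties the steps together; a validation failure skips all writes.
  def postTweet(req: PostTweetRequest, nextId: Long): Either[String, Tweet] =
    buildTweet(req, nextId).map { built =>
      val hydrated = hydrate(built)
      stores.foreach(store => store(hydrated)) // assumption: stores see the hydrated Tweet
      hydrated
    }
}
```

Returning `Either` keeps validation failures out of the store step, so in this sketch a rejected request never reaches any backend.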