
Flow of Data in the System

gabrieltholsgard edited this page Dec 17, 2013 · 6 revisions

Life Cycle of a Data Point

Terminology

  • Publishing - Sending a value into RabbitMQ.
  • Subscribing - Listening on RabbitMQ for published values.
  • Consumer - Someone/Something that subscribes.
  • Publisher - Someone/Something that publishes.

Introduction

There are two ways a data point can enter the system: via the API and via the polling system. A data point is handled differently depending on how it first arrives, but afterwards both paths are handled the same way.
Here we explain these differences at arrival, then describe the common path, and lastly how we calculate predictions using R.

  1. Polling system
  2. Post via the RESTful API
  3. Common
  4. Predictions with R

Polling system

When polling, we poll a specific stream for a resource and get back a blob of data in some format. This blob then needs to be parsed so that the requested data can be extracted; at the moment we are only able to parse JSON-formatted data. The relevant data is then made into a data point, stored in Elasticsearch, and published to RabbitMQ for clients and virtual streams to subscribe to.
Continue to read more in common.
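The parsing step can be sketched as follows. This is a minimal illustration, not the actual parser: the field names (`temperature`, `timestamp`) and the shape of the resulting data point are assumptions for the example.

```python
import json

def parse_polled_blob(blob, value_field):
    """Extract a data point from a polled JSON blob.

    The blob layout and field names are illustrative; the real parser
    depends on the polled resource. Only JSON is supported, as noted above.
    """
    parsed = json.loads(blob)
    # Pull out the requested value; everything else in the blob is ignored.
    return {"value": parsed[value_field], "timestamp": parsed.get("timestamp")}

# Example blob from a polled resource:
point = parse_polled_blob(
    '{"temperature": 21.5, "timestamp": "2013-12-17T12:00:00Z"}',
    "temperature")
```

The resulting data point is what gets stored in Elasticsearch and published to RabbitMQ.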

Post via the RESTful API

When posting a data point via the RESTful API, we specify which stream we are pushing to, which allows us to store the data point directly in Elasticsearch and publish it directly to RabbitMQ. The data point has to be stored successfully in Elasticsearch before we publish it.
Continue to read more in common.
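The store-before-publish ordering can be sketched as below. The `store` and `publish` functions are in-memory stand-ins for Elasticsearch and RabbitMQ, used only to illustrate the ordering guarantee; the real system's calls look different.

```python
# In-memory stand-ins for the data store and the message broker.
data_store = []      # plays the role of Elasticsearch
exchange_log = []    # plays the role of a RabbitMQ exchange

def store(point):
    data_store.append(point)
    return True  # the real store call can fail; publishing is then skipped

def publish(exchange, point):
    exchange_log.append((exchange, point))

def post_data_point(stream_id, point):
    # The data point must be stored successfully before it is published.
    if store(point):
        publish("streams." + stream_id, point)

post_data_point("42", {"value": 21.5})
```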

Common

Once a data point has been published for a stream, on the exchange streams.streamid (where streamid is the id of the stream), it has also been stored in Elasticsearch. The history of data points for a stream, including the latest value, can now be retrieved, and clients and virtual streams can receive the data point via their subscriptions.
In RabbitMQ, each client and virtual stream subscribing to a stream has its own queue for that stream. These queues are temporary and are deleted when the consumer disconnects.
If a virtual stream subscribes to the stream, it receives the data point, applies its function to it, stores the calculated value in Elasticsearch, and publishes it on its own exchange, vstreams.vstreamid, to which clients or other virtual streams may be subscribed.

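The virtual-stream step above can be sketched as a consumer callback. The doubling function and the callback shape are assumptions for the example; the point is the receive → apply function → store → republish sequence and the vstreams.vstreamid exchange naming.

```python
def virtual_stream(vstream_id, fn, publish, store):
    """Build a consumer callback for a virtual stream.

    `fn` is the virtual stream's function; the callback stores the
    derived value and republishes it on the virtual stream's own
    exchange, vstreams.<vstream_id>.
    """
    def on_data_point(point):
        derived = {"value": fn(point["value"])}
        store(derived)                              # persist the derived value first
        publish("vstreams." + vstream_id, derived)  # then republish it
    return on_data_point

# Wire the callback to in-memory stand-ins and feed it one data point.
stored, published = [], []
consumer = virtual_stream("7", lambda v: v * 2,
                          lambda exchange, p: published.append((exchange, p)),
                          stored.append)
consumer({"value": 10})
```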
Predictions with R

In the current system, predictions are calculated on request; they are not produced in real time and therefore do not need to go through RabbitMQ. When a prediction is requested for a stream, the latest 50 values of that stream are collected and sent to R as a list; R then calculates the prediction and returns a list of 25 predicted values.
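The request path can be sketched as below. The actual model runs in R and is not specified here, so a trivial linear extrapolation stands in for it; only the 50-values-in / 25-values-out interface matches the description above.

```python
def predict(stream_values, history=50, horizon=25):
    """Illustrate the prediction interface: take the latest `history`
    values and return `horizon` predicted values.

    A simple linear extrapolation is used as a placeholder for
    whatever model R applies in the real system.
    """
    recent = stream_values[-history:]
    # Average step between consecutive values over the window.
    step = (recent[-1] - recent[0]) / (len(recent) - 1) if len(recent) > 1 else 0
    return [recent[-1] + step * i for i in range(1, horizon + 1)]

preds = predict(list(range(100)))  # 100 historical values; only the last 50 are used
```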