
Workflow

fosol edited this page Oct 25, 2021 · 4 revisions

Today's News Online (TNO) is a news aggregation system that takes in news sources of varying types and provides a single location for clients to access the data. It provides a variety of key services to senior executives throughout government, including near-live issue alerts, morning and evening summary reports, and in-depth media analysis of key government initiatives and issues.

The TNO solution processes have not yet been fully designed or implemented.

Flowchart

TNO aggregates news content from a variety of sources, then organizes and filters it to generate automated real-time reports and alerts. Subscribers have a single location where they can view all their news content.

All content is ingested into the solution through several different producers (listeners, requestors, readers, scanners, recorders, file shares, and editors). Producers are services that receive or fetch content from 3rd parties.

The content is then picked up and added to the Queue, which is implemented with Apache Kafka. The Queue gives all other dependent and independent systems a single, performant place to interact, which provides abstraction and separation of concerns. This allows the solution to scale horizontally and vertically, and allows services to be separated geographically. The Queue also enables automated management of the lifecycle and lifespan of content. Audio and video content is never uploaded to the Queue; only its metadata is. The audio and video files are instead maintained and served by additional services that are physically closer to the content, which improves performance and reduces network bandwidth.
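The metadata-only rule for audio and video can be illustrated with a minimal Python sketch. The field names (`content_type`, `media_location`, and so on), the content type values, and the JSON encoding are all assumptions made for illustration; the TNO message schema is not defined on this page.

```python
import json
from dataclasses import dataclass, asdict
from typing import Optional

# Assumed content types that carry large binary payloads.
MEDIA_TYPES = {"audio", "video"}


@dataclass
class ContentItem:
    source: str                            # name of the producer service
    content_type: str                      # "text", "audio", or "video" (assumed values)
    headline: str
    body: Optional[str] = None             # text body, if any
    media_location: Optional[str] = None   # where the audio/video file is kept


def build_queue_message(item: ContentItem) -> bytes:
    """Serialize an item for the Queue without any media payload."""
    payload = asdict(item)
    if item.content_type in MEDIA_TYPES:
        if not item.media_location:
            raise ValueError("audio/video items need a media_location")
        # Only metadata travels through the Queue; the file itself stays
        # with the media service and is referenced by its location.
        payload["body"] = None
    return json.dumps(payload).encode("utf-8")
```

A consumer that needs the actual recording would follow `media_location` to the service that holds the file, rather than reading it from the Queue.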

Once content is in the Queue, consumers automatically pick it up and begin transcription, natural language processing, and indexing of the content and its metadata. These processes and services enable searching, viewing, reporting, and analysis of the content and subscriber activities.

Workflow

Inputs

There are various ingestion services. Some are passively listening for pushed content (files uploaded to a share), while others are actively and constantly making requests to 3rd party sources for new content.

Editors are also able to manually add and publish content.

Each service is a Kafka Producer, which ensures all content events are pushed into the Queue.

| Service | Description |
| --- | --- |
| Syndication | Service that pulls syndication content from 3rd party APIs |
| API Listener | Open RESTful API that 3rd parties can push content to |
| API Requester | Service that pulls content from 3rd party APIs |
| Web Reader | Service that crawls websites for content |
| Recorder - Stream | Service that records streamed video |
| Recorder - TV | Service that records TV |
| Recorder - Radio | Service that records radio |
| File Share Listener | Service that listens for file uploads to file shares |
| Editor App | Web application that gives editors the ability to manually add content |

Ingest Workflow

A Docker process will run continually based on configuration. It will start with a local default configuration, then request the latest configuration settings from the TNO DB. To ensure duplicate entries are not pushed to Kafka, it will maintain a reference in the TNO DB to the content it has already pushed.
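A minimal Python sketch of this loop, under stated assumptions, might look like the following. The configuration keys, the `uid` field, and the `fetch_remote` and `publish` callables are hypothetical stand-ins for the TNO DB and Kafka; an in-memory set plays the role of the reference kept in the TNO DB. Falling back to local defaults when the DB is unreachable is also an assumption.

```python
from typing import Callable, Dict, Iterable, Set

# Hypothetical local defaults the process has before contacting the TNO DB.
DEFAULT_CONFIG = {"poll_seconds": 60, "topic": "content"}


def load_config(fetch_remote: Callable[[], Dict]) -> Dict:
    """Start from local defaults, then overlay the latest settings from the DB."""
    config = dict(DEFAULT_CONFIG)
    try:
        config.update(fetch_remote())
    except ConnectionError:
        # Assumed behaviour: keep running on local defaults if the DB is down.
        pass
    return config


def ingest_once(
    items: Iterable[Dict],
    seen: Set[str],
    publish: Callable[[str, Dict], None],
    config: Dict,
) -> int:
    """Publish each item at most once; return how many were published."""
    published = 0
    for item in items:
        uid = item["uid"]
        if uid in seen:
            continue  # already pushed earlier, skip the duplicate
        publish(config["topic"], item)
        seen.add(uid)  # record the reference only after a successful publish
        published += 1
    return published
```

In a real service `seen` would be read from and written to the TNO DB so that the reference survives restarts, and `ingest_once` would be called on the configured polling interval.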

Ingest Workflow

Workflow Activities

Each activity enables content to move through the TNO solution so that it can be maintained, published, transcribed, parsed, searched, analyzed, archived, and, at end of life, purged.

| Activity | Description |
| --- | --- |
| Store on File Share | Video and audio content is downloaded to file shares |
| Upload to Media Service | Video and audio content is uploaded to a cloud media service |
| Queue | Kafka queue services to manage the distribution of content |
| Transcribe | Kafka consumer process to extract text from video and audio content |
| NLP Process | Kafka consumer process to perform natural language processing |
| Index | Elasticsearch storage and indexing of content for the purpose of search |
| Store Content | All metadata and content is stored within TNO for its licensed lifecycle |
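One way to read the activities above is as a per-item route that depends on the content type. The sketch below is only an interpretation: the activity identifiers, the content type values, and the ordering are assumptions, since this page does not specify the exact sequence.

```python
from typing import List

# Assumed content types that need file storage and transcription.
MEDIA_TYPES = {"audio", "video"}


def activities_for(content_type: str) -> List[str]:
    """Return the assumed ordered list of activities for one content item."""
    if content_type in MEDIA_TYPES:
        # Media files are stored and uploaded outside the Queue, then
        # transcribed so that later steps have text to work with.
        steps = ["store_on_file_share", "upload_to_media_service", "queue", "transcribe"]
    else:
        steps = ["queue"]
    # Every item is then parsed, indexed for search, and stored.
    return steps + ["nlp_process", "index", "store_content"]
```

For example, `activities_for("text")` skips the file share, media service, and transcription steps, while `activities_for("video")` includes all seven activities.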

Automation

Various fully automated services generate reports and alerts, and perform archival and purging activities.

| Service | Description |
| --- | --- |
| Reports | Generate reports based on content metadata, schedule, and subscribers |
| Alerts | Generate alerts based on content metadata, schedule, and subscribers |
| Archive/Clean | Ensure content licensing and configured storage limits are adhered to |
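As an illustration of the Archive/Clean idea, the sketch below selects items whose licence period has ended. The `published_on` and `licence_days` fields are hypothetical, and the sketch covers only the licensing rule, not storage limits.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List


def is_expired(published_on: datetime, licence_days: int, now: datetime) -> bool:
    """True once the licensed lifespan of an item has fully elapsed."""
    return now >= published_on + timedelta(days=licence_days)


def select_for_purge(items: List[Dict], now: datetime) -> List[int]:
    """Return the ids of items that should be purged at time `now`."""
    return [
        item["id"]
        for item in items
        if is_expired(item["published_on"], item["licence_days"], now)
    ]
```

Passing `now` in explicitly, instead of calling `datetime.now()` inside the function, keeps the rule easy to test and lets a scheduled job evaluate a whole batch against one consistent timestamp.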

Outputs

The primary output of TNO is an aggregated source of 3rd party content. Subscribers are able to search and view content, along with receiving automated reports and alerts. TNO will also be able to monitor content and analyze subscriber activities in order to make informed decisions.

| Feature | Description |
| --- | --- |
| Search | Filter and find content that is relevant and timely. This requires parsing content and generating relevant and accurate metadata through transcription, natural language processing, and indexing |
| Content | Subscribers can view, listen to, and read 3rd party content |
| Reports | Subscribers receive scheduled, automatically generated content based on filters |
| Alerts | Subscribers receive scheduled, automatically generated content based on filters |
| Monitor | Users can view and analyze content metadata and subscriber activities |
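Filter-based reports and alerts can be pictured as matching each subscriber's filter against content metadata. The sketch below assumes a filter is a simple mapping of metadata keys to required values; the real filter model is not described on this page, and the key names are hypothetical.

```python
from typing import Dict, List


def matches(content_filter: Dict[str, str], metadata: Dict[str, str]) -> bool:
    """True when every key in the filter equals the content's metadata value."""
    return all(metadata.get(key) == value for key, value in content_filter.items())


def subscribers_to_notify(
    subscriptions: Dict[str, Dict[str, str]], metadata: Dict[str, str]
) -> List[str]:
    """Return the names of subscribers whose filter matches, in sorted order."""
    return sorted(
        name for name, content_filter in subscriptions.items()
        if matches(content_filter, metadata)
    )
```

An empty filter matches all content under this rule, which is one possible way to model a subscriber who wants everything.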

Administrative

Users can manage the TNO solution through the features below.

| Feature | Description |
| --- | --- |
| User Management | Administrators can assist users and their accounts. Users can manage their own profile preferences |
| Report Management | Administrators can manage global reports. Users can create, manage, and subscribe to reports |
| Alert Management | Administrators can manage global alerts. Users can create, manage, and subscribe to alerts |
| Subscription Management | Administrators can manage user subscriptions. Users can manage their own subscription |