
Infrastructure

Stuart Morse edited this page Nov 22, 2021 · 109 revisions

Supporting TNO requires several distributed environments and different types of architecture. Within government there is the local office, on-premise infrastructure, and on-premise OpenShift infrastructure. Externally, cloud-based infrastructure is hosted within Azure and GitHub.

The TNO solution environments have not yet been fully designed or implemented.

Diagram

Environments

Existing Office Infrastructure

The following diagram provides some insight into how the on-location office is set up to support the capture of audio, video, and online data sources.

Diagram details not yet complete

Office Infrastructure

All inputs to the NAS arrive over a network connection. Kalel plays a significant role in all channel selections. See Kalel > File > Events for the text selection commands.

CC Radio Plus tuner

  • (channel selected at CC tuner)
  • Input: AC and radio signals. Output: 3.5 mm (F) to Linux PC (Fedora 10), Delta 1010LT PCI card.

Grace Digital tuners

  • (channel selected at Grace tuner)
  • Input: network connection and AC. Output: two RCA (F) to Linux PC (Fedora 10), Delta 1010LT PCI card.

Hansard Vecima feed

  • (channels are selected on the PC and in Kalel)
  • Input: AC and a coax cable feed from the Legislature, passing through numerous amps. Output: coax (F) analogue Victoria Shaw television feed, through multiple splitters, to Linux PC (Fedora 10), WinTV-HVR-1600 card.

Shaw satellite boxes

  • (channels selected at the Shaw tuner)
  • Input: AC and satellite dish (roof, 617 Gov't), through a splitter, to the Shaw tuners. Output: coax (F) to Linux PC (Fedora 10), WinTV-HVR-1600 card.

Shaw cable boxes

  • (channels selected at the Shaw tuner)
  • Input: AC and Shaw coax feed to the Shaw tuners. Output: coax (F) to Linux PC (Fedora 10), WinTV-HVR-1600 card.

Raspberry PI

  • (channel selected at Shaw tuner)
  • Input: network, AC and USB 2.0. Output: network connection to the NAS.
  • Note: the HDMI output from the Shaw tuner passes through an HDMI splitter and then an HDMI-to-USB converter to the USB input on the Raspberry PI. (The HDMI splitter was inserted to overcome proprietary restrictions.)

The support budget for this equipment is included in the AKTIV support contract, which covers application support, 24/7 on-call support and all troubleshooting work. All of the current A/V capture solutions were created using off-the-shelf hardware. GDX is interested in bringing all the support for TNO in-house. As a result, we are looking for a more reliable, purpose-built solution for all A/V capture. The current system requires daily on-site maintenance and would not meet the WFH requirement of the current upgrade.

Please note that many of these solutions are reaching end of life. We had to move to Raspberry PI because the new WinTV cards and drivers had compatibility issues. Similarly, the Delta cards no longer work in the gov't PCs we can acquire.

The Audio and Video Capture Process

Audio and video streams are collected using a variety of devices, but in each case the process follows these steps:

  • The audio/video signal is received by a tuner or digital capture device.
  • The capture device feeds a signal into an A/V card mounted in a Lenovo PC or Raspberry PI (running Linux).
  • The A/V card maps to a device in the host operating system's /dev directory.
  • Once per day Jorel spawns an ffmpeg capture process that streams the digital content from this device and stores it in a single file in local storage.

Capture 24 hours of a single TV/Radio broadcast (the four steps above)

Office Infrastructure

  • At intervals, defined in the Jorel event table, clips are extracted from the parent file and are stored on the NAS located in Victoria.
  • The timespans of these clips overlap so that no content is missed.
  • As the clips are extracted based on the time of day, it is important that, should the capture event start late, the start time of recording is noted in the database. This then becomes the timestamp relative to which the clips are extracted.
  • Relevant clips are imported into QuickTime by editors, where the overlapping sections and any unwanted content are removed.
  • The curated clips are then saved to local storage on the editor's workstation.
  • The editors use Kalel to upload the completed clips to the web-accessible storage in Kamloops and create a corresponding news item.
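The dependence of clip extraction on the recorded capture start time can be sketched in Python. This is a minimal illustration, not Jorel's actual logic: the function name `clip_window` is hypothetical, and clamping a clip that begins before the capture started is an assumption made for the example.

```python
from datetime import datetime

def clip_window(capture_start: datetime, clip_start: datetime, clip_stop: datetime):
    """Return (offset_seconds, duration_seconds) of a clip inside the capture file.

    The offset is measured from the moment recording actually began,
    not from the time the capture was scheduled to begin.
    """
    if clip_stop <= clip_start:
        raise ValueError("clip must end after it starts")
    # Assumption: a clip cannot begin before the recording did, so clamp it.
    effective_start = max(clip_start, capture_start)
    if clip_stop <= effective_start:
        raise ValueError("clip ends before the capture started")
    offset = (effective_start - capture_start).total_seconds()
    duration = (clip_stop - effective_start).total_seconds()
    return int(offset), int(duration)

# Hypothetical case: the capture was due at 00:10 but actually started at 00:25.
capture_start = datetime(2021, 11, 17, 0, 25)
offset, duration = clip_window(
    capture_start,
    datetime(2021, 11, 17, 4, 58),
    datetime(2021, 11, 17, 5, 20),
)
print(offset, duration)
```

If the scheduled time (00:10) were used instead of the recorded start time, the offset would be 15 minutes too large and the clip would contain the wrong content.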

Extract clips from captured file and combine them into a news clip and publish

Office Infrastructure

  • At the end of the day a ShellCommand event runs that terminates the capture process.

Capture server infrastructure

The Events table contains records describing 18 media servers that appear to be active. A media server can capture at most two television channels or six radio channels. Some servers support a mix of TV and radio capture, in which case at most one TV channel can be processed. The table below shows the channel assignments of the various servers:

| Server Process | Sources | Type |
| --- | --- | --- |
| media1 | CIVT, BNN | TV News |
| media2 | CBYK, CFAX, CKNW, CKFU, CBU_2, CJCN | Talk Radio |
| media3 | CKPG, CBUT | TV News |
| media4 | CFJC, CHAN | TV News |
| media5 | CHBC | TV News |
| media6 | CIVI, Global | TV News |
| media7 | CHECK, CTV | TV News |
| media8 | CJRJ, CHNM TV, CKSP, CKYE | TV News/Talk Radio |
| media9 | CHNL, CBYG, CBU, CBCV, CKFR, CKWX | Talk Radio |
| media10 | CHMB, CBTK, CJVB, CACA, NWTEST | Talk Radio |
| media11X | PTC | TV News |
| mediaHD2 | CFTV | TV News |
| mediaHD2X | ZeeTV | TV News |
| mediaHD3X | HDCHAN6 | TV News |
| mediaHDRasp1 | CBC Newsworld | TV News |
| mediaHDRasp2 | CBUT | TV News |
| mediaHDRasp3 | CKPG, APTN | TV News |
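The capacity rules stated before the table can be expressed as a small check. The sketch below is illustrative only: the function name `check_assignment` is hypothetical, and it assumes that a mixed TV/radio server keeps the six-channel radio limit, which the text does not state explicitly.

```python
MAX_TV = 2         # at most two television channels per server
MAX_RADIO = 6      # at most six radio channels per server
MAX_TV_MIXED = 1   # at most one TV channel when radio is also captured

def check_assignment(tv_channels: int, radio_channels: int) -> bool:
    """Return True if a server's channel assignment respects the stated limits."""
    if tv_channels < 0 or radio_channels < 0:
        raise ValueError("channel counts must be non-negative")
    if radio_channels == 0:
        return tv_channels <= MAX_TV
    if tv_channels == 0:
        return radio_channels <= MAX_RADIO
    # Mixed capture. Assumption: the radio limit of six still applies here.
    return tv_channels <= MAX_TV_MIXED and radio_channels <= MAX_RADIO

print(check_assignment(2, 0))  # two TV channels only
print(check_assignment(1, 3))  # one TV channel plus three radio channels
print(check_assignment(2, 1))  # two TV channels plus radio
```

Reading the media8 row as one TV source and three radio sources, it corresponds to the second call; the third call shows a combination the rules rule out.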

Capture and clip commands

The capture commands are run by Jorel either at 12:10 am or 3:00 am every day to start capturing a day’s worth of programming. The values in square brackets, in the commands below, are substitution variables that are replaced with values from other fields in related database tables or files. Each capture command has a set of associated clip commands that are executed throughout the day. Analog video Capture events may also have a related ccCapture event that manages the extraction of closed-captioned text from the video stream. Some of the data from the Capture, Clip and (for analog video only) ccCapture events are shown below. (ffmpeg is a command-line utility that provides fine-grained video processing capabilities.)

CHMB Talk Radio
Capture ffmpeg -f oss -i /dev/[channel] -t [duration] -acodec mp2 -ab 64k -ar 22050 -vol 550 -y [capture] &> /dev/null
Clip ffmpeg -ss [start] -i [capture] -t [duration] -f mov -acodec alac -ab 64k -ar 22050 -vol 550 -y [clip].mov &> /dev/null
Global TV News
Capture cat /dev/[channel] | tee [capture] | ffmpeg -i - -t [duration] -f mpeg -vcodec mpeg2video -b 1000k -s 360x240 -r 25 -acodec mp2 -ab 128k -y [streamlocal].mpg &> /dev/null
Clip ffmpeg -ss [start] -i [streamlocal].mpg -t [duration] -f mov -vcodec mpeg4 -sameq -vsync 1 -acodec alac -ab 128k -async 22050 -vol 75 -y [clip].mov &> /dev/null
ccCapture ccextractor -i [input].cc -startat [start] -endat [stop] -noru -delay [delay] -out=txt -sc -o [tempfile][ext] &> /dev/null; /jorel/fix_cc.sh [tempfile][ext] [output][ext] &> /dev/null
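The substitution of square-bracket variables can be illustrated in Python. This is only a sketch: how Jorel actually performs the replacement is not documented here, and the function name `fill_template`, the file paths and the numeric values are all hypothetical.

```python
import re

# Matches a substitution variable such as [start] or [capture].
PLACEHOLDER = re.compile(r"\[([A-Za-z_]+)\]")

def fill_template(template: str, values: dict) -> str:
    """Replace every [name] in the template with values[name]."""
    def replace(match):
        name = match.group(1)
        if name not in values:
            # Fail loudly rather than run a command with an unfilled variable.
            raise KeyError(f"no value supplied for [{name}]")
        return str(values[name])
    return PLACEHOLDER.sub(replace, template)

# The CHMB Clip template from the listing above.
clip_template = (
    "ffmpeg -ss [start] -i [capture] -t [duration] -f mov -acodec alac "
    "-ab 64k -ar 22050 -vol 550 -y [clip].mov &> /dev/null"
)

# Hypothetical values; real ones come from related database tables or files.
command = fill_template(clip_template, {
    "start": 16380,
    "capture": "/tmp/example/CHMB.mp2",
    "duration": 1320,
    "clip": "/tmp/example/CHMB_0500",
})
print(command)
```

Raising an error on a missing variable is a design choice for the sketch; it avoids handing ffmpeg a literal `[start]` argument.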

Clip commands are also run by Jorel on a fixed schedule and extract a portion of the captured video stream from a start time to an end time. The clip values are templates with substitution variables that are populated from the EVENT_CLIPS table. Clips run many times per day and slice the file being captured into usable chunks.
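Because clips are meant to overlap so that no content is missed, a schedule can be checked for uncovered stretches. The following Python sketch uses hypothetical helper names (`to_minutes`, `find_gaps`) and five start/stop pairs taken from the Global TV News EVENT_CLIPS listing on this page.

```python
def to_minutes(hhmm: str) -> int:
    """Convert an HH:MM string to minutes since midnight."""
    hours, mins = hhmm.split(":")
    return int(hours) * 60 + int(mins)

def find_gaps(clips):
    """Return (gap_start, gap_end) pairs, in minutes, not covered by any clip."""
    gaps = []
    covered_until = None
    for start, stop in sorted(clips):
        if covered_until is not None and start > covered_until:
            gaps.append((covered_until, start))
        covered_until = stop if covered_until is None else max(covered_until, stop)
    return gaps

schedule = [("04:09", "04:21"), ("04:19", "04:31"), ("04:29", "04:41"),
            ("05:00", "05:10"), ("05:20", "05:30")]
windows = [(to_minutes(a), to_minutes(b)) for a, b in schedule]
print(find_gaps(windows))
```

For these five rows the check reports two uncovered stretches, 04:41 to 05:00 and 05:10 to 05:20. Such gaps may well be intentional (no programming of interest at those times); the check only makes them visible.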

If the CC_CAPTURE column of the Capture event record is set to true, related clip events will be provided in pairs with shared start and end times. One event will extract the A/V clip; the other will extract the text. The closed-captioned text is extracted from the captured video stream, using the ccextractor command-line utility, and stored in the clips directory with a file name that matches that of the clip. The closed-captioned text is not included in files captured from a high-definition digital source.

Here are the EVENT_CLIPS records for CHMB and Global TV News. There's a lot of information in the following two tables, but they give a sense of the granularity of the audio/video clipping process.

CHMB Talk Radio

RSN EVENT_RSN NAME START_TIME STOP_TIME FREQUENCY LAST_RUN
144817037 144817017 0500 04:58 05:20 mtwtfss Nov 17 2021
153158345 144817017 0600 05:58 06:10 mtwtfss Nov 17 2021
144817040 144817017 0600a 05:58 06:20 mtwtfss Nov 17 2021
144817050 144817017 0700a 06:58 07:20 mtwtfss Nov 17 2021
153158369 144817017 0800 07:58 08:30 mtwtfss Nov 17 2021
144817051 144817017 0800a 07:58 08:30 mtwtfss Nov 17 2021
153158400 144817017 0900 08:58 09:10 mtwtfss Nov 17 2021
144817054 144817017 0900a 08:58 09:20 mtwtfss Nov 17 2021
153158406 144817017 1000 09:58 10:10 mtwtfss Nov 17 2021
144817059 144817017 1000a 09:58 10:20 mtwtfss Nov 17 2021
153158351 144817017 Sat 10:55 12:10 mtwtfss Nov 16 2021
144818655 144817017 1100a 10:58 11:20 mtwtfss Nov 17 2021
153158411 144817017 1100 10:58 11:10 mtwtfss Nov 17 2021
425995830 144817017 1200z 11:58 13:01 mtwtfss Nov 16 2021
144817033 144817017 1200a 11:58 12:35 mtwtfss Nov 16 2021
153158415 144817017 1200 11:58 12:10 mtwtfss Nov 16 2021
144822501 144817017 1230 12:28 12:40 mtwtfss Nov 16 2021
144817034 144817017 1300a 12:58 13:20 mtwtfss Nov 16 2021
153158420 144817017 1300 12:58 13:10 mtwtfss Nov 16 2021
144817066 144817017 1400a 13:58 14:20 mtwtfss Nov 16 2021
153158424 144817017 1400 13:58 14:10 mtwtfss Nov 16 2021
153158428 144817017 1500 14:58 15:10 mtwtfss Nov 16 2021
144817069 144817017 1500a 14:58 15:20 mtwtfss Nov 16 2021
144817070 144817017 1600 15:58 16:10 mtwtfss Nov 16 2021
153158432 144817017 1600a 15:58 16:15 mtwtfss Nov 16 2021
153846323 144817017 1600b 15:58 16:20 mtwtfss Nov 16 2021
153846335 144817017 1600c 15:58 16:30 mtwtfss Nov 16 2021
559727729 144817017 For Cindya 16:14 18:29 mtwtf-- Nov 16 2021
494486099 144817017 For Cindy 16:15 18:30 mtwtf-- Nov 16 2021
697558250 144817017 Fri 16:25 18:05 mtwtf-- Nov 16 2021
154119920 144817017 22a 21:58:30 22:02 mtwtfss Nov 16 2021

Global TV News

RSN EVENT_RSN NAME START_TIME STOP_TIME FREQUENCY LAST_RUN
132180052 91572573 0410-0420 04:09 04:21 mtwtfss Nov 17 2021
132180053 91572573 0420-0430 04:19 04:31 mtwtfss Nov 17 2021
132180051 91572573 0430-0440 04:29 04:41 mtwtfss Nov 17 2021
132180050 91572573 0500-0510 05:00 05:10 mtwtfss Nov 17 2021
132180049 91572573 0520-0530 05:20 05:30 mtwtfss Nov 17 2021
681577797 91572573 0540-0550 05:28 05:58 mtwtfss Nov 17 2021
150373783 91572573 0540-0550 05:28 05:58 mtwtfss Nov 17 2021
91586386 91572573 0600-0611 05:58 06:11:00 mtwtfss Nov 17 2021
91586387 91572573 0610-0621 06:10:00 06:21:00 mtwtfss Nov 17 2021
91586388 91572573 0620-0631 06:20:00 06:31:00 mtwtfss Nov 17 2021
91586389 91572573 0630-0641 06:30:00 06:41:00 mtwtfss Nov 17 2021
91586390 91572573 0640-0651 06:40:00 06:51:00 mtwtfss Nov 17 2021
91586391 91572573 0750-0800 07:51 08:01 mtwtfss Nov 17 2021
91586392 91572573 0800-0810 07:59 08:11 mtwtfss Nov 17 2021
133941584 91572573 0810 - 0820 08:09 08:21 mtwtfss Nov 17 2021
133941601 91572573 0820-0830 08:19 08:31 mtwtfss Nov 17 2021
133941605 91572573 0830-0840 08:29 08:41 mtwtfss Nov 17 2021
133941613 91572573 0840-0850 08:39 08:51 mtwtfss Nov 17 2021
133941614 91572573 0850-0900 08:49 09:01 mtwtfss Nov 17 2021
132750404 91572573 0900-0910 08:58 09:11 mtwtfss Nov 17 2021
132871544 91572573 0910-0920 09:09 09:21 mtwtfss Nov 17 2021
132871551 91572573 0920-0930 09:19 09:31 mtwtfss Nov 17 2021
133040921 91572573 0930-0940 09:29 09:41 mtwtfss Nov 17 2021
133040933 91572573 0940-0950 09:39 09:51 mtwtfss Nov 17 2021
103301559 91572573 0950-1001 09:50:00 10:01:00 mtwtfss Nov 17 2021
133415293 91572573 0958-1009 09:58 10:09 mtwtf-- Nov 17 2021
103301560 91572573 1000-1011 10:00:00 10:11:00 mtwtfss Nov 17 2021
156289050 91572573 1000 - 1030 10:08 10:35 mtwtfss Nov 17 2021
103301564 91572573 1010-1021 10:10:00 10:21:00 mtwtfss Nov 17 2021
103301565 91572573 1020-1031 10:20:00 10:31:00 mtwtfss Nov 17 2021
103301579 91572573 1030-1041 10:30:00 10:41:00 mtwtfss Nov 17 2021
103301587 91572573 1040-1051 10:40:00 10:51:00 mtwtfss Nov 17 2021
103301590 91572573 1050-1101 10:50:00 11:01:00 mtwtfss Nov 17 2021
91586416 91572573 1100-1111 10:59 11:11:00 mtwtfss Nov 17 2021
91586417 91572573 1110-1121 11:10:00 11:21:00 mtwtfss Nov 17 2021
91586418 91572573 1120-1131 11:20:00 11:31:00 mtwtfss Nov 17 2021
91586419 91572573 1130-1141 11:30:00 11:41:00 mtwtfss Nov 16 2021
91586420 91572573 1140-1151 11:40:00 11:51:00 mtwtfss Nov 16 2021
91586421 91572573 1150-1201 11:50:00 12:01:00 mtwtfss Nov 16 2021
91586422 91572573 1200-1211 12:00:00 12:11:00 mtwtfss Nov 16 2021
91586423 91572573 1210-1221 12:05:00 12:21:00 mtwtfss Nov 16 2021
91586424 91572573 1220-1231 12:20:00 12:31:00 mtwtfss Nov 16 2021
551554197 91572573 1230-1241 12:30 12:41 mtwtfss Nov 16 2021
91586425 91572573 1240-1251 12:40 12:51 mtwtfss Nov 16 2021
103792937 91572573 1250-1301 12:49 13:01:00 mtwtfss Nov 16 2021
132513009 91572573 1300-1310 12:59 13:11:00 mtwtfss Nov 16 2021
132513026 91572573 1310-1320 13:09 13:21:00 mtwtfss Nov 16 2021
132513027 91572573 1320-1330 13:19 13:31:00 mtwtfss Nov 16 2021
132513036 91572573 1330-1340 13:29 13:41:00 mtwtfss Nov 16 2021
132284596 91572573 1340-1350 13:39 13:51 mtwtfss Nov 16 2021
132284614 91572573 1350-1400 13:49 14:01 mtwtfss Nov 16 2021
341969422 91572573 1355-1407 13:58 14:08 mtwtfss Nov 16 2021
132284684 91572573 1400-1410 13:59 14:11 mtwtfss Nov 16 2021
132284691 91572573 1410-1420 14:09 14:21 mtwtfss Nov 16 2021
132284698 91572573 1420-1430 14:19 14:31 mtwtfss Nov 16 2021
132401338 91572573 1430-1440 14:29 14:41 mtwtfss Nov 16 2021
132401340 91572573 1440-1450 14:39 14:51 mtwtfss Nov 16 2021
132401345 91572573 1450-1500 14:49 15:01 mtwtfss Nov 16 2021
614353095 91572573 Dix/Henry 14:55 15:58 mtwtfs- Nov 16 2021
132401346 91572573 1500 - 1510 14:59 15:11 mtwtfss Nov 16 2021
132401353 91572573 1510 - 1520 15:09 15:21 mtwtfss Nov 16 2021
132401354 91572573 1520 - 1530 15:19 15:31 mtwtfss Nov 16 2021
132401359 91572573 1530 - 1540 15:29 15:41 mtwtfss Nov 16 2021
132401388 91572573 1540 - 1550 15:39 15:51 mtwtfss Nov 16 2021
132401390 91572573 1550 - 1600 15:49 16:05 mtwtfss Nov 16 2021
647483487 91572573 1550 to 1558 15:50:00 15:58:00 mtwtfss Nov 16 2021
91586446 91572573 1600-1611 15:59 16:11:00 mtwtfss Nov 16 2021
91586447 91572573 1610-1621 16:09 16:21:00 mtwtfss Nov 16 2021
91586448 91572573 1620-1631 16:19 16:31:00 mtwtfss Nov 16 2021
91586449 91572573 1630-1641 16:29 16:41:00 mtwtfss Nov 16 2021
91586450 91572573 1640-1651 16:39 16:51:00 mtwtfss Nov 16 2021
91586451 91572573 1650-1701 16:49 17:01:00 mtwtfss Nov 16 2021
182965611 91572573 CHAN1700_Overview 16:50:00 17:20:00 mtwtfss Nov 16 2021
79215130 91572573 1700-1711 16:58 17:12:00 mtwtfss Nov 16 2021
91586453 91572573 1710-1721 17:09:00 17:21:00 mtwtfss Nov 16 2021
91586454 91572573 1720-1731 17:20:00 17:31:00 mtwtfss Nov 16 2021
91586455 91572573 1730-1741 17:30:00 17:41:00 mtwtfss Nov 16 2021
91586456 91572573 1740-1751 17:40:00 17:51:00 mtwtfss Nov 16 2021
91586457 91572573 1750-1801 17:50:00 18:01:00 mtwtfss Nov 16 2021
182965619 91572573 CHAN1800_Overview 17:50:00 18:45:00 mtwtfss Nov 16 2021
91586458 91572573 1800-1811 18:00:00 18:11:00 mtwtfss Nov 16 2021
91586459 91572573 1810-1821 18:10:00 18:21:00 mtwtfss Nov 16 2021
91586460 91572573 1820-1831 18:20:00 18:31:00 mtwtfss Nov 16 2021
91586461 91572573 1830-1841 18:30:00 18:41:00 mtwtfss Nov 16 2021
91586462 91572573 1840-1851 18:40:00 18:51:00 mtwtfss Nov 16 2021
91586463 91572573 1850-1901 18:50:00 19:03:00 mtwtfss Nov 16 2021
133129560 91572573 1900-1910 18:59 19:11 mtwtfss Nov 16 2021
133129564 91572573 1910-1920 19:09 19:21 mtwtfss Nov 16 2021
133129571 91572573 1920-1930 19:19 19:31 mtwtfss Nov 16 2021
133129576 91572573 1930-1940 19:29 19:41 mtwtfss Nov 16 2021
133129579 91572573 1930-1940 19:29 19:41 mtwtfss Nov 16 2021
133129596 91572573 1940-1950 19:39 19:51 mtwtfss Nov 16 2021
133129614 91572573 1950-2000 19:49 20:01 mtwtfss Nov 16 2021
349745706 91572573 temp 19:57:00 20:11:00 mtwtfss Nov 16 2021
133363227 91572573 2000-2020 19:58 20:22 mtwtfss Nov 16 2021
134989085 91572573 TEST 20:10 20:25 mtwtfss Nov 16 2021
133363229 91572573 2020-2040 20:18 20:42 mtwtfss Nov 16 2021
133363230 91572573 2040-2100 20:38 21:02 mtwtfss Nov 16 2021
133363237 91572573 2100-2120 20:58 21:22 mtwtfss Nov 16 2021
133363239 91572573 2120-2140 21:18 21:42 mtwtfss Nov 16 2021
133363247 91572573 2140-2200 21:38 22:02 mtwtfss Nov 16 2021
91586486 91572573 2200-2211 21:58:00 22:11:00 mtwtfss Nov 16 2021
91586487 91572573 2210-2221 22:10:00 22:21:00 mtwtfss Nov 16 2021
91586488 91572573 2220-2231 22:20:00 22:31:00 mtwtfss Nov 16 2021
91586489 91572573 2230-2241 22:30:00 22:41:00 mtwtfss Nov 16 2021
91586490 91572573 2240-2251 22:40:00 22:51:00 mtwtfss Nov 16 2021
91586491 91572573 2250-2301 22:50:00 23:01:00 mtwtfss Nov 16 2021
91586492 91572573 2300-2311 23:00:00 23:11:00 mtwtfss Nov 16 2021
91586493 91572573 2310-2321 23:10:00 23:21:00 mtwtfss Nov 16 2021
91586494 91572573 2320-2331 23:20:00 23:31:00 mtwtfss Nov 16 2021
91586495 91572573 2330-2341 23:30:00 23:41:00 mtwtfss Nov 16 2021
91586496 91572573 2340-2351 23:40:00 23:51:00 mtwtfss Nov 16 2021
134160259 91572573 2400-2415 24:01 24:20 mtwtfss ?
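The START_TIME, STOP_TIME and FREQUENCY columns can be interpreted with a few lines of Python. This sketch rests on an assumption inferred from the values shown, not from any documentation: that FREQUENCY holds one character per weekday, Monday first, with `-` marking a day on which the clip does not run. The function names are hypothetical.

```python
from datetime import date

def runs_on(frequency: str, day: date) -> bool:
    """Return True if a clip with this FREQUENCY value runs on the given day.

    Assumption: seven characters, Monday first, '-' meaning "skip this day".
    """
    if len(frequency) != 7:
        raise ValueError("expected seven characters, one per weekday")
    # date.weekday() returns 0 for Monday through 6 for Sunday.
    return frequency[day.weekday()] != "-"

def parse_time(text: str) -> int:
    """Convert HH:MM or HH:MM:SS to seconds since midnight."""
    parts = [int(p) for p in text.split(":")]
    if len(parts) == 2:
        parts.append(0)  # seconds are optional in the tables above
    if len(parts) != 3:
        raise ValueError(f"unrecognised time value: {text!r}")
    hours, mins, secs = parts
    return hours * 3600 + mins * 60 + secs

saturday = date(2021, 11, 20)
print(runs_on("mtwtfss", saturday), runs_on("mtwtf--", saturday))
print(parse_time("21:58:30") - parse_time("04:58"))
```

`parse_time` deliberately does not reject hours of 24 or more, since the last Global TV News row uses 24:01 and 24:20.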

AV Server Room inventory

Video and audio are captured using two different platforms:

  • Lenovo tower systems running Linux (analog).
  • Raspberry PI systems running Linux (digital).

The row of tower systems below left represents the majority of the media boxes listed in the events table. The two Raspberry PI devices are mediaHDRasp3 (right) and mediaHDRasp2 (left). (Hovering your mouse over these images provides a description.)

  

The media boxes are arranged in ascending order, by name, from left to right. The image on the left includes media4, media5 and media6. These machines store the captured video on one of the 4-terabyte storage devices on the right (local NAS). The other NAS is a spare and has been configured for use as a mapped drive for testing from home office locations.

  

Audio and video signals enter the server room by different routes. The digital radio tuners receive their input directly from RJ45/Cat8 Ethernet cables. The Motorola DSR600 satellite tuners get their input from a satellite dish on the roof.

     

There are two types of radio tuner: the Grace tuner (left) and the CC Radio tuner (right). The CC Radio tuner is newer, smaller, more reliable and more capable. These images show the rear of these devices and their respective I/O interfaces.

  

The Lenovo towers ingest analog audio and video using the expansion card models identified under "Existing Office Infrastructure" above. Each tower can capture from at most two analog TV feeds using separate expansion cards. This limit is based on previous reliability and longevity research performed by Aktiv Solutions. The ingestion of analog audio puts less stress on the hardware, so multiple (currently at most six) audio feeds can be captured by a single tower. The images below illustrate the interface types and arrangements for these two cases. Media10 is shown on the left and media4 on the right.

  

One of the towers (media8) captures both video and audio concurrently.

Digital video is captured using the Raspberry PI devices. As these share the form factor of a credit card there is no room internally for expansion hardware. For this reason separate HDMI capture cards are required for each channel (below left). An interstitial device (splitter) is also required between the HDMI source and the capture card (below right).

  

Prior to adopting the Raspberry PI as the digital video capture device, three additional Lenovo towers were purchased and configured for this purpose. These have more CPU and more memory than the other media boxes, but they were never able to accomplish the task reliably: they could not keep up with the video capture process for an entire 24-hour period. These machines are currently being used for purposes other than video ingestion, including Jorel2 testing.

The Raspberry PI devices proved to be much more capable at capturing digital video, likely because they store the video stream on their local solid-state drives. The three HD capture servers (unused) are shown below.

When a combination of hardware that adequately performs a task is identified, TNO will purchase multiple instances of those devices for future use. This mitigates the situation where newer models become incompatible with existing hardware over time and ensures that a new identical device can be swapped in if there is a failure. Two types of surplus hardware are shown below (Raspberry PIs and an HD video splitter).

  

The server room contains two banks of Mac Mini computers, some of which are over ten years old. These perform various tasks, including producing a time-based video editor usage report, receiving newspaper front page images, receiving newspaper content, running Azure scripts and executing a Jorel2 instance that processes the following production events:

  • PL/SQL
  • RSS
  • LDAP
  • Alert
  • HTML
  • Pagewatcher
  • Syndication

The two banks of OS X boxes are shown below:

  

A/V capture issues and pain-points

Hardware compatibility and reliability

Video capture is hard on mechanical storage devices and creates a lot of heat in the motherboards of the capturing machines. Historically this has caused the failure of both hard drives and motherboards. This situation was mitigated by installing a powerful HVAC device in the server room, but the capturing machines are still vulnerable to failure. The risk of failure also increases as the hardware ages.

Within each video capture box there is a tight set of version/model dependencies linking the capture card, the motherboard and their associated drivers. A hard drive failure might be simple to address, but when a failure occurs in a motherboard or capture card it is very difficult to find contemporary models of either that work well together. This is the main reason why Scott is eager to replace these aging systems.

Closed captioning

The media boxes that capture standard-definition analog video have a distinct advantage over the digital capture devices. When ffmpeg saves an analog video as an mpeg2 file, the closed-captioned text, for the stream, is embedded in the file. The text for each clip is extracted from the underlying stream, when the clip is created, using the ccextractor command line utility.

The extracted text is a reliable representation of the words spoken during the clip and is superior in many ways to the speech-to-text functionality offered by Azure. As news items for TNO reports and alerts are selected based on their textual content, it is important that the text stored in the database matches the spoken word content of a TV or Radio show as closely as possible. The closed captioning approach has two main advantages over Azure's speech-to-text solution:

  • Accuracy
  • Speed

Closed-captioned text is available when the clip is created, whereas text extraction using Azure requires the following steps:

  • The clip is uploaded to the Azure server.
  • Speech-to-text processing is performed on the clip in real time, so a five-minute clip will take five minutes to process.
  • The extracted text is then downloaded from Azure to the NAS.
  • After receipt of the text, it must be checked for content and formatting before being added as a news item/transcript. While this step also applies to closed-captioned text, the number and scope of the edits is greater in this case.
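The delay introduced by the Azure steps above can be estimated with simple arithmetic. In this Python sketch the function name and the upload and download times are hypothetical; only the real-time processing factor comes from the text, and the manual review step is left out.

```python
def azure_text_delay(clip_seconds: float, upload_seconds: float,
                     download_seconds: float) -> float:
    """Seconds between clip creation and the transcript arriving on the NAS."""
    if min(clip_seconds, upload_seconds, download_seconds) < 0:
        raise ValueError("durations must be non-negative")
    # Speech-to-text runs in real time, so processing takes as long as the clip.
    processing_seconds = clip_seconds
    return upload_seconds + processing_seconds + download_seconds

# A five-minute clip with hypothetical 30 s upload and 5 s download times.
print(azure_text_delay(300, 30, 5))
```

Under these assumed transfer times the transcript arrives 335 seconds after the clip is created, whereas closed-captioned text is available as soon as the clip exists.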

Consumer grade infrastructure

The audio/video capture infrastructure operates using the same devices as those available in the consumer space. This approach requires a separate tuner/decoder for each channel being captured. Every device requires an input (coaxial or ethernet cable) and provides an output (analog audio or analog/digital video). This strategy results in the proliferation of cables and their associated front-end and back-end devices. For example, a single video channel requires:

A separate DSR600 device is required for each TV channel because that's the only method, provided by Shaw, by which a single channel can be extracted from the satellite signal. An ideal solution would move the channel tuning requirement into a single device that also ran the video capture process. While this would expose a single point of failure, a duplex failover solution is likely achievable.

The proliferation of tuners/decoders increases administrative overhead; most notably, each device must be tuned to the correct channel. Most modern A/V devices can only be fully programmed using an infrared remote control. Some manufacturers segment the frequency ranges of their remote controls so that there is no overlap between devices, while others do not. In the worst case, a single remote control could affect all the devices from the same manufacturer. Unless the IR detectors on every device other than the one being configured are blocked, there is a danger that more than one device will respond to IR commands. This situation could be avoided if a single device could capture multiple channels and maintain the tuning configuration in persistent storage.

Power failure

If the server room power supply is interrupted, much of the TNO functionality will cease to run. The website will continue to operate, as it is located in a data centre in Kamloops, but many of its data ingestion streams will cease to function. When the power supply resumes, every device in the room must reboot and begin running the same task(s) it was undertaking prior to the interruption. The list below outlines some of the issues that might arise during this recovery process:

  • Tuners/Decoders may revert to their default channel settings and require the manual selection of the correct channel.
  • Servers may select a plug-and-play configuration that is different from the one they were using prior to the reboot. This may render their capture cards inaccessible to their respective operating systems.
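The second issue could be detected after a reboot by confirming that the capture devices a server expects are still present. The Python sketch below is illustrative only: the function name and the device paths are hypothetical, and the real device names depend on each server's capture cards and drivers.

```python
import os

def missing_devices(expected_paths):
    """Return the expected device paths that do not currently exist."""
    return [path for path in expected_paths if not os.path.exists(path)]

# Hypothetical device nodes for one capture server.
expected = ["/dev/video0", "/dev/video1"]
missing = missing_devices(expected)
if missing:
    print("Capture devices not found after reboot:", ", ".join(missing))
else:
    print("All expected capture devices are present.")
```

A check like this only confirms that the device nodes exist; it does not verify that a tuner is set to the correct channel, which would still need manual inspection.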