July 2018, Markus Konrad (post at mkonrad dot net)
A synthesized music video of the piece "Fortschritt" by kiriloff, programmed in Python. It uses aubio for onset detection in the audio signal. Input video clips are alienated by drawing Voronoi diagrams derived from a sample of feature points from a binarized frame of the input clip. Video rendering is done with MoviePy, and synthetic frame generation uses Gizeh and cairocffi. See the "Further explanation" section.
See the final result on YouTube.
Please note: In case you want to clone this repository, you need to install the git extension for large files, git-lfs. This is because of the large video files included in the repository.
Python: See requirements.txt. The packages can be installed via pip with pip install -r requirements.txt.
System:
There are three Python script files that can be run from the command line, each with a specific purpose. Required script arguments are denoted as <arg> and optional arguments as [arg]:
onsets.py <audio input file> <onsets output pickle file> [plot output file] [number of seconds to read]: Find onsets in audio and save them as a pickle file.
video_preproc.py [lowres]: Video preprocessing script (resizing, cropping, FPS adjustment). Takes raw videos from video/raw and preprocesses them according to the SCENES definition from conf.py. Optionally pass lowres to produce low resolution clips.
makevideo.py [clip duration]: Main video rendering script. Takes the SCENES definition from conf.py and the preprocessed video files generated by video_preproc.py and renders the full video. Optionally pass clip duration to render only this duration of the video.
The video is rendered according to scene definitions that are given in conf.py. Each scene is defined by a beginning and end time, the input clip, as well as several rendering options.
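To illustrate the idea of time-based scene definitions, here is a minimal sketch of how a list of scenes could be looked up by time. The key names (`t_start`, `t_end`, `clip`, `gradient`), the file names and the helper `scene_at` are hypothetical and chosen for illustration only; the actual schema is the one in conf.py.

```python
# Hypothetical scene definitions. The key names and file names below are
# illustrative assumptions, not the actual schema used in conf.py.
SCENES = [
    {"t_start": 0.0, "t_end": 12.5, "clip": "clip_a.mp4", "gradient": True},
    {"t_start": 12.5, "t_end": 30.0, "clip": "clip_b.mp4", "gradient": False},
]


def scene_at(scenes, t):
    """Return the scene whose [t_start, t_end) interval contains t, or None."""
    for scene in scenes:
        if scene["t_start"] <= t < scene["t_end"]:
            return scene
    return None
```

A renderer could call `scene_at(SCENES, t)` once per frame to decide which input clip and rendering options apply at time `t`.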
The input clips are taken from the video folder. The video_preproc.py script generates the files in this folder according to the scene definitions using the raw video files from video/raw. These files were not added to the git repository (they are too big).
The synthetic frames are generated using the following (very simplified) pipeline:
We have an original input clip frame C at a certain time t:
Additionally, we have the onset amplitude O at t. The onset amplitude is the "strength" of a detected note at this time. For example, you can see in this image the detected onsets as red bars and the amplitude as a green line. Both are combined by finding the maximum amplitude between two onsets to get the onset amplitude.
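The combination of onsets and amplitude could be sketched as follows with NumPy. This is one possible reading of "maximum amplitude between two onsets", not the code from onsets.py: the function names and the convention that the last onset's interval runs to the end of the signal are assumptions.

```python
import numpy as np


def onset_amplitudes(onset_times, amp_times, amp_values):
    """For each onset, take the maximum amplitude up to the next onset."""
    onset_times = np.asarray(onset_times, dtype=float)
    amp_times = np.asarray(amp_times, dtype=float)
    amp_values = np.asarray(amp_values, dtype=float)
    # Interval i runs from onset i to onset i+1; the last one is open-ended
    # (an assumption made for this sketch).
    bounds = np.append(onset_times, np.inf)
    result = np.zeros(len(onset_times))
    for i in range(len(onset_times)):
        mask = (amp_times >= bounds[i]) & (amp_times < bounds[i + 1])
        if mask.any():
            result[i] = amp_values[mask].max()
    return result


def amplitude_at(t, onset_times, onset_amps):
    """Onset amplitude O at time t: value of the most recent onset, 0 before the first."""
    i = np.searchsorted(onset_times, t, side="right") - 1
    return float(onset_amps[i]) if i >= 0 else 0.0
```

With this, O stays constant between two onsets and jumps at each detected note.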
C is first blurred with a kernel size of 5 in order to reduce noise and then binarized using Otsu's method. Both steps are done using OpenCV. We get the following picture:
From the binarized image, we generate features F (the coordinates in frame image space where the binarized image is white), i.e. we get a list [(x1, y1), (x2, y2), ... (xn, yn)] where x and y are coordinates of the white pixels in the above image. For example, in the above image we get almost 500,000 features, i.e. there are almost 500,000 white pixels in the above image.
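Extracting the white pixel coordinates can be sketched with NumPy as follows. The helper name `features_from_binary` is hypothetical. Note that NumPy indexes images as (row, column), so the result is reordered to (x, y).

```python
import numpy as np


def features_from_binary(binary):
    """Return an (n, 2) array of (x, y) coordinates of all non-zero pixels."""
    ys, xs = np.nonzero(binary)  # row indices are y, column indices are x
    return np.column_stack((xs, ys))
```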
Sample from F according to O (i.e. the higher the onset amplitude the more features get sampled and the more Voronoi cells will be drawn) to get S. See the following picture where 6000 random points were sampled from the almost 500,000 features of the binarized image:
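One way to sample according to the onset amplitude is sketched below. It assumes that O is normalized to the range 0 to 1 and that the number of samples grows linearly with O up to a maximum; both the linear mapping and the parameter `max_points` are assumptions made for illustration.

```python
import numpy as np


def sample_features(features, onset_amp, max_points=6000, rng=None):
    """Randomly pick a number of feature points proportional to onset_amp."""
    rng = np.random.default_rng() if rng is None else rng
    onset_amp = float(np.clip(onset_amp, 0.0, 1.0))
    # Linear mapping from amplitude to sample count (an assumption).
    n = min(len(features), int(round(onset_amp * max_points)))
    if n == 0:
        return features[:0]
    idx = rng.choice(len(features), size=n, replace=False)
    return features[idx]
```

Sampling without replacement avoids duplicate points, which would not add any Voronoi cells.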
(Note that the sampled white pixels are hardly visible because they are very small as the resolution of the frame is quite high)
From feature samples S the Voronoi regions are calculated using SciPy's Voronoi class. The raw SciPy plot looks like this:
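A short sketch of this step with `scipy.spatial.Voronoi` follows. The helper `finite_ridges` is a hypothetical name; it collects only those borders whose two end points are both finite. Ridges that extend to infinity are marked by SciPy with a vertex index of -1 and are skipped here for brevity.

```python
import numpy as np
from scipy.spatial import Voronoi


def finite_ridges(points):
    """Compute the Voronoi diagram and return it with its finite border segments."""
    vor = Voronoi(points)
    segments = []
    for v0, v1 in vor.ridge_vertices:
        # An index of -1 denotes a vertex at infinity.
        if v0 >= 0 and v1 >= 0:
            segments.append((vor.vertices[v0], vor.vertices[v1]))
    return vor, segments
```

Each segment is a pair of (x, y) coordinates. These can lie far outside the frame, which is why a restriction to frame image space is needed afterwards.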
The lines that make up the borders between the Voronoi regions are calculated to obtain a Voronoi diagram restricted to frame image space. These lines are drawn according to the scene definitions, either with a solid color or with a color gradient whose start and stop colors are taken from the pixels at both end points of the line in the input clip. For example, this draws the Voronoi cells on a white background and uses the color gradient effect:
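Restricting a border line to the frame and looking up the gradient colors could be sketched as below. The clipping uses the Liang-Barsky algorithm, which is a technique chosen for this sketch and not necessarily what the project does; the actual drawing with Gizeh is left out. Both function names are hypothetical.

```python
import numpy as np


def clip_segment(p0, p1, width, height):
    """Clip the segment p0-p1 to the frame [0, width-1] x [0, height-1].

    Liang-Barsky clipping; returns the clipped end points or None if the
    segment lies completely outside the frame.
    """
    x0, y0 = p0
    x1, y1 = p1
    dx, dy = x1 - x0, y1 - y0
    t0, t1 = 0.0, 1.0
    edges = ((-dx, x0), (dx, (width - 1) - x0), (-dy, y0), (dy, (height - 1) - y0))
    for p, q in edges:
        if p == 0:
            if q < 0:
                return None  # parallel to this edge and outside of it
            continue
        t = q / p
        if p < 0:  # segment enters the frame across this edge
            if t > t1:
                return None
            t0 = max(t0, t)
        else:  # segment leaves the frame across this edge
            if t < t0:
                return None
            t1 = min(t1, t)
    return (x0 + t0 * dx, y0 + t0 * dy), (x0 + t1 * dx, y0 + t1 * dy)


def endpoint_colors(frame, a, b):
    """Return the RGB colors (scaled to 0..1) of the frame pixels under a and b."""
    def pixel(pt):
        x, y = int(round(pt[0])), int(round(pt[1]))
        return tuple(frame[y, x, :3] / 255.0)
    return pixel(a), pixel(b)
```

The two colors returned by `endpoint_colors` would serve as the start and stop colors of a linear gradient along the clipped line.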
Note that the actual rendering pipeline is a little more complex, as it also draws the Voronoi diagrams of previous frames with decreasing opacity over time to get a smooth "changing spider webs" effect.
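The fading of previous diagrams could be organized roughly like this. The class `FadingHistory`, the number of remembered frames and the linear fade are assumptions for illustration; the real pipeline may use a different decay.

```python
from collections import deque


class FadingHistory:
    """Keep the line segments of the last few frames together with an alpha value."""

    def __init__(self, max_age=5):
        self.max_age = max_age
        # With maxlen set, appendleft() drops the oldest entry on the right.
        self._items = deque(maxlen=max_age)

    def push(self, segments):
        """Store the segments of the newest frame."""
        self._items.appendleft(segments)

    def layers(self):
        """Yield (segments, alpha) from oldest to newest, so newer lines end up on top."""
        for age in range(len(self._items) - 1, -1, -1):
            # Linear fade (an assumption): newest frame is opaque, older ones fade.
            alpha = 1.0 - age / self.max_age
            yield self._items[age], alpha
```

For each output frame, the renderer would push the current segments and then draw all layers in the yielded order, using the alpha value as line opacity.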