The source-code for our Endless-CL-Simulator is part of the following publication:
A Procedural World Generation Framework for Systematic Evaluation of Continual Learning, Timm Hess, Martin Mundt, Iuliia Pliushch, Visvanathan Ramesh, https://arxiv.org/abs/2106.02585
Please cite our work if you make use of this repository.
In this repository, further insights into the technical realization of our simulator in UnrealEngine4 are provided, giving access to the source code and an overview of the individual components and their interrelations. As such, prerequisites for 3D assets, preparation of environmental lighting and weather components, the camera setup, and the material setup are detailed in the sections below. The simulator's main loop is unfolded with detailed descriptions of its subroutines, including the essential aspect of capturing images from the engine's rendering pipeline in real time, and the management of the procedural process.
The code in this repository is the source for the stand-alone executable we provided here. The source code itself is not a fully functional project. Please refer to the executable for this purpose. The rationale is that essential assets needed to be removed to adhere to third-party licenses, which do not allow us to re-distribute these assets in their source form. In the executable simulator these assets have been included in an encrypted form that is usable, but cannot be extracted for use beyond our simulator. However, we do include a list of asset and tool sources.
Nevertheless, we have decided to lay open the core mechanisms of our simulator to allow extension of the code base. We believe this to be a useful basis for extensions, given a reasonable amount of work and prior experience with UnrealEngine 4.
Working with these project files requires WindowsOS 10 and Unreal Engine 4.26 with VisualStudio to compile C++ projects.
- Preparation of the Simulator (3D Assets)
- Camera and Rendering-Modes
- Lighting and Weather
- Materials for Render Decomposition
- Interface to the Generative-Model (Manager-Units)
- Main Loop
- Tile Management
- Capturing Images to Disk
The simulator's 3D assets are the building blocks from which its virtual world is constructed. These can be static 3D objects, possibly with additional logic units attached, e.g. volumes or triggers, or dynamic actors, which provide the means for animation or interaction, such as physics-based driving models.
- Modular Building Set
- Urban Material Pack
- MoCap
- Weather Tool
- Car Materials
- TreeIt
- MakeHuman
- TextureRepository
- AssetRepository
In the following, the 3D assets used in the current installment of the simulator are presented, including details for their specific preparation.
The street segments, with their attached pavements (sidewalks) and terrain, which will be addressed by `Tile` when mentioned as a single construct, make up the basis of the simulator's virtual world. In the image above the set of currently available tiles is depicted. They do not only provide physical planes, but are further involved in the sampling processes by defining bounds for the placement and movement of objects and actors. The deconstruction of a straight street tile into its three parts, street, sidewalk, and terrain, is illustrated in the next schematic, showing its additional components and markers. The decomposition into three sub-parts instead of a single object provides modularity to the scene composition, but also to the simulator's logic. It allows interchanging sidewalks or attaching them to different street segments, and eases at-runtime management of object references, which will be detailed later in the tile management section.
The center street segment is provided with `ArrowMarker`s, a `TriggerVolume`, and a `SplineComponent`. The `ArrowMarker`s define spawning positions and orientations: the ones positioned left and right of the street are used to automatically place the sidewalks, while the one on the top side of the segment marks the location for the succeeding tile. The `TriggerVolume` is placed towards the end of the street segment and prompts the next step of the procedural world generation process when a collision with the main actor is detected. It is currently designed for cars, but could easily be adapted to fit other vehicles, such as airborne drones. The `SplineComponent` indicates the curvature and length of the segment. It is used as a guide for the movement of actors, but also defines the spawn locations of vehicle actors. These are currently defined in code, but for visualization purposes, green volumes have been added to the figure.
The sidewalk is provided with a single `ArrowMarker` and a `SplineComponent`; again, the green boxes are merely for visualization purposes. The `ArrowMarker` gives the orientation of the sidewalk, which is important for its placement, as the same object is used for both the left and the right sidewalk. The `SplineComponent` has the exact same purpose as for the street segment, giving information on the sidewalk's curvature, and is used for spawning objects: the green box indicates spawn areas for pedestrians, the red box for trees, and the black box for streetlamps.
The terrain is also accompanied by a number of `ArrowMarker`s, giving spawn positions for buildings (again indicated by green boxes, but actually defined through code), and another `ArrowMarker`, which indicates the orientation of the terrain object itself for its placement in the world.
This setup is used in the same fashion for all other tile types, with the exception of crossings, which additionally define a rudimentary traffic system to avoid collisions between vehicles crossing the intersection towards their lanes. As vehicle actors react to obstacles in front of them, as will be discussed in more detail later, road-blocking boxes are added, which are later rendered invisible. By always removing only one of these boxes at a time, while placing the other two, the two other lanes are blocked and vehicles will wait until the crossing is cleared, i.e. their block is removed. The switching of blocks is currently tied to a simple time delay, triggering a switch every 5 seconds.
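A minimal sketch of such a time-delayed block switch is shown below. The class and member names (`ACrossingTile`, `RoadBlocks`, `BlockSwitchTimer`, `OpenLaneIndex`) are illustrative and not the simulator's exact implementation; the blocks are assumed to be plain actors whose collision is toggled.

```cpp
// Sketch: switch which of the three road-blocking boxes is "removed" every 5 seconds.
// Assumed members: TArray<AActor*> RoadBlocks; FTimerHandle BlockSwitchTimer; int32 OpenLaneIndex = 0;
void ACrossingTile::BeginPlay()
{
    Super::BeginPlay();

    // The blocks stay in the scene but are never rendered.
    for (AActor* Block : RoadBlocks)
    {
        Block->SetActorHiddenInGame(true);
    }

    // Trigger a switch every 5 seconds, looping indefinitely.
    GetWorldTimerManager().SetTimer(
        BlockSwitchTimer, this, &ACrossingTile::SwitchOpenLane, 5.0f, /*bLoop=*/true);
}

void ACrossingTile::SwitchOpenLane()
{
    OpenLaneIndex = (OpenLaneIndex + 1) % RoadBlocks.Num();

    for (int32 i = 0; i < RoadBlocks.Num(); ++i)
    {
        // Only the currently open lane has its block's collision disabled; vehicles
        // approaching one of the enabled blocks treat it as an obstacle and wait.
        RoadBlocks[i]->SetActorEnableCollision(i != OpenLaneIndex);
    }
}
```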
Static Objects: Completely static objects in this simulator currently fall into three categories: buildings, trees, and streetlamps.
The buildings have been designed by hand, using the Modular-Building-Set provided by EpicGames. Trees have been designed using the creative-commons licensed TreeIt tree creator, and the streetlamp is taken from a free online repository. As these objects have no functionality apart from their visuals, they do not need further components attached to them, but are simply imported into the engine, converting them to the Unreal asset type for the rendering engine.
Dynamic Actors:
For dynamic actors to function in UnrealEngine, special preparation is required. To be able to perform animations, a so-called `Skeleton` (or `Rig`) needs to be provided with the actor's 3D object.
The images show an exemplary depiction of the dynamic actors' skeletons, providing the means for animation and driving physics. The structures overlaid in white are skeletons for humans and cars, as currently used in the simulator. For the car actor, its physical bounds for the main body and tires are additionally hinted at by the pink boxes and spheres.
There are currently two types of actors implemented in the simulator: humans (pedestrians) and cars, as common in an urban environment. The human models were created using the MakeHuman software tool, where a range of skin colors and genders has been considered. The software is able to directly add `Skeleton` information to its generated 3D models that is compatible with UnrealEngine. All walking and standing animations stem from motion captures provided by EpicGames. The car models have been collected from multiple free online repositories, although their free versions typically do not include `Skeletons` compatible with UnrealEngine. Thus, skeletons were created by hand using Blender. In the case of cars, the `Skeleton` provides the basis for the physics-based driving model.
In addition to their animations, actors require some kind of behavior to make them move and interact in the scene. While in principle these behaviors can be arbitrarily complex, actors' behavior in this version is rather rudimentary. Essentially, we have implemented bounded random walks.
Human movement is currently constrained to the tile segment they were spawned on. After being placed, they uniformly sample a random point within their own spawn bounds, which they then approach, respecting the curvature of their segment so as to not accidentally cross the street. For the movement, a random animation is selected from the pool of available walking animations. Once the desired position is reached, the process is repeated.
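The following is a minimal sketch of such a bounded random walk, assuming a character-based pedestrian class. The names `APedestrianActor`, `CurrentTarget`, `AcceptanceRadius`, `SpawnBounds`, and `PickRandomWalkAnimation` are hypothetical, and the sketch draws samples with `FMath` rather than through the simulator's `RandomnessManager`.

```cpp
// Sketch of a pedestrian's bounded random walk within its spawn bounds.
void APedestrianActor::Tick(float DeltaSeconds)
{
    Super::Tick(DeltaSeconds);

    const FVector ToTarget = CurrentTarget - GetActorLocation();
    if (ToTarget.Size2D() < AcceptanceRadius)
    {
        // Target reached: uniformly sample the next point inside the spawn bounds
        // assigned by the sidewalk segment, then pick a random walking animation.
        CurrentTarget = FMath::RandPointInBox(SpawnBounds);
        PickRandomWalkAnimation();
    }
    else
    {
        // Walk towards the current target; the character movement component
        // keeps the actor on the ground plane.
        AddMovementInput(ToTarget.GetSafeNormal2D());
    }
}
```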
Car movement is tied to the street's lanes, where at crossings turning choices are drawn at random, with equal probability weight for all choices by default. The cars' driving physics model is interfaced by a steering angle and an engine throttle (gas pedal). For steering, the car probes the curvature of the lane ahead, finding the point on the lane at a certain look-ahead distance, towards which it steers. The throttle is then dampened according to the steering angle:

`SteeringThrottle = abs(SteeringAngle) * MaxThrottleDampening`

The steeper the steering, the more the throttle (gas pedal) is reduced, in order to keep the car from being carried out of the curve. Obstacles are detected at a certain distance in front of the car using a ray-like probe, setting the throttle's value according to the distance to the obstacle, following a predefined curve which also includes negative throttle, i.e. deceleration by applying the brake. Finally, to prevent unrealistic acceleration to maximum engine power output, each car has a maximum speed assigned and drastically reduces its throttle when approaching this speed.
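A simplified sketch of this control loop is given below, assuming a PhysX wheeled vehicle (`AWheeledVehicle`). The member names (`LaneSpline`, `DistanceAlongLane`, `LookAheadDistance`, `BaseThrottle`, `MaxThrottleDampening`, `ObstacleProbeDistance`, `ThrottleOverDistanceCurve`, `MaxSpeed`) are illustrative, not the simulator's exact implementation.

```cpp
// Sketch of the per-tick vehicle control: lane-following steering, steering-dependent
// throttle damping, obstacle probing, and a speed cap.
void ACarActor::Tick(float DeltaSeconds)
{
    Super::Tick(DeltaSeconds);
    UWheeledVehicleMovementComponent* Movement = GetVehicleMovementComponent();

    // Probe the lane spline ahead of the car to obtain a steering target.
    const FVector TargetPoint = LaneSpline->GetLocationAtDistanceAlongSpline(
        DistanceAlongLane + LookAheadDistance, ESplineCoordinateSpace::World);
    const FVector LocalTarget = GetTransform().InverseTransformPosition(TargetPoint);
    const float SteeringAngle = FMath::Atan2(LocalTarget.Y, LocalTarget.X); // radians

    // Dampen the throttle proportionally to the steering angle.
    float Throttle = BaseThrottle - FMath::Abs(SteeringAngle) * MaxThrottleDampening;

    // Ray-like probe for obstacles directly ahead; the throttle (possibly negative,
    // i.e. braking) follows a predefined curve over the obstacle distance.
    FHitResult Hit;
    const FVector Start = GetActorLocation();
    const FVector End = Start + GetActorForwardVector() * ObstacleProbeDistance;
    if (GetWorld()->LineTraceSingleByChannel(Hit, Start, End, ECC_Visibility))
    {
        Throttle = ThrottleOverDistanceCurve->GetFloatValue(Hit.Distance);
    }

    // Cap unrealistic acceleration by cutting the throttle near the maximum speed.
    if (GetVelocity().Size() > 0.95f * MaxSpeed)
    {
        Throttle = FMath::Min(Throttle, 0.1f);
    }

    Movement->SetSteeringInput(FMath::Clamp(SteeringAngle, -1.0f, 1.0f));
    Movement->SetThrottleInput(FMath::Clamp(Throttle, -1.0f, 1.0f));
}
```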
The camera observing the scene is the standard camera model built into UnrealEngine, with a shutter speed of `1.0`, an ISO of `100.0`, and an aperture of `1.0`, where auto-exposure settings are deactivated. The field of view is set to `90°`, and the bloom post-process effect, which is enabled by default, is turned off. This setup provides the color viewport of the simulation; however, currently a detour is needed to actually gain access to the GPU's pixel buffers.
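The settings above map to a `UCameraComponent` and its post-process overrides as sketched below; the `Camera` variable name is assumed.

```cpp
// Sketch of the camera configuration: fixed exposure, 90° FOV, bloom disabled.
Camera->SetFieldOfView(90.0f);

FPostProcessSettings& PP = Camera->PostProcessSettings;

// Manual exposure: shutter speed 1.0, ISO 100.0, aperture (f-stop) 1.0.
PP.bOverride_AutoExposureMethod = true;
PP.AutoExposureMethod = AEM_Manual;
PP.bOverride_CameraShutterSpeed = true;
PP.CameraShutterSpeed = 1.0f;
PP.bOverride_CameraISO = true;
PP.CameraISO = 100.0f;
PP.bOverride_DepthOfFieldFstop = true;
PP.DepthOfFieldFstop = 1.0f;

// Disable the bloom post-process effect, which is enabled by default.
PP.bOverride_BloomIntensity = true;
PP.BloomIntensity = 0.0f;
```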
This detour involves `SceneCaptureComponents`, which are essentially cameras that do not write their pixel buffers to a viewport, but to a `texture` object, which can be accessed from the CPU rather than the GPU. Thus, for each rendering mode, an additional `SceneCaptureComponent` is spawned, which copies the main camera model's settings. The rendered scene color information can then be accessed. For semantic pixel annotation, normals, and depth, a second step using `PostProcessMaterials` is required. Post-process materials were primarily intended to allow for extra effects on the rendered image that involve shader computation. However, this grants them access to the GPU buffers of normals, depth, and `custom-depth`, which we can use for semantic annotation of objects. By simply defining a `PostProcessMaterial` for each rendering mode, transferring the respective buffer's information to the final camera output, essentially overwriting the color-image information with the intended buffer, this information is written to the texture and is accessible by the CPU. The `custom-depth` is a feature intended to mask specific objects by organizing them in custom depth layers, with values ranging from 0 to 255. For the purpose of scene segmentation, a custom depth identifier from a predefined lookup table is assigned to each spawned object.

Distinct identifiers can be assigned to all object instances individually or per object category.
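A minimal sketch of this setup is shown below: one capture component per rendering mode writing into a render target, a post-process material blended in for the non-color modes, and a custom-depth stencil identifier assigned to a spawned object. The names `OwnerActor`, `MainCamera`, `SegmentationPostProcessMaterial`, `MeshComponent`, and `CategoryStencilTable`, as well as the render-target size, are illustrative.

```cpp
// Sketch: per-mode scene capture component and custom-depth identifier assignment.
USceneCaptureComponent2D* Capture = NewObject<USceneCaptureComponent2D>(OwnerActor);
Capture->RegisterComponent();
Capture->AttachToComponent(MainCamera,
    FAttachmentTransformRules::SnapToTargetNotIncludingScale);

// Render into a texture object instead of a viewport.
UTextureRenderTarget2D* Target = NewObject<UTextureRenderTarget2D>(OwnerActor);
Target->InitAutoFormat(1280, 720);
Capture->TextureTarget = Target;
Capture->FOVAngle = 90.0f;
Capture->CaptureSource = ESceneCaptureSource::SCS_FinalColorLDR;

// For normals, depth, or segmentation modes, a post-process material overwrites
// the color output with the respective GPU buffer.
Capture->PostProcessSettings.AddBlendable(SegmentationPostProcessMaterial, 1.0f);

// Assign a custom-depth stencil identifier (0-255) to a spawned object's mesh,
// e.g. looked up from a per-category table.
MeshComponent->SetRenderCustomDepth(true);
MeshComponent->SetCustomDepthStencilValue(CategoryStencilTable[TEXT("Car")]);
```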
The virtual world's lighting is captured by two light sources, a `DirectionalLight` and a `SkyLight`. Both are built-in UnrealEngine components. The `DirectionalLight` simulates a light source that is infinitely far away, emitting light into the scene along a single vector, i.e. providing parallel shadows for all objects. It is used to represent the virtual world's sun. Although one would assume a single sun component to be sufficient, UnrealEngine's physically based lighting model is restricted to the direct interaction of light and object surfaces, and does not comprise higher-order light bounces. The latter would be referred to as global illumination, which is currently a feature of ray-tracing render engines and requires enormous compute depending on the respective scene's complexity. Thus, for Unreal's real-time renderer an approximation is introduced, namely the `SkyLight`, which adds additional diffuse light to the scene. It takes the currently active world into account, such as object colors and ambient occlusion information, aiming for a more realistic approximation of light bounces than a trivial ambient light, which simply adds a certain amount of uniformly colored diffuse light to the scene.

To control the position of the sun, i.e. the rotation of the `DirectionalLight` and its shift in color, mimicking atmospheric scattering effects, UnrealEngine provides a built-in component that calculates the `DirectionalLight`'s direction based on a given latitude, longitude, and daytime in hours, minutes, and seconds. The light's color tint stems from a respective built-in color lookup table. In the current version of the simulator, night scenarios are not supported.
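Conceptually, steering the sun amounts to rotating the `DirectionalLight` according to the sun's elevation and azimuth for the configured location and time. The sketch below illustrates this idea only; the simulator itself relies on the built-in sun-position component, and the color-tint interpolation shown here is a stand-in for the built-in lookup table.

```cpp
// Sketch: rotate the sun and apply a rough warm tint at low elevations.
#include "Engine/DirectionalLight.h"
#include "Components/LightComponent.h"

void SetSunDirection(ADirectionalLight* Sun, float ElevationDeg, float AzimuthDeg)
{
    // Pitch down by the elevation angle so the light shines onto the ground,
    // yaw to the azimuth.
    Sun->SetActorRotation(FRotator(-ElevationDeg, AzimuthDeg, 0.0f));

    // Low sun positions receive a warmer tint, mimicking atmospheric scattering.
    const FLinearColor Tint = FLinearColor::LerpUsingHSV(
        FLinearColor(1.0f, 0.55f, 0.3f), FLinearColor::White,
        FMath::Clamp(ElevationDeg / 30.0f, 0.0f, 1.0f));
    Sun->GetLightComponent()->SetLightColor(Tint);
}
```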
The simulator's weather system is made up of three components: the `SkySphere` and `ExponentialHeightFog`, both part of UnrealEngine, and the `WeatherTool`, which is a commercial third-party component. The `SkySphere` emulates the virtual world's sky and clouds, which function mostly as a backdrop to the scene. For example, the clouds are not volumetric, i.e. they do not cast shadows or block light. However, the `SkyLight`'s color reacts to increased cloud coverage. To recreate the looks of an overcast day, the `DirectionalLight`'s intensity should additionally be decreased, reducing object shadow intensities, in order to obtain a more diffusely lit scene.
The amount of fog in the scene can be controlled by the `ExponentialHeightFog`. A number of parameters are provided, with the most important one for this simulator being the fog start distance.
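Such fog parameters can be adjusted at runtime through the `UExponentialHeightFogComponent`, as sketched below; the concrete values passed in are illustrative and not the ones used in the simulator.

```cpp
// Sketch: adjusting the height fog at runtime.
#include "Components/ExponentialHeightFogComponent.h"

void ConfigureFog(UExponentialHeightFogComponent* Fog, float StartDistance, float Density)
{
    Fog->SetStartDistance(StartDistance); // distance from the camera before fog starts
    Fog->SetFogDensity(Density);          // overall thickness of the fog
}
```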
Finally, the weather phenomena effects of rain and snow, including interaction with the world, e.g. puddles and snow coverage, and artistic lens effects, such as droplets, were designed by a third party and adopted into the simulator. The `WeatherTool` provides controls for all effects, such as the density of the rain- and snowfall, or the strength of interaction with the environment. Again, the specific values for the weather effects were empirically set to match the expected looks of strong rain and snowfall. For both cases, the brightness of the rain and snow particles is set to 0.2, making them decently transparent. The density of the downfall is set to 500, which refers to the maximum number of particles present at the same time. For rain, the strength of rain impact effects, referring to splashes upon hitting the ground, is set to 50 for slight visibility. The material adjustment strength, i.e. puddles, is set to 2.0, which is rather large. For snow, the material adjustment is set to 0.6, in order to have a thin layer of snow on the ground. One has to note that for the adjustment of materials, the respective material setups need to include a function that is provided by the `WeatherTool`, as otherwise puddles and snow layers cannot be rendered.
The goal of the render decomposition is the exclusion of certain aspects of the rendered object surfaces from the generative model, i.e. removing the feature of color or surface reflectance.
Most (real-time) physically based renderers (PBR) represent the surfaces of objects by at least 4 texture maps. Namely, base-color, roughness, metallicity, and normal (or cavity) maps are used to calculate the light-surface interaction, i.e. the resulting light color, reflection, and refraction effects. The base-color texture provides the color patterns of the surface; the roughness and metallicity maps provide the corresponding surface properties, i.e. they influence the reflectance of light in given areas; and lastly the normal map (or cavity map) defines additional, typically fine-grained, surface normals which would require too many vertices, and thus memory and compute, to be included directly in the 3D structure of the respective surface. An illustration of possible decompositions is shown in this image:
In the top row, three examples of distinct render decomposition settings are illustrated. The top-left image shows a frame rendered with all aspects of the PBR material active. In the center image, merely the base-color is still activated. And for the rightmost image all PBR-material aspects are removed, leaving a colorless world without additional surface normals or metallicity, with all-rough surfaces. The second row additionally illustrates the fine difference between active and inactive surface normals.
In UnrealEngine, combinations of these maps are represented in so called Materials.
As one cannot, without changing the fundamental shader definitions of UnrealEngine, alter the Materials' underlying shading function to exclude subsets of these maps from the light-surface interaction computation, the work-around for this simulator is to define default values which can be globally applied to all Materials to unify the respective features for all objects. As a result, for example, all object color textures can be set to a gray value, excluding color features, apart from shadows, from the scene. Image examples for the decomposition can be seen in this image:
Further, to the best of our knowledge, one cannot currently adjust Material definitions after starting the simulation. Thus, the switches controlling the defaulting of the textures need to be included in all Materials upfront. Thereafter, these can be addressed and interacted with at runtime to manipulate the Material appearance. All switches follow the same underlying scheme, as shown in the above image for the example of surface normals. An if-statement is included in the Material's definition, which compares a parameter, in this case a previously defined `UseNormals` flag, against a constant. If `UseNormals = 1`, the `A > B` (i.e. `UseNormals > 0`) comparison output is used; otherwise the other outputs are used. Both of these are linked to a zero normal, i.e. uniformly facing upwards in z-direction, which yields no additional normal information beyond the 3D surface normals.
A simplified example for a full Material definition can be seen in this image:
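At runtime, such switches can be flipped globally if the flags are exposed through a shared parameter source. The sketch below assumes a `MaterialParameterCollection` referenced by all Materials; the collection and the `UseColor` parameter name are illustrative assumptions, not necessarily the simulator's exact mechanism.

```cpp
// Sketch: toggle the render decomposition flags that the Materials' if-switches read.
#include "Kismet/KismetMaterialLibrary.h"
#include "Materials/MaterialParameterCollection.h"

void SetRenderDecomposition(UObject* WorldContext,
                            UMaterialParameterCollection* DecompositionParams,
                            bool bUseNormals, bool bUseColor)
{
    // The if-switches inside every Material compare these scalar parameters
    // against 0, defaulting the respective texture map when the flag is 0.
    UKismetMaterialLibrary::SetScalarParameterValue(
        WorldContext, DecompositionParams, TEXT("UseNormals"), bUseNormals ? 1.0f : 0.0f);
    UKismetMaterialLibrary::SetScalarParameterValue(
        WorldContext, DecompositionParams, TEXT("UseColor"), bUseColor ? 1.0f : 0.0f);
}
```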
The interface to the generative model that underlies the parameterized scene generation process is realized by a number of `Actor` components called `Managers`. In the following, each `Manager` is briefly outlined, highlighting its function in the simulator and its relationships to the other `Manager` units.
All random sampling is centralized in a single unit, the `RandomnessManager`. The benefit of this structure is a single random engine with a single random seed as the foundation for all sampling processes, which enables deterministically re-running a specific configuration. It currently provides uniform and normal distributions; however, it makes use of the `std::default_random_engine`, which in principle can be used for various, more complex, random processes.

This unit is called by all other units whenever a random sample is needed.
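A minimal sketch of such a centralized randomness unit built around `std::default_random_engine` is shown below; the class and method names are illustrative, not the simulator's exact interface.

```cpp
// Sketch: one engine, one seed, shared by all sampling requests in a run.
#include <random>

class RandomnessManager
{
public:
    // A single seed determines every sampling decision in a run,
    // making a configuration deterministically repeatable.
    explicit RandomnessManager(unsigned int Seed) : Engine(Seed) {}

    float SampleUniform(float Min, float Max)
    {
        std::uniform_real_distribution<float> Dist(Min, Max);
        return Dist(Engine);
    }

    float SampleNormal(float Mean, float StdDev)
    {
        std::normal_distribution<float> Dist(Mean, StdDev);
        return Dist(Engine);
    }

private:
    std::default_random_engine Engine; // shared by all sampling processes
};
```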
Loading of sequence configuration files and propagating parameter settings is centralized in the `SequenceManager`.

This unit works closely with the `ComplexityManager`.
Handling the individual sub-sequences' settings, and propagating parameter updates to all other `Managers`, is done by the `ComplexityManager`. Unfortunately, its name is somewhat misleading, reaching back to early versions of the simulator, and should perhaps be changed to `SubSequenceManager` in an updated version.

It is tightly bound to the `TileManager`, tracking the number of tiles that have been sampled for the currently active sub-sequence, and introduces parameter updates to the `TileManager` and `EnvironmentManager` when the desired number of sampled tiles has been reached.
The root of all scene geometry, i.e. 3D objects, lies in the `TileManager`. It holds, adds, and destroys the tiles that are spawned in the scene. Because this process is at the core of the simulation, a more detailed overview is given in the tile management section.

Its updates are triggered either by the `SequenceManager`, upon starting a new sequence to lay out the initial set of tiles, or by the tiles themselves, having their `TriggerVolumes` call back to renew the oldest tile and continue the street track.
All updates to the environment are handled by the `EnvironmentManager`. It has access to all lighting, post-processing, weather, and atmospheric components in the scene.
Capturing images to disk is governed by the `SegmentationManager`. Depending on the simulator's configuration, it spawns `SceneCaptureComponents` for each render mode to be recorded and triggers their image capturing processes. It is also responsible for the relative time of the simulator, globally applying time dilation according to the desired frame rates. For further details on the capturing process itself, please see the capturing images to disk section.

This unit is mostly autonomous and does not rely on other `Managers`.
On the top-level, the simulation process's main loop merely consists of three parts:
- moving the dynamic actors in the current scene
- evolving the procedural virtual world
- capturing frames.
All related processes, or at least their preceding checks, are run once every engine `tick`, i.e. update of the rendering engine's internal state, which is also called a `frame`.
Prior to entering the main loop, the virtual world needs to be initialized. Therefore, the first set of parameters for the generative model is loaded, a number `N_S` of tiles is spawned, providing the initial street track, and the main actor is placed on its lane.
The movement of actors is continuous in time and according to their respective behaviors, as previously described in the dynamic_actors section, being updated every frame. Likewise, the capturing of frames is controlled by time, triggering the capturing process if a time delta to the previously captured frame exceeds the desired time between two frames. Further details on the frame capturing are provided below in a section on capturing images to disk.
The virtual world's procedural process is tied to the movement of the main actor. Upon collision with a street segment's `TriggerVolume`, an internal counter, keeping track of the number of tiles passed by the main actor, is increased. Based on the updated counter, it is first checked whether the video generation is complete and shall be terminated, by comparison to the desired number of tiles to be passed for this video. If this is not the case, it is further checked whether a new set of parameters for the generative model needs to be loaded, providing a new setting for the virtual world. Thereafter, the procedural process is launched: the oldest tile and its respective objects and actors are deleted from the active scene, and subsequently a new tile is placed to continue the street track, on which new objects and actors are sampled. Further details on the handling of tiles and their actors are provided in the section on tile management.
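A simplified sketch of this per-trigger update is shown below; the manager and member names (`ASimulationController`, `TilesPassed`, `DesiredTilesForThisVideo`, `SubSequenceCompleted`, `LoadNextParameterSet`, `ReplaceOldestTile`) are illustrative, not the simulator's exact API.

```cpp
// Sketch: the procedural step executed whenever the main actor hits a TriggerVolume.
void ASimulationController::OnTileTriggerOverlapped()
{
    ++TilesPassed;

    // 1) Terminate the video once the desired number of tiles has been passed.
    if (TilesPassed >= DesiredTilesForThisVideo)
    {
        FinishSequenceAndShutDown();
        return;
    }

    // 2) Possibly load the next parameter set of the generative model,
    //    switching the virtual world to a new sub-sequence setting.
    if (ComplexityManager->SubSequenceCompleted(TilesPassed))
    {
        ComplexityManager->LoadNextParameterSet();
    }

    // 3) Evolve the world: the oldest tile (and its objects/actors) is removed
    //    and a freshly sampled tile continues the street track.
    TileManager->ReplaceOldestTile();
}
```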
The spawning of tiles and populating them with newly sampled sets of objects and actors, as well as the destruction of tiles, removing themselves, but also all their respective objects and actors, from the virtual world, is the core mechanism of the procedural world generation process. In the simulator, all requests regarding tiles are centrally managed by the `TileManager` unit. It holds a ring buffer of size equal to the number of tiles present at the same time, with a pointer indicating the oldest tile. The ring buffer's entries are pointers to the tile objects currently placed in the world. Upon triggering the world generation process, the `TileManager` calls the destructor of the oldest tile, subsequently spawns a new tile object in the world, overwrites the oldest tile's entry with the new tile's address, and advances the oldest-tile pointer to the next buffer entry.
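A minimal sketch of such a ring buffer of tile pointers is shown below; the type and method names are illustrative.

```cpp
// Sketch: ring buffer holding the tiles currently placed in the world.
class ATile;

class FTileRingBuffer
{
public:
    explicit FTileRingBuffer(int32 NumTiles) : OldestIndex(0)
    {
        Tiles.SetNumZeroed(NumTiles); // one slot per tile simultaneously in the world
    }

    // Destroy the oldest tile, store the newly spawned one in its slot,
    // and advance the oldest-tile pointer.
    void ReplaceOldest(ATile* NewTile)
    {
        if (Tiles[OldestIndex] != nullptr)
        {
            DestroyTile(Tiles[OldestIndex]); // recursively removes its objects/actors
        }
        Tiles[OldestIndex] = NewTile;
        OldestIndex = (OldestIndex + 1) % Tiles.Num();
    }

private:
    void DestroyTile(ATile* Tile);

    TArray<ATile*> Tiles;  // ring buffer entries: tiles currently in the world
    int32 OldestIndex;     // points at the oldest tile, i.e. the next to be replaced
};
```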
The initial population of a tile, as well as its destruction, is a hierarchical process, organized in a tree-like structure, such that each tile handles all its objects and actors itself. When initially spawning a new tile, only the center street segment is placed, giving the root of this tile's dependency tree. The center street segment sets both its sidewalk objects, and samples and places the vehicle actors. In the same fashion, both sidewalk objects set their adjacent terrain, and sample the objects and actors to be placed. The same is repeated for the objects sampled for the terrain. This way each tile segment holds the references to its own objects, which eases modularity in the code base. For the destruction of a tile, all destructors recursively call their child destructors.
A special case are vehicle actors, as these are able to switch tiles and should not be removed from the world just because their initial spawning tile has been removed. Therefore, these actors are not listed as child components, but instead detect whether they are still placed on a street tile themselves. They remove themselves if this is not the case, e.g. because their tile was removed or because they drove off the edge of the currently active, visible portion of the world.
Being a real-time rendering engine originally intended for the gaming industry, UnrealEngine offers limited support for using the rendered image data other than displaying it in a given viewport, as this is more of a corner case in the industry. Solutions have been proposed for this issue, such as using automated screenshots or other built-in image saving methods; however, these heavily diminish the engine's computational performance. Instead, `SceneCaptureComponents`, rendering camera outputs to 2D textures, are used to access image data without a viewport. To make the data accessible, it needs to be copied from GPU to CPU.
As the GPU's rendering thread is asynchronous, previous implementations used rendering pipeline intercepts (flushes) as an indicator for the CPU to know when the data has been fully written (to prevent reading uninitialized memory). To circumvent these flushes, they need to be replaced by an alternative indicator. For this, flags provided by UnrealEngine have been utilized, which are injected into the GPU's rendering pipeline and toggle to `true` once they have passed the full pipeline. This has been adapted in the engine's saving method so that the data is accessible by the CPU for further processing, without interceptions. Writing the data to disk is another time-intensive operation that depends on the image size and the hardware; therefore, asynchronous CPU threads are spawned to write the data without blocking the main game loop, which would have effects similar to blocking the GPU thread.
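The sketch below illustrates this scheme under some assumptions: an `FRenderCommandFence` plays the role of the flag that completes once the read-back command has passed the pipeline, and the fence is only polled, never waited on. The class name `AImageCapturer`, the `PendingRequests` array, and `NextImageFilePath()` are hypothetical; the simulator's exact implementation is documented in the UnrealImageCapture repository referenced below.

```cpp
// Sketch: non-blocking GPU read-back and asynchronous disk write of a render target.
#include "Engine/TextureRenderTarget2D.h"
#include "RenderingThread.h"
#include "ImageUtils.h"
#include "Misc/FileHelper.h"
#include "Async/Async.h"

struct FRenderRequest
{
    TArray<FColor> Pixels;
    int32 Width = 0;
    int32 Height = 0;
    FRenderCommandFence Fence; // completes once the read command passed the pipeline
};

void AImageCapturer::EnqueueCapture(UTextureRenderTarget2D* RenderTarget)
{
    TSharedPtr<FRenderRequest> Request = MakeShared<FRenderRequest>();
    Request->Width = RenderTarget->SizeX;
    Request->Height = RenderTarget->SizeY;

    FTextureRenderTargetResource* Resource =
        RenderTarget->GameThread_GetRenderTargetResource();

    // Ask the render thread to copy the pixel buffer into CPU-accessible memory.
    ENQUEUE_RENDER_COMMAND(ReadPixelsCommand)(
        [Resource, Request](FRHICommandListImmediate& RHICmdList)
        {
            RHICmdList.ReadSurfaceData(
                Resource->GetRenderTargetTexture(),
                FIntRect(0, 0, Request->Width, Request->Height),
                Request->Pixels, FReadSurfaceDataFlags());
        });

    // Insert the completion flag behind the read command; it is polled, never waited on.
    Request->Fence.BeginFence();
    PendingRequests.Add(Request);
}

void AImageCapturer::Tick(float DeltaSeconds)
{
    Super::Tick(DeltaSeconds);

    if (PendingRequests.Num() > 0 && PendingRequests[0]->Fence.IsFenceComplete())
    {
        TSharedPtr<FRenderRequest> Request = PendingRequests[0];
        PendingRequests.RemoveAt(0);

        // Compress and write on a worker thread so disk I/O never blocks the game loop.
        Async(EAsyncExecution::Thread, [Request, FileName = NextImageFilePath()]()
        {
            TArray<uint8> Png;
            FImageUtils::CompressImageArray(Request->Width, Request->Height,
                                            Request->Pixels, Png);
            FFileHelper::SaveArrayToFile(Png, *FileName);
        });
    }
}
```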
For a more detailed explanation of this specific feature, please see our UnrealImageCapture repository, where a step-by-step rundown of the procedure is available.
This project is licensed under the MIT License - see the LICENSE.md file for details.