Efficient instancing in a streaming scenario

Aug. 1, 2018

The problem

You’re building a game-world that is big, so big in fact that not all of it can be loaded into memory at once. You also don’t want to introduce portals or level loading, because you want the player to have an uninterrupted experience.

A huge world

For true continuous streaming, a typical scenario would be something like this:

 

  • The world is partitioned into tiles (quad-tree).

  • When the camera moves, tile-data is read from disk and pre-processed in the background.

  • We need to render meshes for each tile.

  • There can be more than 1000 tiles in the AOI, more than 100 different meshes and up to 10000 instances per mesh on one tile.

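Drawn naively, one call per instance, that adds up to 1000 × 100 × 10,000 = 1,000,000,000 draw calls. Is it possible to improve from 1,000,000,000 draw calls to one draw call?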
 

Introduction

To focus on the render-data preparation specifically, I assume the reader is familiar with the following concepts:

  • Instanced mesh rendering

  • Compute shaders

  • AOI (Area Of Interest)

  • Quad-tree tile-based space partitioning

For an introduction I recommend this BLOG entry on our website.

I will use OpenGL to demonstrate the details because we use it ourselves and because it is the platform-independent alternative. The technique, however, can be adapted to any modern graphics API that supports compute shaders.
 

The solution

The solution is to do the work on the GPU. This is the type of processing a GPU is particularly good at.
 

The diagrams below show the memory layout. Each colour represents a different type of instance data, stored non-interleaved: for example position, texture-array layer-index or mesh scale-factor. Within each instance-data-type (colour) range, a sub-range (grey) is used for storing the data for instances of one particular mesh.

In this example there are 4 different meshes that can be instanced. Within the sub-range, there is room to store instance-data for a “budget” number of instances. After loop-stage step 4, we know exactly where to store the instance data of each type (pos, tex-index, scale, etc.) for a particular mesh-type. In this example, the scene contains no mesh-type 2 instances but many mesh-type 3 instances.
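The addressing described above can be sketched on the CPU. This is only an illustration under assumptions that the article does not confirm: each attribute range holds `capacity` instance slots (a hypothetical name), and the per-mesh sub-ranges are packed back to back using an exclusive prefix sum over the per-mesh instance counts. The function names are made up for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Exclusive prefix sum over per-mesh instance counts: the result is the
// first instance slot of each mesh inside every attribute range.
inline std::vector<std::uint32_t> baseInstances(
        const std::vector<std::uint32_t>& counts) {
    std::vector<std::uint32_t> bases(counts.size());
    std::uint32_t running = 0;
    for (std::size_t mesh = 0; mesh < counts.size(); ++mesh) {
        bases[mesh] = running;
        running += counts[mesh];
    }
    return bases;
}

// Element index (not byte offset) of one value in a buffer that stores
// the attribute ranges back to back, i.e. non-interleaved.
// `capacity` is the assumed number of instance slots per attribute range.
inline std::size_t elementIndex(std::size_t attribute, std::size_t capacity,
                                std::uint32_t baseInstance,
                                std::uint32_t instance) {
    assert(baseInstance + instance < capacity);
    return attribute * capacity + baseInstance + instance;
}
```

With counts of {3, 2, 0, 7} for the four meshes, a mesh without instances simply occupies no slots, and the mesh after it starts where the previous one ended. The same index works for every attribute range, which is why the layout does not need to be interleaved.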

Prepare once at start-up

  • Load the mesh data of all the models you want to be able to show into one buffer.

  • Prepare GL state by creating a Vertex Array Object containing all bindings.

    Data-ranges

  • Create a command-buffer containing Indirect-Structures, one structure for each mesh that you want to be able to render.

    Command-buffer

  • Fill the Indirect-Structure members that point to (non-instance) mesh vertex data.
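As a sketch of the last two start-up steps, the block below builds the command-buffer contents on the CPU. It assumes indexed meshes drawn with glMultiDrawElementsIndirect, which the article does not state explicitly; the struct layout follows the OpenGL indirect-draw command for indexed draws, while `MeshRange` and `buildCommands` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Layout OpenGL expects for indexed indirect draws
// (glDrawElementsIndirect / glMultiDrawElementsIndirect).
struct DrawElementsIndirectCommand {
    std::uint32_t count;          // number of indices of the mesh
    std::uint32_t instanceCount;  // per-frame field, written on the GPU
    std::uint32_t firstIndex;     // start of the mesh in the shared index buffer
    std::int32_t  baseVertex;     // start of the mesh in the shared vertex buffer
    std::uint32_t baseInstance;   // per-frame field, written on the GPU
};
static_assert(sizeof(DrawElementsIndirectCommand) == 20,
              "indirect commands must be tightly packed");

// Hypothetical record of where one mesh lives in the shared buffers.
struct MeshRange {
    std::uint32_t indexCount;
    std::uint32_t firstIndex;
    std::int32_t  baseVertex;
};

// One indirect structure per mesh. Only the members that point at the
// (non-instance) mesh data are filled; the per-frame fields start at zero.
inline std::vector<DrawElementsIndirectCommand> buildCommands(
        const std::vector<MeshRange>& meshes) {
    std::vector<DrawElementsIndirectCommand> commands;
    commands.reserve(meshes.size());
    for (const MeshRange& m : meshes) {
        assert(m.baseVertex >= 0);  // meshes are packed from the buffer start
        commands.push_back({m.indexCount, 0u, m.firstIndex, m.baseVertex, 0u});
    }
    return commands;
}
```

The resulting array would be uploaded once to a buffer bound to GL_DRAW_INDIRECT_BUFFER. After that, only the instanceCount and baseInstance members change from frame to frame.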

Steps for one new tile entering the AOI

  1. Read geometry from disk

  2. Rasterize geometry into a material-map

  3. Generate instance-points covering the tile. Select a grid-density and randomise points inside their grid-cell to make it look natural if you’re doing procedural instancing. Whole papers have been written about this topic alone.

  4. Sample from the material-map at the grid-point to cull points and decorate data. Store the result in a buffer per tile.

  5. Keep the result-buffer of a tile for as long as it is in the AOI

Steps 1 to 4 can be replaced by simply loading points from disk if they are pre-calculated offline. In our case we cover the entire planet, so we store land-use data in vector form and convert it into raster data online to keep the install size manageable.
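Steps 3 and 4 can be illustrated with a small CPU sketch: one candidate point per grid cell, jittered inside its cell, and kept only if the material-map allows an instance there. All types and names here (`MaterialMap`, `InstancePoint`, `generatePoints`, material id 0 meaning "nothing grows here") are assumptions made for the example, not the article's actual data structures.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

struct InstancePoint {
    float x, y;             // position inside the tile, in tile units [0, 1]
    std::uint8_t material;  // value sampled from the material-map
};

// Hypothetical material-map: a square raster of material ids.
// Id 0 is assumed to mean "no instance allowed here".
struct MaterialMap {
    int size;                          // texels per side
    std::vector<std::uint8_t> texels;  // size * size entries, row-major
    std::uint8_t sample(float x, float y) const {
        // Clamp so a coordinate of exactly 1.0 stays inside the raster.
        const int tx = std::min(size - 1, static_cast<int>(x * size));
        const int ty = std::min(size - 1, static_cast<int>(y * size));
        return texels[static_cast<std::size_t>(ty) * size + tx];
    }
};

// One jittered candidate per grid cell; culled points are dropped and
// surviving points are decorated with the sampled material.
inline std::vector<InstancePoint> generatePoints(const MaterialMap& map,
                                                 int cellsPerSide,
                                                 std::uint32_t seed) {
    assert(cellsPerSide > 0 && map.size > 0);
    std::mt19937 rng(seed);  // fixed seed per tile: same tile, same points
    std::uniform_real_distribution<float> jitter(0.0f, 1.0f);
    const float cell = 1.0f / static_cast<float>(cellsPerSide);
    std::vector<InstancePoint> points;
    for (int cy = 0; cy < cellsPerSide; ++cy) {
        for (int cx = 0; cx < cellsPerSide; ++cx) {
            const float x = (cx + jitter(rng)) * cell;
            const float y = (cy + jitter(rng)) * cell;
            const std::uint8_t material = map.sample(x, y);
            if (material != 0) points.push_back({x, y, material});
        }
    }
    return points;
}
```

Seeding the generator from something stable, such as the tile's coordinates, means a tile that leaves the AOI and comes back later produces the same instances again.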

Steps for each loop

This is where things get interesting.

  1. Do frustum and other culling of the tiles so you know which tiles are visible and contain meshes that need rendering.

  2. Clear the instance-count and base-instance fields of the indirect-structures in the command-buffer. Run a simple compute shader for this. If you map the buffer or use glBufferData to write to it from the CPU, you introduce an expensive upload and synchronisation, which we want to avoid.

    Instance-ranges
     

  3. Run a compute shader over the tile-set in view to determine which meshes to render. For now, just count the instances per mesh in the instance-count member of the Indirect-Structure.
    This may require sampling from the material-map again or doing other calculations to pick a mesh LOD or to reflect game-state. It may very well require procedural math to “randomly” spawn meshes. This all depends on the requirements of your particular game.
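The logic of steps 2 and 3 can be emulated on the CPU as follows. In the real pipeline both passes run as compute shaders, and the increment would be an atomic add because many invocations count in parallel. The sketch assumes each instance point already carries a selected mesh index; `FrameFields`, `clearFrameFields` and `countInstances` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// The two per-frame members of one indirect structure.
struct FrameFields {
    std::uint32_t instanceCount;
    std::uint32_t baseInstance;
};

// Step 2: reset the per-frame fields. On the GPU this would be one
// compute invocation per mesh, so no CPU upload is needed.
inline void clearFrameFields(std::vector<FrameFields>& fields) {
    for (FrameFields& f : fields) {
        f.instanceCount = 0;
        f.baseInstance = 0;
    }
}

// Step 3: every instance point of every visible tile bumps the counter
// of the mesh it selected. Each inner vector holds the (assumed, already
// chosen) mesh index per point of one visible tile. On the GPU the
// increment would be an atomicAdd on the instance-count member.
inline void countInstances(
        const std::vector<std::vector<std::uint32_t>>& visibleTiles,
        std::vector<FrameFields>& fields) {
    for (const auto& meshIds : visibleTiles) {
        for (std::uint32_t mesh : meshIds) {
            assert(mesh < fields.size());
            ++fields[mesh].instanceCount;
        }
    }
}
```

After this pass each mesh knows how many instances it will draw this frame, but not yet where their data goes; the base-instance fields are still zero at this point.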
