Tag Archives: OpenGL ES

Game Performance: Geometry Instancing

Posted by Shanee Nishry, Games Developer Advocate

Imagine a beautiful virtual forest with countless trees, plants and vegetation, or a stadium with countless people in the crowd cheering. If you are heroic you might like the idea of an epic battle between armies.

Rendering a lot of meshes is desired to create a beautiful scene like a forest, a cheering crowd or an army, but doing so is quite costly and reduces the frame rate. Fortunately this is possible using a simple technique called Geometry Instancing.

Geometry instancing can be used in 2D games for rendering a large number of sprites, or in 3D for things like particles, characters and environment.

The NDK code sample More Teapots demoing the content of this article can be found with the ndk inside the samples folder and in the git repository.

Support and Extensions

Geometry instancing is available from OpenGL ES 3.0 and to OpenGL 2.0 devices which support the GL_NV_draw_instanced or GL_EXT_draw_instanced extensions. More information on how to using the extensions is shown in the More Teapots demo.

Overview

Submitting draw calls causes OpenGL to queue commands to be sent to the GPU, this has an expensive overhead which may affect performance. This overhead grows when changing states such as alpha blending function, active shader, textures and buffers.

Geometry Instancing is a technique that combines multiple draws of the same mesh into a single draw call, resulting in reduced overhead and potentially increased performance. This works even when different transformations are required.

The algorithm

To explain how Geometry Instancing works let’s quickly overview traditional drawing.

Traditional Drawing

To a draw a mesh you’d usually prepare a vertex buffer and an index buffer, bind your shader and buffers, set your uniforms such as a World View Projection matrix and make a draw call.

To draw multiple instances using the same mesh you set new uniform values for the transformations and other data and call draw again. This is repeated for every instance.

Drawing with Geometry Instancing

Geometry Instancing reduces CPU overhead by reducing the sequence described above into a single buffer and draw call.

It works by using an additional buffer which contains custom per-instance data needed by your shader, such as transformations, color, light data.

The first change to your workflow is to create the additional buffer on initialization stage.

To put it into code let’s define an example per-instance data that includes a world view projection matrix and a color:

C++

struct PerInstanceData
{
 Mat4x4 WorldViewProj;
 Vector4 Color;
};

You also need to the structure to your shader. The easiest way is by creating a Uniform Block with an array:

GLSL

#define MAX_INSTANCES 512

layout(std140) uniform PerInstanceData {
    struct
    {
        mat4      uMVP;
        vec4      uColor;
    } Data[ MAX_INSTANCES ];
};

Note that uniform blocks have limited sizes. You can find the maximum number of bytes you can use by querying for GL_MAX_UNIFORM_BLOCK_SIZE using glGetIntegerv.

Example:

GLint max_block_size = 0;
glGetIntegerv( GL_MAX_UNIFORM_BLOCK_SIZE, &max_block_size );

Bind the uniform block on the CPU in your program’s initialization stage:

C++

#define MAX_INSTANCES 512
#define BINDING_POINT 1
GLuint shaderProgram; // Compiled shader program

// Bind Uniform Block
GLuint blockIndex = glGetUniformBlockIndex( shaderProgram, "PerInstanceData" );
glUniformBlockBinding( shaderProgram, blockIndex, BINDING_POINT );

And create a corresponding uniform buffer object:

C++

// Create Instance Buffer
GLuint instanceBuffer;

glGenBuffers( 1, &instanceBuffer );
glBindBuffer( GL_UNIFORM_BUFFER, instanceBuffer );
glBindBufferBase( GL_UNIFORM_BUFFER, BINDING_POINT, instanceBuffer );

// Initialize buffer size
glBufferData( GL_UNIFORM_BUFFER, MAX_INSTANCES * sizeof( PerInstanceData ), NULL, GL_DYNAMIC_DRAW );

The next step is to update the instance data every frame to reflect changes to the visible objects you are going to draw. Once you have your new instance buffer you can draw everything with a single call to glDrawElementsInstanced.

You update the instance buffer using glMapBufferRange. This function locks the buffer and retrieves a pointer to the byte data allowing you to copy your per-instance data. Unlock your buffer using glUnmapBuffer when you are done.

Here is a simple example for updating the instance data:

const int NUM_SCENE_OBJECTS = …; // number of objects visible in your scene which share the same mesh

// Bind the buffer
glBindBuffer( GL_UNIFORM_BUFFER, instanceBuffer );

// Retrieve pointer to map the data
PerInstanceData* pBuffer = (PerInstanceData*) glMapBufferRange( GL_UNIFORM_BUFFER, 0,
                NUM_SCENE_OBJECTS * sizeof( PerInstanceData ),
                GL_MAP_WRITE_BIT | GL_MAP_INVALIDATE_RANGE_BIT );

// Iterate the scene objects
for ( int i = 0; i < NUM_SCENE_OBJECTS; ++i )
{
    pBuffer[ i ].WorldViewProj = ... // Copy World View Projection matrix
    pBuffer[ i ].Color = …               // Copy color
}

glUnmapBuffer( GL_UNIFORM_BUFFER ); // Unmap the buffer

And finally you can draw everything with a single call to glDrawElementsInstanced or glDrawArraysInstanced (depending if you are using an index buffer):

glDrawElementsInstanced( GL_TRIANGLES, NUM_INDICES, GL_UNSIGNED_SHORT, 0,
                NUM_SCENE_OBJECTS );

You are almost done! There is just one more step to do. In your shader you need to make use of the new uniform buffer object for your transformations and colors. In your shader main program:

void main()
{
    …
    gl_Position = PerInstanceData.Data[ gl_InstanceID ].uMVP * inPosition;
    outColor = PerInstanceData.Data[ gl_InstanceID ].uColor;
}

You might have noticed the use gl_InstanceID. This is a predefined OpenGL vertex shader variable that tells your program which instance it is currently drawing. Using this variable your shader can properly iterate the instance data and match the correct transformation and color for every vertex.

That’s it! You are now ready to use Geometry Instancing. If you are drawing the same mesh multiple times in a frame make sure to implement Geometry Instancing in your pipeline! This can greatly reduce overhead and improve performance.

Game Performance: Layout Qualifiers

Today, we want to share some best practices on using the OpenGL Shading Language (GLSL) that can optimize the performance of your game and simplify your workflow. Specifically, Layout qualifiers make your code more deterministic and increase performance by reducing your work.

Let’s start with a simple vertex shader and change it as we go along.

This basic vertex shader takes position and texture coordinates, transforms the position and outputs the data to the fragment shader:
attribute vec4 vertexPosition;
attribute vec2 vertexUV;

uniform mat4 matWorldViewProjection;

varying vec2 outTexCoord;

void main()
{
  outTexCoord = vertexUV;
  gl_Position = matWorldViewProjection * vertexPosition;
}

Vertex Attribute Index

To draw a mesh on to the screen, you need to create a vertex buffer and fill it with vertex data, including positions and texture coordinates for this example.

In our sample shader, the vertex data may be laid out like this:
struct Vertex
{
  Vector4 Position;
  Vector2 TexCoords;
};
Therefore, we defined our vertex shader attributes like this:
attribute vec4 vertexPosition;
attribute vec2  vertexUV;
To associate the vertex data with the shader attributes, a call to glGetAttribLocation will get the handle of the named attribute. The attribute format is then detailed with a call to glVertexAttribPointer.
GLint handleVertexPos = glGetAttribLocation( myShaderProgram, "vertexPosition" );
glVertexAttribPointer( handleVertexPos, 4, GL_FLOAT, GL_FALSE, 0, 0 );

GLint handleVertexUV = glGetAttribLocation( myShaderProgram, "vertexUV" );
glVertexAttribPointer( handleVertexUV, 2, GL_FLOAT, GL_FALSE, 0, 0 );
But you may have multiple shaders with the vertexPosition attribute and calling glGetAttribLocation for every shader is a waste of performance which increases the loading time of your game.

Using layout qualifiers you can change your vertex shader attributes declaration like this:
layout(location = 0) in vec4 vertexPosition;
layout(location = 1) in vec2 vertexUV;
To do so you also need to tell the shader compiler that your shader is aimed at GL ES version 3.1. This is done by adding a version declaration:
#version 300 es
Let’s see how this affects our shader, changes are marked in bold:
#version 300 es

layout(location = 0) in vec4 vertexPosition;
layout(location = 1) in vec2 vertexUV;

uniform mat4 matWorldViewProjection;

out vec2 outTexCoord;

void main()
{
  outTexCoord = vertexUV;
  gl_Position = matWorldViewProjection * vertexPosition;
}
Note that we also changed outTexCoord from varying to out. The varying keyword is deprecated from version 300 es and requires changing for the shader to work.

Note that Vertex Attribute qualifiers and #version 300 es are supported from OpenGL ES 3.0. The desktop equivalent is supported on OpenGL 3.3 and using #version 330.

Now you know your position attributes always at 0 and your texture coordinates will be at 1 and you can now bind your shader format without using glGetAttribLocation:
const int ATTRIB_POS = 0;
const int ATTRIB_UV   = 1;

glVertexAttribPointer( ATTRIB_POS, 4, GL_FLOAT, GL_FALSE, 0, 0 );
glVertexAttribPointer( ATTRIB_UV, 2, GL_FLOAT, GL_FALSE, 0, 0 );
This simple change leads to a cleaner pipeline, simpler code and saved performance during loading time.

To learn more about performance on Android, check out the Android Performance Patterns series.

Posted by Shanee Nishry, Games Developer Advocate