How to Analyze the Performance Cost of Your Unity Shaders

June 23, 2020

In this post, you will learn how to analyze the complexity of your Unity shaders with numbers so you can finally:

  • Stop your Unity game from being fragment-bound

  • Compare the GPU complexity between two shaders

  • Analyze shader costs in terms of texture, arithmetic and load/store GPU cycles

  • ... And reach 60, 90 or higher FPS

For this, we will use a little-known tool called the Mali Offline Compiler.

With this free software, you'll finally be able to see how you're spending your GPU cycles in your Unity shaders.

So let's get started with this exciting topic.

Video: Unity Shader Performance: How to Quickly Measure the GPU Cycles Your Shaders Take (https://www.youtube.com/embed/uXO9mPHyj_Q)

Is It Important to Count GPU Cycles Nowadays?

Now more than ever, it is crucial to understand the impact of your shaders on the performance of your game.

With ever-increasing resolutions (I'm looking at you, VR), more and more games are bottlenecked by the fragment shading stage.

“The more pixels you render, the more attention you have to pay to the cost of your fragment shaders.”

Rubén (The Gamedev Guru)

To get an idea of how expensive your shaders are, there are two approaches:

  • Making guesstimates, e.g. "this shader looks expensive".

  • Measuring: either through static analysis or in-game profiling.

In this blog post, we will measure the cost of your shaders through static analysis. Guesstimates will work better once you gain more experience measuring 😉

In the next sections, you and I will compile your shaders in a matter of minutes.

With that, we will get valuable performance information about them that will guide your future decisions.

Setting Up Your Mali Offline Compiler

You can download Mali Offline Compiler as part of Arm Mobile Studio.

On that page, you'll want to download the latest release for your target platform.

Download Arm Mobile Studio


Once you've gone through the setup, the Mali Offline Compiler should be on your PATH, i.e. you'll be able to invoke it through the command line.

If that is not the case, you can add it yourself. You can find the malioc executable in the installation directory.
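A quick way to verify the setup is a small Python sketch like the one below. It is not part of Arm Mobile Studio: find_malioc is a hypothetical helper name, and the only assumption is that the executable is called malioc, as mentioned above.

```python
import shutil


def find_malioc(executable="malioc"):
    """Return the full path of the executable if it is on PATH, else None."""
    return shutil.which(executable)


if __name__ == "__main__":
    path = find_malioc()
    if path is None:
        print("malioc not found; add its installation directory to PATH")
    else:
        print("malioc found at", path)
```

If it reports that malioc is missing, add the installation directory to PATH and open a new terminal before trying again.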

Compiling Your Unity Shaders

Before we can start using the Mali Offline Compiler, we need to instruct Unity to compile the shader you want to analyze.

You see, the Mali Offline Compiler knows nothing about Unity's shader format.

It just wants the shader in GLSL format.

Luckily, this is pretty easy in Unity.

Navigate to a material of your choice and click on the gear icon on its right. Then, click on Select Shader.

Unity: Finding Your Shader


Doing so will show you the inspector of your shader, which includes its name, some metadata and the option to compile it.

Unity: Compiling Your Shader


(You might need to select GLES3x, as this is the graphics API the Mali Offline Compiler works well with.)

Guess which button you will press?

Getting Your Unity Shader Performance Metrics

Once you press Compile and show code, your code editor will open the possibly long list of shaders that Unity compiled for you.

This temporary file contains all the vertex and fragment shader variants Unity produced for you.

Each vertex shader starts with #ifdef VERTEX and ends at its matching #endif.

Fragment shaders are delimited the same way, by #ifdef FRAGMENT and its matching #endif.

Here's what you'll want to do:

  • Copy the inner code of either a vertex or a fragment shader

  • Paste it into a new file and save it with its proper extension (.vert or .frag)

  • Kindly ask malioc to give you the performance metrics
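If you prefer not to copy and paste by hand, the first two steps above can be sketched in Python. This is a minimal sketch, not a Unity or Arm tool: extract_stage is a hypothetical helper, and it assumes the compiled file marks each stage with a line reading exactly #ifdef VERTEX or #ifdef FRAGMENT. It counts nested #if/#endif pairs so that it stops at the matching #endif rather than the first one it finds.

```python
import re


def extract_stage(source, stage):
    """Return the code between '#ifdef <stage>' and its matching '#endif'.

    Only the first block for the given stage is returned; None if not found.
    """
    lines = source.splitlines()
    start = None
    depth = 0
    for i, line in enumerate(lines):
        stripped = line.strip()
        if start is None:
            if stripped == "#ifdef " + stage:
                start = i + 1  # first line inside the block
                depth = 1
            continue
        if re.match(r"#\s*(if|ifdef|ifndef)\b", stripped):
            depth += 1  # nested conditional opened inside the shader
        elif re.match(r"#\s*endif\b", stripped):
            depth -= 1
            if depth == 0:
                return "\n".join(lines[start:i])
    return None
```

You could then write the result of extract_stage(text, "VERTEX") to shader.vert and extract_stage(text, "FRAGMENT") to shader.frag. Because only the first block is returned, check that it belongs to the variant you actually care about.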

Let me show you two examples using the Standard shader.

Vertex Shader Performance Metrics

Here's the code I am saving to shader.vert:


//#ifdef VERTEX
#version 300 es
#define HLSLCC_ENABLE_UNIFORM_BUFFERS 1
#if HLSLCC_ENABLE_UNIFORM_BUFFERS
#define UNITY_UNIFORM
#else
#define UNITY_UNIFORM uniform
#endif
#define UNITY_SUPPORTS_UNIFORM_LOCATION 1
#if UNITY_SUPPORTS_UNIFORM_LOCATION
#define UNITY_LOCATION(x) layout(location = x)
#define UNITY_BINDING(x) layout(binding = x, std140)
#else
#define UNITY_LOCATION(x)
#define UNITY_BINDING(x) layout(std140)
#endif
uniform vec3 _WorldSpaceCameraPos;
uniform mediump vec4 unity_SHBr;
uniform mediump vec4 unity_SHBg;
uniform mediump vec4 unity_SHBb;
uniform mediump vec4 unity_SHC;
uniform vec4 hlslcc_mtx4x4unity_ObjectToWorld[4];
uniform vec4 hlslcc_mtx4x4unity_WorldToObject[4];
uniform vec4 hlslcc_mtx4x4unity_MatrixVP[4];
uniform vec4 _MainTex_ST;
uniform vec4 _DetailAlbedoMap_ST;
uniform mediump float _UVSec;
in highp vec4 in_POSITION0;
in mediump vec3 in_NORMAL0;
in highp vec2 in_TEXCOORD0;
in highp vec2 in_TEXCOORD1;
out highp vec4 vs_TEXCOORD0;
out highp vec4 vs_TEXCOORD1;
out highp vec4 vs_TEXCOORD2;
out highp vec4 vs_TEXCOORD3;
out highp vec4 vs_TEXCOORD4;
out mediump vec4 vs_TEXCOORD5;
out highp vec4 vs_TEXCOORD7;
out highp vec3 vs_TEXCOORD8;
vec4 u_xlat0;
mediump vec4 u_xlat16_0;
bool u_xlatb0;
vec4 u_xlat1;
mediump float u_xlat16_2;
mediump vec3 u_xlat16_3;
float u_xlat12;
void main()
{
u_xlat0 = in_POSITION0.yyyy * hlslcc_mtx4x4unity_ObjectToWorld[1];
u_xlat0 = hlslcc_mtx4x4unity_ObjectToWorld[0] * in_POSITION0.xxxx + u_xlat0;
u_xlat0 = hlslcc_mtx4x4unity_ObjectToWorld[2] * in_POSITION0.zzzz + u_xlat0;
u_xlat0 = u_xlat0 + hlslcc_mtx4x4unity_ObjectToWorld[3];
u_xlat1 = u_xlat0.yyyy * hlslcc_mtx4x4unity_MatrixVP[1];
u_xlat1 = hlslcc_mtx4x4unity_MatrixVP[0] * u_xlat0.xxxx + u_xlat1;
u_xlat1 = hlslcc_mtx4x4unity_MatrixVP[2] * u_xlat0.zzzz + u_xlat1;
gl_Position = hlslcc_mtx4x4unity_MatrixVP[3] * u_xlat0.wwww + u_xlat1;
#ifdef UNITY_ADRENO_ES3
u_xlatb0 = !!(_UVSec==0.0);
#else
u_xlatb0 = _UVSec==0.0;
#endif
u_xlat0.xy = (bool(u_xlatb0)) ? in_TEXCOORD0.xy : in_TEXCOORD1.xy;
vs_TEXCOORD0.zw = u_xlat0.xy * _DetailAlbedoMap_ST.xy + _DetailAlbedoMap_ST.zw;
vs_TEXCOORD0.xy = in_TEXCOORD0.xy * _MainTex_ST.xy + _MainTex_ST.zw;
u_xlat0.xyz = in_POSITION0.yyy * hlslcc_mtx4x4unity_ObjectToWorld[1].xyz;
u_xlat0.xyz = hlslcc_mtx4x4unity_ObjectToWorld[0].xyz * in_POSITION0.xxx + u_xlat0.xyz;
u_xlat0.xyz = hlslcc_mtx4x4unity_ObjectToWorld[2].xyz * in_POSITION0.zzz + u_xlat0.xyz;
u_xlat0.xyz = hlslcc_mtx4x4unity_ObjectToWorld[3].xyz * in_POSITION0.www + u_xlat0.xyz;
vs_TEXCOORD1.xyz = u_xlat0.xyz + (-_WorldSpaceCameraPos.xyz);
vs_TEXCOORD8.xyz = u_xlat0.xyz;
vs_TEXCOORD1.w = 0.0;
vs_TEXCOORD2 = vec4(0.0, 0.0, 0.0, 0.0);
vs_TEXCOORD3 = vec4(0.0, 0.0, 0.0, 0.0);
u_xlat0.x = dot(in_NORMAL0.xyz, hlslcc_mtx4x4unity_WorldToObject[0].xyz);
u_xlat0.y = dot(in_NORMAL0.xyz, hlslcc_mtx4x4unity_WorldToObject[1].xyz);
u_xlat0.z = dot(in_NORMAL0.xyz, hlslcc_mtx4x4unity_WorldToObject[2].xyz);
u_xlat12 = dot(u_xlat0.xyz, u_xlat0.xyz);
u_xlat12 = inversesqrt(u_xlat12);
u_xlat0.xyz = vec3(u_xlat12) * u_xlat0.xyz;
vs_TEXCOORD4.xyz = u_xlat0.xyz;
vs_TEXCOORD4.w = 0.0;
u_xlat16_2 = u_xlat0.y * u_xlat0.y;
u_xlat16_2 = u_xlat0.x * u_xlat0.x + (-u_xlat16_2);
u_xlat16_0 = u_xlat0.yzzx * u_xlat0.xyzz;
u_xlat16_3.x = dot(unity_SHBr, u_xlat16_0);
u_xlat16_3.y = dot(unity_SHBg, u_xlat16_0);
u_xlat16_3.z = dot(unity_SHBb, u_xlat16_0);
vs_TEXCOORD5.xyz = unity_SHC.xyz * vec3(u_xlat16_2) + u_xlat16_3.xyz;
vs_TEXCOORD5.w = 0.0;
vs_TEXCOORD7 = vec4(0.0, 0.0, 0.0, 0.0);
return;
}
//#endif

Note that you have to exclude the first #ifdef VERTEX and the last #endif. I just left them there for your reference.

Then, invoke the Mali Offline Compiler with "malioc shader.vert", which produces this output:


C:\Users\rtorresb\Desktop\Tmp>malioc shader.vert
Mali Offline Compiler v7.1.0 (Build 7a3538)
Copyright 2007-2020 Arm Limited, all rights reserved

Configuration
=============

Hardware: Mali-G76 r0p0
Driver: Bifrost r19p0-00rel0
Shader type: OpenGL ES Vertex (inferred)

Main shader
===========

Work registers: 32
Uniform registers: 82
Stack spilling: False

                                A      LS       V       T    Bound
Total instruction cycles:     2.9    16.0     0.0     0.0       LS
Shortest path cycles:         2.9    16.0     0.0     0.0       LS
Longest path cycles:          2.9    16.0     0.0     0.0       LS

A = Arithmetic, LS = Load/Store, V = Varying, T = Texture

As you can see, this specific shader is load/store bound, at 16 cycles on a Mali-G76 GPU.

It's a pretty expensive one, but that's what you get when using the standard shader.

If you want to optimize this shader, reduce its load/store operations. Then, repeat this step to see how much you improved it.
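To make that before/after comparison less error-prone, you could parse the cycle rows of the report instead of reading them by eye. The sketch below is only an illustration: parse_cycles and compare are hypothetical helpers, and they assume the text layout of the report shown above (a label ending in "cycles", a colon, four numeric columns and the bound unit). Other malioc versions may print the report differently.

```python
REPORT = """\
Total instruction cycles: 2.9 16.0 0.0 0.0 LS
Shortest path cycles: 2.9 16.0 0.0 0.0 LS
Longest path cycles: 2.9 16.0 0.0 0.0 LS
"""

UNITS = ("A", "LS", "V", "T")


def parse_cycles(report):
    """Parse the '... cycles:' rows of a malioc text report into a dict."""
    rows = {}
    for line in report.splitlines():
        label, sep, values = line.partition(":")
        if not sep or not label.strip().endswith("cycles"):
            continue  # not a cycle row (header, configuration, legend...)
        parts = values.split()
        if len(parts) < 5:
            continue  # unexpected layout, skip instead of guessing
        row = {unit: float(parts[i]) for i, unit in enumerate(UNITS)}
        row["Bound"] = " ".join(parts[4:])
        rows[label.strip()] = row
    return rows


def compare(before, after, row="Total instruction cycles"):
    """Return the per-unit cycle difference (after - before) for one row."""
    return {u: round(after[row][u] - before[row][u], 2) for u in UNITS}


cycles = parse_cycles(REPORT)
print(cycles["Total instruction cycles"]["Bound"])  # prints: LS
```

Save the report of the original shader and of your optimized version, parse both, and pass them to compare. A negative value in the LS entry means you removed load/store cycles.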

Fragment Shader Performance Metrics

Let's go through the same procedure with the fragment shader below:


//#ifdef FRAGMENT
#versio
