In this post, you will learn how to analyze your Unity shader complexity with numbers so you can finally:
Stop being fragment-bound in your Unity game
Compare the GPU complexity between two shaders
Analyze shader costs in terms of texture, arithmetic and load/store GPU cycles
... And reach 60, 90 or higher FPS
For this, we will use a little-known tool called Mali Offline Compiler.
With this free software, you'll finally be able to see how you're spending your GPU cycles in your Unity shaders.
So let's get started with this exciting topic.
Is It Important to Count GPU Cycles Nowadays?
Now more than ever, it is crucial to understand the impact of your shaders on the performance of your game.
With ever-increasing resolutions (I'm looking at you, VR), more and more games are bottlenecked by the fragment shading stage.
“The more pixels you render, the more attention you have to pay to the cost of your fragment shaders.”
Rubén (The Gamedev Guru)
To get an idea of how expensive your shaders are, there are two approaches:
Making guesstimates, e.g. "this shader looks expensive".
Measuring: either through static analysis or in-game profiling.
In this blog post, we will measure the cost of your shaders through static analysis. Guesstimates will work better once you gain more experience measuring 😉
In the next sections, you and I will compile your shaders in a matter of minutes.
With that, we will get valuable performance information about them that will guide your future decisions.
Setting Up Your Mali Offline Compiler
You can download Mali Offline Compiler as part of Arm Mobile Studio.
On that page, you'll want to download the latest release for your target platform.
Download arm Mobile Studio
Once you've gone through the setup, Mali Offline Compiler should be on your PATH, i.e. you'll be able to invoke it from the command line.
If that is not the case, you can add it yourself. You will find the malioc executable in the installation directory.
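If you'd rather check this from a script than by hand, here is a minimal sketch in Python. It only uses the standard library; the helper name find_malioc is my own and not part of Arm's tooling.

```python
import shutil


def find_malioc(executable="malioc"):
    """Return the full path to the executable if it is on PATH, else None."""
    return shutil.which(executable)


if __name__ == "__main__":
    path = find_malioc()
    if path is None:
        print("malioc is not on PATH; add its installation directory to PATH.")
    else:
        print(f"malioc found at {path}")
```

If this prints that malioc is missing, add the directory that contains the executable to your PATH and open a new terminal.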
Compiling Your Unity Shaders
Before we can start using Mali Offline Compiler, we need to instruct Unity to compile the shader you want to analyze.
You see, the Mali compiler knows nothing about Unity's shader format.
It just wants GLSL.
Luckily, this is pretty easy in Unity.
Navigate to a material of your choice and click on the gear icon on its right. Then, click on Select Shader.
Unity: Finding Your Shader
Doing so will show you the inspector of your shader, which includes its name, some meta-data and the possibility to compile it.
Unity: Compiling Your Shader
(You might need to select GLES3x, as this is the graphics API that Mali Offline Compiler works well with.)
Guess which button you will press?
Getting Your Unity Shader Performance Metrics
Once you have pressed Compile and show code, your code editor will show you the possibly long list of shaders that Unity compiled for you.
This temporary file contains all the vertex and fragment shader variants Unity produced.
Each vertex shader starts at #ifdef VERTEX and ends at its matching #endif.
Fragment shaders are delimited the same way, by #ifdef FRAGMENT and its matching #endif.
Here's what you'll want to do:
Copy the inner code of either a vertex or a fragment shader
Paste it into a new file and save it with its proper extension (.vert or .frag)
Kindly ask Mali Offline Compiler to give you the performance metrics
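If copying and pasting by hand gets tedious, the first two steps can be sketched in Python. This is a rough sketch under a few assumptions: the helper name extract_stage is hypothetical, and it expects the stage markers to appear on their own lines exactly as #ifdef VERTEX or #ifdef FRAGMENT, with no space after the # in any preprocessor directive.

```python
def extract_stage(source, stage):
    """Return the inner code of every `#ifdef <stage>` ... `#endif` block.

    `stage` is "VERTEX" or "FRAGMENT". Conditionals nested inside a block
    are counted so that the block ends at its own matching #endif.
    """
    blocks = []
    current = None  # lines of the block being collected, or None outside one
    depth = 0       # nesting depth of #if/#ifdef/#ifndef inside the block
    for line in source.splitlines():
        stripped = line.strip()
        if current is None:
            if stripped == f"#ifdef {stage}":
                current = []
                depth = 0
            continue
        if stripped.startswith("#if"):  # also matches #ifdef and #ifndef
            depth += 1
        elif stripped.startswith("#endif"):
            if depth == 0:
                # This #endif closes the stage block itself: drop it.
                blocks.append("\n".join(current) + "\n")
                current = None
                continue
            depth -= 1
        current.append(line)
    return blocks
```

You would read Unity's compiled shader file into a string, call extract_stage(text, "VERTEX"), and write the variant you care about to shader.vert. The opening #ifdef and the closing #endif are already stripped from each returned block.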
Let me show you two examples using the Standard shader.
Vertex Shader Performance Metrics
Here's the code I am saving to shader.vert:
//#ifdef VERTEX
#version 300 es

#define HLSLCC_ENABLE_UNIFORM_BUFFERS 1
#if HLSLCC_ENABLE_UNIFORM_BUFFERS
#define UNITY_UNIFORM
#else
#define UNITY_UNIFORM uniform
#endif
#define UNITY_SUPPORTS_UNIFORM_LOCATION 1
#if UNITY_SUPPORTS_UNIFORM_LOCATION
#define UNITY_LOCATION(x) layout(location = x)
#define UNITY_BINDING(x) layout(binding = x, std140)
#else
#define UNITY_LOCATION(x)
#define UNITY_BINDING(x) layout(std140)
#endif
uniform vec3 _WorldSpaceCameraPos;
uniform mediump vec4 unity_SHBr;
uniform mediump vec4 unity_SHBg;
uniform mediump vec4 unity_SHBb;
uniform mediump vec4 unity_SHC;
uniform vec4 hlslcc_mtx4x4unity_ObjectToWorld[4];
uniform vec4 hlslcc_mtx4x4unity_WorldToObject[4];
uniform vec4 hlslcc_mtx4x4unity_MatrixVP[4];
uniform vec4 _MainTex_ST;
uniform vec4 _DetailAlbedoMap_ST;
uniform mediump float _UVSec;
in highp vec4 in_POSITION0;
in mediump vec3 in_NORMAL0;
in highp vec2 in_TEXCOORD0;
in highp vec2 in_TEXCOORD1;
out highp vec4 vs_TEXCOORD0;
out highp vec4 vs_TEXCOORD1;
out highp vec4 vs_TEXCOORD2;
out highp vec4 vs_TEXCOORD3;
out highp vec4 vs_TEXCOORD4;
out mediump vec4 vs_TEXCOORD5;
out highp vec4 vs_TEXCOORD7;
out highp vec3 vs_TEXCOORD8;
vec4 u_xlat0;
mediump vec4 u_xlat16_0;
bool u_xlatb0;
vec4 u_xlat1;
mediump float u_xlat16_2;
mediump vec3 u_xlat16_3;
float u_xlat12;
void main()
{
    u_xlat0 = in_POSITION0.yyyy * hlslcc_mtx4x4unity_ObjectToWorld[1];
    u_xlat0 = hlslcc_mtx4x4unity_ObjectToWorld[0] * in_POSITION0.xxxx + u_xlat0;
    u_xlat0 = hlslcc_mtx4x4unity_ObjectToWorld[2] * in_POSITION0.zzzz + u_xlat0;
    u_xlat0 = u_xlat0 + hlslcc_mtx4x4unity_ObjectToWorld[3];
    u_xlat1 = u_xlat0.yyyy * hlslcc_mtx4x4unity_MatrixVP[1];
    u_xlat1 = hlslcc_mtx4x4unity_MatrixVP[0] * u_xlat0.xxxx + u_xlat1;
    u_xlat1 = hlslcc_mtx4x4unity_MatrixVP[2] * u_xlat0.zzzz + u_xlat1;
    gl_Position = hlslcc_mtx4x4unity_MatrixVP[3] * u_xlat0.wwww + u_xlat1;
#ifdef UNITY_ADRENO_ES3
    u_xlatb0 = !!(_UVSec==0.0);
#else
    u_xlatb0 = _UVSec==0.0;
#endif
    u_xlat0.xy = (bool(u_xlatb0)) ? in_TEXCOORD0.xy : in_TEXCOORD1.xy;
    vs_TEXCOORD0.zw = u_xlat0.xy * _DetailAlbedoMap_ST.xy + _DetailAlbedoMap_ST.zw;
    vs_TEXCOORD0.xy = in_TEXCOORD0.xy * _MainTex_ST.xy + _MainTex_ST.zw;
    u_xlat0.xyz = in_POSITION0.yyy * hlslcc_mtx4x4unity_ObjectToWorld[1].xyz;
    u_xlat0.xyz = hlslcc_mtx4x4unity_ObjectToWorld[0].xyz * in_POSITION0.xxx + u_xlat0.xyz;
    u_xlat0.xyz = hlslcc_mtx4x4unity_ObjectToWorld[2].xyz * in_POSITION0.zzz + u_xlat0.xyz;
    u_xlat0.xyz = hlslcc_mtx4x4unity_ObjectToWorld[3].xyz * in_POSITION0.www + u_xlat0.xyz;
    vs_TEXCOORD1.xyz = u_xlat0.xyz + (-_WorldSpaceCameraPos.xyz);
    vs_TEXCOORD8.xyz = u_xlat0.xyz;
    vs_TEXCOORD1.w = 0.0;
    vs_TEXCOORD2 = vec4(0.0, 0.0, 0.0, 0.0);
    vs_TEXCOORD3 = vec4(0.0, 0.0, 0.0, 0.0);
    u_xlat0.x = dot(in_NORMAL0.xyz, hlslcc_mtx4x4unity_WorldToObject[0].xyz);
    u_xlat0.y = dot(in_NORMAL0.xyz, hlslcc_mtx4x4unity_WorldToObject[1].xyz);
    u_xlat0.z = dot(in_NORMAL0.xyz, hlslcc_mtx4x4unity_WorldToObject[2].xyz);
    u_xlat12 = dot(u_xlat0.xyz, u_xlat0.xyz);
    u_xlat12 = inversesqrt(u_xlat12);
    u_xlat0.xyz = vec3(u_xlat12) * u_xlat0.xyz;
    vs_TEXCOORD4.xyz = u_xlat0.xyz;
    vs_TEXCOORD4.w = 0.0;
    u_xlat16_2 = u_xlat0.y * u_xlat0.y;
    u_xlat16_2 = u_xlat0.x * u_xlat0.x + (-u_xlat16_2);
    u_xlat16_0 = u_xlat0.yzzx * u_xlat0.xyzz;
    u_xlat16_3.x = dot(unity_SHBr, u_xlat16_0);
    u_xlat16_3.y = dot(unity_SHBg, u_xlat16_0);
    u_xlat16_3.z = dot(unity_SHBb, u_xlat16_0);
    vs_TEXCOORD5.xyz = unity_SHC.xyz * vec3(u_xlat16_2) + u_xlat16_3.xyz;
    vs_TEXCOORD5.w = 0.0;
    vs_TEXCOORD7 = vec4(0.0, 0.0, 0.0, 0.0);
    return;
}
//#endif
Note that you have to exclude the first #ifdef VERTEX and the last #endif. I just left them there for your reference.
Then, invoke Mali Offline Compiler with the command malioc shader.vert, which produces this output:
C:\Users\rtorresb\Desktop\Tmp>malioc shader.vert
Mali Offline Compiler v7.1.0 (Build 7a3538)
Copyright 2007-2020 Arm Limited, all rights reserved

Configuration
=============

Hardware: Mali-G76 r0p0
Driver: Bifrost r19p0-00rel0
Shader type: OpenGL ES Vertex (inferred)

Main shader
===========

Work registers: 32
Uniform registers: 82
Stack spilling: False

                                A      LS       V       T    Bound
Total instruction cycles:     2.9    16.0     0.0     0.0       LS
Shortest path cycles:         2.9    16.0     0.0     0.0       LS
Longest path cycles:          2.9    16.0     0.0     0.0       LS

A = Arithmetic, LS = Load/Store, V = Varying, T = Texture
As you can see, this specific shader is load/store bound, at 16 load/store cycles on a Mali-G76 GPU.
It's a pretty expensive one, but that's what you get when using the Standard shader.
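If you want these numbers in a script instead of reading them off the console, here is a small Python sketch that pulls the three cycle rows out of the report text. It assumes the exact A / LS / V / T / Bound layout shown above (malioc v7.1.0) and a single unit in the Bound column; other versions may print different columns. The name parse_cycles is my own.

```python
import re

# One row of the cycle table: a label, four numbers (A, LS, V, T) and the bound unit.
ROW = re.compile(
    r"(Total instruction|Shortest path|Longest path) cycles:\s+"
    r"([\d.]+)\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s+(\S+)"
)


def parse_cycles(report):
    """Map each row label to its per-unit cycle counts and bound unit."""
    rows = {}
    for name, a, ls, v, t, bound in ROW.findall(report):
        rows[name] = {
            "A": float(a),
            "LS": float(ls),
            "V": float(v),
            "T": float(t),
            "bound": bound,
        }
    return rows
```

Feed it the text that malioc shader.vert printed (for example, after redirecting the output to a file), and parse_cycles(text)["Total instruction"]["LS"] gives you the load/store cycle count.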
If you want to optimize this shader, focus on reducing its load/store operations. Then repeat this step to see how much you improved it.
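To make that before/after comparison concrete, here is a tiny Python sketch that subtracts two sets of cycle counts. The "before" values are the ones from the report above; the "after" values are purely hypothetical numbers I made up for illustration, not a measured result. The name compare_cycles is also my own.

```python
def compare_cycles(before, after):
    """Return after - before for every unit present in both measurements.

    Negative values mean the shader got cheaper on that unit.
    """
    return {
        unit: round(after[unit] - before[unit], 2)
        for unit in before
        if unit in after
    }


# Total instruction cycles from the malioc report in this post.
before = {"A": 2.9, "LS": 16.0, "V": 0.0, "T": 0.0}
# HYPOTHETICAL values for an optimized variant, for illustration only.
after = {"A": 2.9, "LS": 12.0, "V": 0.0, "T": 0.0}

for unit, delta in compare_cycles(before, after).items():
    print(f"{unit}: {delta:+.1f} cycles")
```

The same function works for comparing two different shaders: pass the cycle counts of each and look at the sign of the differences.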
Fragment Shader Performance Metrics
Let's go through the same procedure with the fragment shader below:
//#ifdef FRAGMENT #versio