Squeeze These Last Milliseconds With CPU Time Slicing

Feb. 12, 2020
[Visit the original post at Unity Performance: CPU Time Slicing]

The common wisdom says "profile, optimize, repeat". Let me warn you: this won't always work. Optimizing low-hanging fruit can gain you a whole millisecond in a day. But when that is gone, gains of 0.1 milliseconds per week can become normal. When that happens, you need other tools, such as the Unity CPU Slicing technique.

In this blog post, you'll learn how to apply:

  • Update Batching: reduce the costly overhead of Unity calling your Update functions

  • CPU Slicing: Split your CPU load across multiple frames to multiply your scripts' performance

Quick Navigation

How Did I Get Into This Mess?

Unity Performance: What's CPU Time Slicing?

How To Apply CPU Time Slicing In 3 Steps

    1. Create an UpdateManager

    2. Link Your Scripts to the UpdateManager

    3. Tweaking CPU Time Slicing

Get Started Now: Your Toolkit

How Did I Get Into This Mess?

2019 was a tough year for me.

I was in charge of porting a highly demanding PC/PS4 VR title to Oculus Quest, a mobile VR platform.

It was a daunting task for many reasons... and performance was surely the toughest objective to accomplish.

So I had a few months to make the rendering of each frame take less than 13 milliseconds. If you're a veteran, you know that doesn't give you much room for fancy 3D graphics and complex gameplay.

And 13 milliseconds was a very distant goal from the 40+ milliseconds I started with.

At the beginning of the project, I was confident I'd pull it off. After all, I had done huge mobile optimizations in the past.

I spent weeks optimizing every single aspect of the game I could think of. Days where I saved a whole millisecond were kind of common. And those days brought me closer to my objective relatively fast.

However, by the middle of the project... I wasn't that confident anymore.

After optimizing all the low-hanging fruit, the CPU performance gains were becoming much rarer. I was used to gaining whole milliseconds within single days, but my speed dropped to a painful 0.1 milliseconds per day, if I was lucky.

I remember how excited I got when I jobified the audio system we used to update 200+ audio sources. I was really proud. But after careful examination, that optimization gained me only 0.3 ms and cost about 3 days of my budget.

... And I still had 3 milliseconds to go with little time left.

That was a really big problem considering the optimization speed I had dropped to.

Draw calls were fine. Physics were also very optimized for this type of game. And the whole game logic was already pretty well optimized and partly multithreaded.

I had just a few weeks left and I didn't have any idea how to approach the situation... and I knew the traditional wisdom of "profile game, optimize script, repeat" wouldn't get me there in time.

If you know me, you can probably guess what I would do in this type of situation...

Radical solutions.

I opened my notebook and started an unusual brainstorming session that would end up with a crazy idea.

As I re-read the Oculus Quest guidelines, I saw that indeed I had to render the game at 72 FPS.

But here is the key thing I realized at that point: rendering at 72 FPS doesn't mean you must execute everything at 72 FPS. In fact, physics already executes at a different pace.

So I asked myself: what if I run the logic at a lower frame rate?

I quickly stood up from my seat, grabbed a sugarless double espresso and went straight to the drawing board.

At the drawing board, I started by making an inventory of all the expensive gameplay functions (funny how code that stole milliseconds from a CPU would steal hours from my sleep in the previous months).

The "Cactuar" Performance Group

Then I divided the gameplay functions into three groups:

  • Cactuar group: thousands of inexpensive scripts that, when combined, created a terrifying panorama. The name comes from the deadly 1000 Needles attack from Cactuar in Final Fantasy, each needle dealing just 1 point of damage.

  • Serious troublemakers: about ten scripts, each taking an average of 0.1 millisecond per frame.

  • Final bosses: massive monster scripts you wouldn't want to mess with. Each took about 0.5 milliseconds.

Was making these groups critical?

Probably not, but I had fun doing it... and it helped me with the next step.

Script groups in sight, I then re-arranged the scripts into two new separate groups that I called Group Alpha and Group Beta. I moved them around with one goal in mind: to make each group take about the same time to execute, i.e. 1.5 milliseconds per group.
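
A minimal sketch of that balancing step, assuming you have a measured per-frame cost for each script. It is plain Python rather than Unity C#, so it only illustrates the logic. The script names and costs are made up for the example (they are not measurements from the project), and `split_into_groups` is a hypothetical helper. The greedy rule used here, most expensive script first into the currently lightest group, is one simple way to get close to an even split; it is not necessarily how the original groups were built.

```python
def split_into_groups(costs_ms, group_count=2):
    """Greedy balancing: place the most expensive scripts first,
    each one into whichever group is currently the lightest."""
    groups = [[] for _ in range(group_count)]
    totals = [0.0] * group_count
    by_cost = sorted(costs_ms.items(), key=lambda item: item[1], reverse=True)
    for name, cost in by_cost:
        lightest = totals.index(min(totals))
        groups[lightest].append(name)
        totals[lightest] += cost
    return groups, totals


# Hypothetical costs in milliseconds per frame (illustrative only).
final_bosses = {f"final_boss_{i}": 0.5 for i in range(4)}
troublemakers = {f"troublemaker_{i}": 0.1 for i in range(10)}
costs_ms = {**final_bosses, **troublemakers}

groups, totals = split_into_groups(costs_ms, group_count=2)
for index, (members, total) in enumerate(zip(groups, totals)):
    print(f"group {index}: {len(members)} scripts, about {total:.1f} ms")
```

With these made-up numbers the 3 milliseconds of script time end up split into two groups of roughly 1.5 milliseconds each.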

The idea was simple: to execute the logic of Group Alpha in frame 1 and the logic of Group Beta in frame 2. And then I just had to repeat the cycle to literally halve the per-frame CPU cost of my scripts.
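
The alternation itself can be sketched in a few lines. Again, this is plain Python used to show the scheduling idea, not Unity code; `run_interlaced` and the two demo groups are hypothetical names. The only real ingredient is the frame counter modulo the number of groups, which picks the single group that runs on a given frame.

```python
def run_interlaced(groups, frame_count):
    """Run exactly one group per frame, cycling through the groups in order."""
    for frame in range(frame_count):
        active_group = groups[frame % len(groups)]
        for update in active_group:
            update(frame)


executed = []  # (frame, group name) pairs, recorded only for this demo

# Each group is a list of update callables; here each holds one stand-in script.
group_alpha = [lambda frame: executed.append((frame, "alpha"))]
group_beta = [lambda frame: executed.append((frame, "beta"))]

run_interlaced([group_alpha, group_beta], frame_count=4)
print(executed)
```

Each group runs on every other frame, so any single frame pays for only one of the two groups.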

Within an hour I implemented a prototype to split the execution of my logic into these two groups.

Naturally, the next step was to test it. As soon as I went into gameplay I saw the game breaking into pieces from 8 different angles. Even Unity crashed (but that was nothing new).

Another sugarless double espresso gave me the well-deserved caffeine kick that helped me tweak my scripts. I had to make them less CPU-attention sensitive. They'd need to let go of all the CPU love they used to get each frame and be content with half of it.

After some time and tears, I got it all to work.

I reduced the CPU time I spent each frame executing scripts by using what I call logic interlaced execution. I think the whole internet calls it CPU Time Slicing... so I guess I'll stick to that name.

Traditional Execution vs. CPU Slicing

Slow down, though... this system comes with its side effects.

I excluded scripts driving noticeable visual elements, as alternating their execution would make the game kind of jittery. I found it funny, but my client wasn't that enthusiastic about it.

The benefit of this system is that eventually I could add a third, fourth or fifth group depending on how time-critical the scripts were.

Also, by using a centralized update manager I got rid of the overhead that comes with having too many Update functions in Unity.
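
To make the pieces above concrete, here is a rough sketch of a centralized update manager that supports any number of slices and keeps an "every frame" list for scripts that must not skip frames. It is plain Python, not Unity C#, and every name in it (`UpdateManager`, `register`, `managed_update`, `CountingScript`) is hypothetical. It also assumes a steady frame time, so a sliced script is handed the delta time multiplied by the slice count to account for the frames it skipped; that compensation is my assumption about one way to make scripts tolerate running less often.

```python
class UpdateManager:
    """One central loop calls every registered script,
    instead of each script having its own update callback."""

    def __init__(self, slice_count=2):
        self.slice_count = slice_count
        self.every_frame = []  # e.g. scripts driving visible motion
        self.slices = [[] for _ in range(slice_count)]
        self.frame = 0

    def register(self, script, slice_index=None):
        """slice_index=None means the script runs on every frame."""
        if slice_index is None:
            self.every_frame.append(script)
        else:
            self.slices[slice_index].append(script)

    def tick(self, delta_time):
        for script in self.every_frame:
            script.managed_update(delta_time)
        # Only one slice runs per frame. Assuming a steady frame time,
        # its scripts receive the time covered by the frames they skipped.
        for script in self.slices[self.frame % self.slice_count]:
            script.managed_update(delta_time * self.slice_count)
        self.frame += 1


class CountingScript:
    """Stand-in for a gameplay script; it only counts calls and elapsed time."""

    def __init__(self):
        self.calls = 0
        self.elapsed = 0.0

    def managed_update(self, delta_time):
        self.calls += 1
        self.elapsed += delta_time


manager = UpdateManager(slice_count=3)
visual, ai, audio = CountingScript(), CountingScript(), CountingScript()
manager.register(visual)               # too visible to slice
manager.register(ai, slice_index=0)
manager.register(audio, slice_index=1)

for _ in range(6):
    manager.tick(1 / 72)
```

After six frames the visual script has run six times and the two sliced scripts twice each, yet all three have accumulated the same amount of game time.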

This system helped me get the last few gains that I so desperately needed.

Back to you... let me share how to implement this powerful strategy in your game.

Unity CPU Slicing: Setup

Unity Performance: What's CPU Time Slicing?

If you read my story, you might have a vague idea about the meaning of Unity CPU Time Slicing. But just in case, I'll share an analogy that I like with you.
