One Draw Call UI

July 19, 2017

Tobias wrote a nice post about the low-level rendering of our UI. If you haven’t checked it out already, go ahead and do so; it introduces some interesting concepts.

To follow up, I wanted to say a little bit about the more high-level part of the UI, since that’s what has been occupying my mind the last few weeks.

 

We’ve decided to go with an immediate mode rather than a retained mode model for the UI. For a primer on these two different models, have a look at Casey Muratori’s introductory talk or this tutorial by Sol.

The main reason why we like immediate mode, or IMGUI as it is often called, is that it avoids a lot of state synchronization between the UI and the underlying “model”. This tends to reduce boilerplate and produce code that is “simpler” (in some sense of the word). In an IMGUI it is easy to follow a control from the initial line of code that draws it all the way to the actual graphics that get produced. In retained mode, your actions touch a bit of state that gets processed later and it can be hard to see the consequences (did this change cause a reflow?). We believe that there is great value in the transparency you get with the IMGUI model.

The ability to draw some quick controls deep inside a routine without worrying about who “owns” them and who needs to clean them up is also really nice.

IMGUI Performance

The two main critiques that are usually brought up against IMGUIs are:

  • It is good for simple UIs, but once you want to do X it doesn’t work (where X can be a text editor, drag-and-drop, or something else).

  • It is inefficient to redraw the UI every frame.

The first issue can be quite easily disproved by just doing X in an IMGUI. There are plenty of examples of people doing X for lots of different values of X.

But I wanted to say a little bit more about the second point.

First, I think people underestimate just how mind-numbingly fast computers are at doing simple operations. And most of the things we want to do in a UI are pretty simple. Putting some rects in a buffer. We can do a lot of that before it starts to become a problem.

If it becomes a problem, there are a lot of things we can do to fix it. For example, optimizing a scrolled list to only process the items that are visible is trivial in an IMGUI. In contrast, in many retained UIs this requires switching to a completely different protocol, a Virtual List View or something similar.

Note that the usual implementation of a Virtual List View means putting an immediate model on top of the retained UI.
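To make the scrolled-list optimization concrete, here is a minimal sketch. It assumes fixed-height rows and integer pixel coordinates, and the names `visible_range_t` and `visible_items` are hypothetical, not part of the API described in this post.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical helper type: half-open range [first, last) of the items
// that intersect the viewport.
typedef struct visible_range_t {
    uint32_t first;
    uint32_t last;
} visible_range_t;

// Assumes item i occupies the pixel rows [i * item_h, (i + 1) * item_h) in
// list space, and that scroll_y is the list-space row at the top of the
// viewport. All quantities are in pixels.
static visible_range_t visible_items(uint32_t scroll_y, uint32_t view_h,
    uint32_t item_h, uint32_t num_items)
{
    assert(item_h > 0);
    visible_range_t r = { 0, 0 };
    if (view_h == 0)
        return r;

    // First item whose rows overlap the top of the viewport (round down).
    uint64_t first = scroll_y / item_h;
    // One past the last item that overlaps the bottom (round up).
    uint64_t last = ((uint64_t)scroll_y + view_h + item_h - 1) / item_h;

    // Clamp to the number of items that actually exist.
    if (first > num_items)
        first = num_items;
    if (last > num_items)
        last = num_items;

    r.first = (uint32_t)first;
    r.last = (uint32_t)last;
    return r;
}
```

The list code then only loops over `r.first .. r.last - 1` and places item `i` at `i * item_h - scroll_y`, so the per-frame cost depends on the viewport size and not on the length of the list.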

Caching can also be used in various ways. Expensive operations (if you have any) can be cached with hashed values of the input arguments. We can avoid updating the UI altogether if there is no interaction with it. We can use separate UIs for different tabs, and only update the UIs of the tabs that the user interacts with, etc. We can also put a retained model on top of the immediate UI, keep expensive calculations in the retained model and only use the immediate UI for drawing. In fact, our docking system does just that, as you will see later.
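As an illustration of caching on hashed inputs, here is a small sketch. Everything in it is an assumption made for the example: the FNV-1a hash, the direct-mapped table, and the stand-in `expensive_text_width()` function are not taken from the actual implementation.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

// FNV-1a, 64-bit. Pass the offset basis as `h` to start a new hash, or a
// previous result to keep hashing more input.
#define FNV_OFFSET_BASIS 0xcbf29ce484222325ULL
#define FNV_PRIME 0x100000001b3ULL

static uint64_t fnv1a(const void *data, size_t n, uint64_t h)
{
    const uint8_t *p = (const uint8_t *)data;
    for (size_t i = 0; i < n; ++i) {
        h ^= p[i];
        h *= FNV_PRIME;
    }
    return h;
}

#define CACHE_SLOTS 256

// Hypothetical direct-mapped cache: one entry per slot, newest entry wins.
typedef struct cache_t {
    uint64_t keys[CACHE_SLOTS];
    float values[CACHE_SLOTS];
    uint8_t used[CACHE_SLOTS];
    uint32_t misses; // Number of times the expensive function actually ran.
} cache_t;

// Stand-in for an expensive operation (for example, real text measurement).
static float expensive_text_width(const char *text, float font_size)
{
    return (float)strlen(text) * font_size * 0.5f;
}

// Looks the result up by a hash of all input arguments. Entries are matched
// on the 64-bit hash only, so a hash collision would return a stale value.
static float cached_text_width(cache_t *c, const char *text, float font_size)
{
    assert(c && text);
    uint64_t key = fnv1a(text, strlen(text), FNV_OFFSET_BASIS);
    key = fnv1a(&font_size, sizeof(font_size), key);

    const uint32_t slot = (uint32_t)(key % CACHE_SLOTS);
    if (c->used[slot] && c->keys[slot] == key)
        return c->values[slot];

    ++c->misses;
    c->keys[slot] = key;
    c->values[slot] = expensive_text_width(text, font_size);
    c->used[slot] = 1;
    return c->values[slot];
}
```

Because the key is derived purely from the arguments, the immediate-mode call site stays the same every frame; the cache is just a lookup in front of the expensive function.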

The fact that it can be useful both to put an immediate model on top of a retained UI and a retained model on top of an immediate UI shows that both models have merit. But between these two approaches, I find the retained model on top of the immediate UI to be cleaner. Systems without state are simpler and it’s cleaner to have the simpler systems at the bottom of the stack.

Side note: I’ve seen people propose to cache bitmaps for unchanging parts of the UI, but in my mind it seems simpler and better to just cache vertex and index buffers. A 1024 x 1024 bitmap is 3 MB in R8G8B8. You have to draw a lot of UI before you get 3 MB of vertex buffers. And retina displays give vertex buffers an even bigger advantage.
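To put rough numbers on that comparison, the sketch below works through the arithmetic. The vertex layout is an assumption for illustration only (a 20-byte uncompressed vertex and 32-bit indices); the real buffers are packed differently.

```c
#include <assert.h>
#include <stdint.h>

// Size in bytes of an uncompressed bitmap.
static uint64_t bitmap_bytes(uint32_t w, uint32_t h, uint32_t bytes_per_pixel)
{
    assert(bytes_per_pixel > 0);
    return (uint64_t)w * h * bytes_per_pixel;
}

// Assumed layout, for illustration only: position (2 floats), UV (2 floats)
// and a packed 32-bit color per vertex, plus 32-bit indices.
enum {
    VERTEX_BYTES = 20,
    INDEX_BYTES = 4,
    // A rectangle is 4 vertices and 2 triangles (6 indices): 104 bytes.
    QUAD_BYTES = 4 * VERTEX_BYTES + 6 * INDEX_BYTES
};

// Number of rectangles whose vertex and index data fit in `bytes`.
static uint64_t quads_that_fit(uint64_t bytes)
{
    return bytes / QUAD_BYTES;
}
```

Under these assumptions, `bitmap_bytes(1024, 1024, 3)` is 3,145,728 bytes, which is room for about 30,000 rectangles. At 2x display scale the same area needs a 2048 x 2048 bitmap, four times the bytes, while the vertex data stays the same size.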

So caching can be used, if you need it, but computers are so fast that I don’t really think that you do.

The important thing is that for me, performance is a process more than anything else. Pretty much anything can be made performant if you have the time to analyze where the performance problems are and then address those issues using various tricks and techniques. This is where IMGUI has a big advantage, in my opinion. In an IMGUI, the code flow is so straightforward that it is really easy to see where the performance issues are and how to fix them. In a retained system, with a more complicated stack, this is much harder. Retained UIs may or may not have a theoretical performance advantage, but in practice I think IMGUIs have the upper hand.

Case in point: I’ve seen people struggle for weeks with making a performant log viewer in a retained system (HTML/JavaScript). In an IMGUI, we just keep the strings in a buffer and draw as many as will fit on the screen. Less than an hour’s work, and the cost is just the cost of drawing the strings that fit on the screen, no matter how big the buffer is.

Single Draw Call

If you’ve been following the blog, by now you should know that one of the fun things we do at Our Machinery is to take a strong, elegant, but maybe a bit extremist idea about how to write code, and then push that idea as hard as we can to see if we can make it fly. (Previous hits are things such as writing all header files in C, not allowing header files to include other header files, etc.)

The nice thing about this somewhat drastic development methodology is that it can pull you out of the rut of same-sameness, force your brain to think in new ways and move into some exciting, unexplored territory.

For the UI, one such idea that we went with is:

  • Draw everything with a single draw call.

This is an interesting idea, because it seems almost possible: it sits in the interesting space between the trivial and the impossible. Bindless APIs such as Vulkan let us do this without massive texture atlassing, but there are still some issues that we need to solve, such as:

  • Won’t we need different shaders to draw textured and untextured primitives and other “special” things?

  • Don’t we need different draw calls to handle different clipping/scissoring shapes?

  • What about overlapping windows and drop down menus?

Tobias already addressed the first two items in his post, so I won’t say more about them here. I’ll talk more about overlapping windows below.

The nice thing about using a single draw call is that though it complicates things somewhat, in the sense that we have to find solutions to the problems above, it also vastly simplifies things.

Using a single draw call means that our drawing functions can just take a vertex buffer and an index buffer as input and write the drawn shapes directly into those buffers. The drawing functions don’t have to worry about creating multiple draw calls, routing those draw calls to different shaders, etc.

We also don’t have to worry about how to efficiently “batch” our drawing to keep the number of draw calls down (because the number of draw calls will always be exactly one). This is something you can really go crazy with and apply all kinds of advanced algorithms — dynamic texture atlassing, finding non-overlapping items that can be put in the same draw call, without worrying about draw order, etc. There is probably room enough for a couple of PhD theses there, and we don’t have to do any of it. Nice!

System Layers

Our UI implementation has three layers:

[Figure: UI Layers]

The drawing layer just knows how to draw basic shapes such as rectangles, circles and text. The UI layer uses the drawing layer and user input to implement interactive controls, such as buttons and checkboxes. Finally, the docking layer puts those controls into dockable tabs that can be dragged and dropped between windows.

As stated above, the drawing functions just write data to vertex and index buffers. The API looks like this:


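// Caller-owned output buffers. The drawing functions append to them.
struct tm_draw2d_buffer_t
{
    uint8_t *vbuffer;  // Vertex data.
    uint32_t vbytes;   // Number of bytes written to vbuffer so far.
    uint32_t *ibuffer; // Index data.
    uint32_t in;       // Number of indices written to ibuffer so far.
};

struct tm_drawing_api {
    // Appends the vertex and index data for the rectangle r to buffer.
    void (*fill_rect)(struct tm_draw2d_buffer_t *buffer,
        const struct tm_draw2d_style_t *style, struct tm_rect_t r);
    ...
};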

The style parameter contains settings for color, line width and other UI style options. The drawing function will write data directly into the vbuffer and the ibuffer and increase the counters vbytes and in (number of indices) to reflect the written data. Note that there is no provision for automatically growing the buffers. It is the responsibility of the caller to make sure that the buffers have space enough for the data that gets written.
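To show what "writing directly into the buffers" can look like, here is a minimal sketch of a `fill_rect()` that appends one quad. The buffer struct is the one from the API above; the rect, style and vertex layouts are hypothetical placeholders, and the vertices are written uncompressed, which is not how the actual implementation packs its data.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct tm_draw2d_buffer_t {
    uint8_t *vbuffer;
    uint32_t vbytes;
    uint32_t *ibuffer;
    uint32_t in;
};

// Hypothetical layouts, assumed for this sketch only.
struct tm_rect_t { float x, y, w, h; };
struct tm_draw2d_style_t { uint32_t color; };
struct vertex_t { float x, y; uint32_t color; };

// Appends 4 vertices and 6 indices (two triangles) for r. The caller must
// already have made room for them: the buffer struct carries no capacity,
// so this function cannot check. Assumes vbuffer only holds vertex_t data.
static void fill_rect(struct tm_draw2d_buffer_t *buffer,
    const struct tm_draw2d_style_t *style, struct tm_rect_t r)
{
    assert(buffer && style);
    const struct vertex_t v[4] = {
        { r.x, r.y, style->color },
        { r.x + r.w, r.y, style->color },
        { r.x + r.w, r.y + r.h, style->color },
        { r.x, r.y + r.h, style->color },
    };
    // Index of the first new vertex, derived from the bytes written so far.
    const uint32_t base = buffer->vbytes / (uint32_t)sizeof(struct vertex_t);
    const uint32_t idx[6] = { base, base + 1, base + 2, base, base + 2, base + 3 };

    // memcpy avoids alignment assumptions about the byte-typed vbuffer.
    memcpy(buffer->vbuffer + buffer->vbytes, v, sizeof(v));
    buffer->vbytes += (uint32_t)sizeof(v);
    memcpy(buffer->ibuffer + buffer->in, idx, sizeof(idx));
    buffer->in += 6;
}
```

Since the function only appends and bumps the two counters, successive calls simply extend the same vertex and index data.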

I like to separate the tasks of allocating memory and processing data as much as possible to make APIs more flexible. If we had put the memory allocation inside the fill_rect() function, we would dictate that the vbuffer and ibuffer must be allocated individually on the heap; they couldn’t be part of a larger structure. In addition, we would have to pass more parameters to the function, such as an allocation context, etc.

Keeping allocation and processing separate makes for a smaller, simpler API. I realize that this might be a controversial choice for people who are used to higher level APIs, but note that if you want a higher level API, you can just wrap the drawing functions in a library that also takes care of allocation. In fact, that’s exactly what the UI layer does.
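A wrapper of that kind could look roughly like the sketch below. It is only an illustration built on `realloc()`: the `growable_draw_buffer_t` type and the `reserve()` function are hypothetical names, and this is not how the UI layer is actually written.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

struct tm_draw2d_buffer_t {
    uint8_t *vbuffer;
    uint32_t vbytes;
    uint32_t *ibuffer;
    uint32_t in;
};

// Hypothetical owner that pairs the draw buffer with its capacities.
struct growable_draw_buffer_t {
    struct tm_draw2d_buffer_t buffer;
    uint32_t vcapacity; // Allocated size of vbuffer, in bytes.
    uint32_t icapacity; // Allocated size of ibuffer, in indices.
};

// Doubles the capacity until it covers `needed`, without overflowing.
static uint32_t grow_capacity(uint32_t current, uint32_t needed)
{
    uint32_t cap = current ? current : 64;
    while (cap < needed)
        cap = cap > UINT32_MAX / 2 ? needed : cap * 2;
    return cap;
}

// Makes room for extra_vbytes more vertex bytes and extra_indices more
// indices. Returns 1 on success, or 0 if an allocation failed (in that
// case the existing data is left untouched).
static int reserve(struct growable_draw_buffer_t *g, uint32_t extra_vbytes,
    uint32_t extra_indices)
{
    assert(extra_vbytes <= UINT32_MAX - g->buffer.vbytes);
    assert(extra_indices <= UINT32_MAX - g->buffer.in);
    const uint32_t vneed = g->buffer.vbytes + extra_vbytes;
    const uint32_t ineed = g->buffer.in + extra_indices;

    if (vneed > g->vcapacity) {
        const uint32_t cap = grow_capacity(g->vcapacity, vneed);
        uint8_t *p = realloc(g->buffer.vbuffer, cap);
        if (!p)
            return 0;
        g->buffer.vbuffer = p;
        g->vcapacity = cap;
    }
    if (ineed > g->icapacity) {
        const uint32_t cap = grow_capacity(g->icapacity, ineed);
        uint32_t *p = realloc(g->buffer.ibuffer, (size_t)cap * sizeof(uint32_t));
        if (!p)
            return 0;
        g->buffer.ibuffer = p;
        g->icapacity = cap;
    }
    return 1;
}
```

A higher-level call would first `reserve()` an upper bound for the shape it is about to draw and then pass `&g->buffer` to the drawing function, which keeps the drawing layer itself free of allocation.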

We could just write regular triangle vertices to the vbuffer, but as you can read in Tobias’ post, we try to be a bit more clever and compress the data that we write to use less memory. This compressed data then gets unpacked in the vertex shader.

Note that the drawing layer doesn’t have any internal state at all; it only writes to the buffers passed in to the functions.

The UI layer is a stateful API in the sense that you create a UI object and then you draw controls (in immediate mode) into that object. The UI object state holds the
