[In this reprinted #altdevblogaday in-depth piece, Valve Software's Bruce Dawson takes us through his Xperf experience finding the cause of (and a workaround for) repeated hangs in Visual Studio's output window. ] The Windows Performance Toolkit, also known as xperf, is a powerful (and free!) system-wide Windows profiler. In the past I've talked about using xperf to identify slowdowns in PowerPoint by using xperf's built-in sampling profiler, but that actually understates the true value of Xperf. While I think xperf is a better sampling profiler than most of the alternatives (higher frequency, lower overhead, kernel and user mode), xperf is really at its best when it reveals information that other profilers cannot measure at all. In short, lots of profilers can tell you what your program is doing, but few profilers are excellent at telling you why your program is doing nothing. Our story so far Xperf has a high learning curve. Therefore I highly recommend that you start by reading some previous articles from this series. The entire series can be found here, but the most important ones are:
Xperf Basics: Recording a Trace – this covers installing xperf (takes just a few minutes), recording traces, and adding ETW events to your game
Xperf Analysis Basics – the essential knowledge of how to navigate the xperf UI, including how to set up symbol paths
The rest of this post assumes that you have installed xperf (preferably the Windows 8 version) and have familiarity with at least Xperf Analysis Basics. Wait Analysis victories I've had good luck using Wait Analysis to find many performance problems. Some of these delays were short enough to be difficult to notice, yet long enough to matter. Others were debilitating. All were difficult or impossible to analyze through CPU sampling or other 'normal' CPU profilers. Some examples include:
Finding the cause of brief startup hangs in Internet Explorer and various games
Profiling Luke Stackwalker to find out why it caused frame rate glitches in the game it was profiling
Finding the cause of a 10x perf-reduction when upgrading to a newer version of Windows
Finding the cause of frame rate hitches during fraps recording
Finding the chain of lock contention that caused frame rate hitches on a heavily loaded system
Finding the cause of (and a workaround for) repeated 2-6 second hangs in Visual Studio's output window
The last investigation is the one I want to cover today. It is sufficiently simple and self-contained that I can cover it end-to-end in a single (long) post. Finding the hang When profiling a transient problem such as a frame-rate glitch or a temporary hang the first challenge is to locate the hang in the trace. A trace might cover 60 seconds, and a hang might last for 2 seconds or less, so knowing where to look is crucial. There are a number of ways to do this:
Find the key stroke that triggered the hang, through logging of input events
Use instrumentation in the functions of interest
Look for patterns in the CPU usage or other data
Use OS hang-detection events
I've used all four of these techniques. Our internal trace recording tool has an optional input event logger which puts all keyboard and mouse input into the trace (watch for it). If a hang is triggered by a particular key press or mouse click then finding its start point in the trace is trivial. Custom instrumentation (emitting ETW events at key points in your game, see the Recording a Trace post) is also a common technique. Emitting an event every frame makes a frame rate hitch plainly visible. However this doesn't work when investigating performance problems in other people's code, such as in Visual Studio. In some cases a hang will be plainly visible in the CPU consumption. One recent hang showed a significant hole in the otherwise consistent CPU usage, plain as day. A specific event that indicates the time and duration of a hang would be ideal, and Windows 7 actually has such an event. The Microsoft-WindowsWin32k ETW user provider will emit an event whenever a thread resumes pumping messages after a significant delay. Windows Vista and earlier users are out of luck, but on Windows 7 this is often exactly what is needed, and this provider is enabled by my recommended trace recording batch files. It's hands on time I've uploaded a .zip file of a sample trace to VisualStudioF8Hang.zip. If you download this trace, load it into xperfview, and make sure that the Microsoft symbols can be found then you can follow along. This is by far the best way to learn wait analysis. This trace covers over over ten minutes for some types of data, but the detailed sampling and context switch data only covers 28 seconds, from about 782 to 810 seconds. Start by selecting the region where we have full data, from 782 to 810 s and cloning this selection.
Our path now depends on whether you are using the new (Windows 8) version of xperfview.exe. Hands on with old versions of xperfview While this exact technique is only applicable to (and only works with) old versions of xperfview, the general concept is still applicable and the exploration of generic events is crucial whether looking for your custom events or exploring the built-in events. Scroll down to the Generic Events table. Right-click the selected region and bring up a summary table. Enable the Process Name column and put it first. Enable the ThreadID column and put it after Field 3. Move the Time (s) column and put it after the ThreadID column. I also hid a couple of columns in order to get my screenshot to fit, but that's less critical. Now we have all of the information we need in a convenient and easy to read place. If we drill into the data for devenv.exe and select the MessageCheckDelay Task Name we should see something like this:
The Zen of summary tables is all about looking for data columns that seem useful, enabling them, and fearlessly rearranging columns to group/sort/pivot/spindle the data to answer your question. In this case our question was when does devenv.exe (group by Process Name) hang (group by the Microsoft-Windows-Win32k provider, Task Name equals MessageCheckDelay or InputProcessDelay), and for those events, look at the TimeSinceInputRemoveMs, Thread ID, and Time (s) data. So now, with relatively little effort, I know that devenv.exe hung (didn't check for messages) for 5,304 ms, its message pump is running on thread ID 9,536, and the hang ended at 805.666 seconds into the trace. Cool. Hands on with the new version of xperfview (6.2.8229) Microsoft is continuing to develop xperf and if you install the latest version (released Feb 29, 2012, and linked to from here) then there are a couple of options. Wpa.exe has a new UI which shows pretty graphs for UI delays:
I don't know how to dig in deeper so I can't tell if it is any use, so that's all I have to say about it. The new xperfview.exe has removed the MessageCheckDelay and InputProcessDelay events from Generic Events but has added a new UI Delay Information Graph. If you scroll down to this graph and zoom in around 800 s (in the area where we have full detail) then you should see five reports of hung apps. VTrace.exe (my trace recording application) hung for a while, there are three spurious reports of Internet Explorer hanging, and there is a MsgCheckDelay report for devenv.exe. It's really too easy.
You can right-click to change the threshold for what delays are reported, or to bring up a summary table of delays. You'll need to bring up the Delay Summary Table to find out the UI thread ID for devenv.exe. Select the region around the devenv hang and we're ready for the next step. Finding the cause The MessageCheckDelay is emitted at the end of the hang (805.666 seconds) and it tells us the length of the hang (5.304 s) so we now know the range of the hang quite accurately. The hang runs from 800.362 to 805.666 seconds so we should zoom in on that area of the graphs in the xperf main window and look at CPU Usage by Process. My system has eight hardware threads (four hyperthreaded cores) so 12.5 percent utilization represents one busy thread. Even without that context we can see from the graph below that my CPUs are idle for most of the time. There's a bit of devenv activity (the two blue spikes), but mostly this is an idle hang.
When analyzing an idle hang you should select the entire region of the hang, and it is particularly important to select the end of the hang. It is better to select a few extra tens or hundreds of milliseconds at the end rather than risk missing the crucial events that end the hang. This selection can be done with the mouse or by right-clicking and using the Select Interval command. For easy reproducibility I right-clicked and used the Select Interval command to select the region from 800.0 s to 806.0 s. I then used Clone Selection to copy it to all of the graphs. Who woke whom? If a thread is not running, and it then starts running, then there was a context switch that started it (the new thread) running. That context switch is recorded in our ETW trace and contains all sorts of useful information. Included in this information is (for the traces recorded with my recommended batch files) the new process name and thread ID, the call stack which the thread woke up on (which is the same one it went to sleep on), the length of time it was not running and, for threads that were waiting on some synchronization primitive, the thread that woke it up. Ponder that, because it's crucial. An ETW trace tells you, for each context switch, how long the thread was not running, and who woke it up. That's why it is important to have the end of the hang selected, because that is (presumably) the time of the context switch that gets the thread running again. In the main xperf window go to the CPU Scheduling graph (make sure the correct time range is selected), right click on the selection, and select "Summary Table with Ready Thread" to view all context switches for the selected region together with the readying thread information. Make sure the columns are in this order:
NewProcess – this is the process whose thread is being scheduled to run
No tags.