In-depth: When even crashing doesn't work

July 9, 2012

[In this reprinted #altdevblogaday in-depth piece, Valve Software programmer Bruce Dawson explains that, while crashing can improve code quality, the task of crashing can be more error prone than you might expect.]

I've written previously about the importance of crashing in order to improve code quality. However, even the seemingly simple task of crashing can be more error prone than you might expect. I've recently become aware of two different problems that can happen when crashing in 64-bit Windows. There is a Windows bug which can make debuggers forget where a crash happened, and there is a Windows design decision which sometimes causes a crash to be completely ignored! Both problems are (mostly) avoidable once you know what to do, but the required techniques are far from obvious.

Forgetting where a crash happened

It is a reasonable minimum requirement that a debugger should halt on the exact instruction that triggered a fault and then attempt to show source code, local variables, a call stack etc. There are all sorts of reasons it may be difficult or impossible to show source code (none available), local variables (optimized away), or a call stack (stack trashed), but for user-mode debugging it should always be possible to stop on the faulting instruction. And indeed, in all the decades that I have used Visual C++ it has managed this task quite well – until recently.

Starting a few months ago I noticed that, when the program that I was debugging crashed, the VC++ debugger would not halt on the faulting instruction. It wouldn't even halt in the crashing function. Instead it would halt two levels into the OS, with a call stack that made no sense.

At first I thought that the project I was working on was doing something weird with a structured exception handler but I was able to reproduce the bug on a fresh project created by the VC++ New Project Wizard. I briefly thought that maybe something was misconfigured on my machine, but then my coworkers started reporting this problem as well. Then I thought maybe it was a newly introduced VC++ bug – but the same problem can be triggered in windbg as well. I wasn't sure what was happening but it smelled like a recently introduced Windows bug.

My minimal test program for this bug was to call this Crash() function just before the message pump in a default Win32 program, debug build:

void Crash() { char* p = 0; p[0] = 0; }

If I break on the instruction that will crash then I get the call stack below, and I should get the same call stack after crashing:

That is indeed the call stack that I got in this scenario for years. However, starting a few months ago, on most 64-bit Windows 7 machines that I have tested this on, the actual call stack is this:

Notice that the function that crashed is not even listed! This makes routine bug investigation an expert-level problem. Sometimes the crash call stack is even worse, with even the parent of the crashing function missing:

The actual stack displayed varies. Sometimes it is correct, and sometimes the two ZwRaiseException entries are listed. It seems to depend on subtle details of the code at the crash location, or the stack frames, or the phase of Venus.

Windbg defaults to halting on first-chance exceptions, so it normally avoids this bug. However if you continue execution after a crash then the exception handlers run and the bug appears. I've created a simple test program with a "Crash normally" menu item so that you can easily test it. Source and the executable are available here. You'll have to build the project file (with VS 2010 or VS 2012) to get symbols in order to see this properly in a debugger.

Another blogger investigated this issue earlier this year and found the root cause. The issue is a bug in the WoW64 support for AVX. Saving the state of the AVX registers requires additional space, and apparently the WoW64 debug support fails to reserve enough space, so the stack gets corrupted. Oops.
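To give a sense of how much "additional space" is involved, here is a small sketch of the arithmetic for the processor's standard (non-compacted) XSAVE area, using the sizes from Intel's architecture manuals. The constant and function names are made up for illustration, and how Windows lays this state out inside its own context structures is not something this article describes, so treat it only as a picture of why the save area grows once AVX is enabled.

```cpp
#include <cassert>
#include <cstddef>

// Sizes of the standard-format XSAVE area components (Intel SDM).
// All names below are hypothetical, chosen for this sketch.
constexpr std::size_t kLegacyRegionBytes = 512;      // x87 + SSE state, FXSAVE-compatible
constexpr std::size_t kXsaveHeaderBytes  = 64;       // XSAVE header
constexpr std::size_t kAvxUpperBytes     = 16 * 16;  // upper 128 bits of YMM0..YMM15

// Total bytes needed to save the x87/SSE state, plus AVX state if enabled.
constexpr std::size_t XsaveAreaBytes(bool avxEnabled) {
    return kLegacyRegionBytes + kXsaveHeaderBytes +
           (avxEnabled ? kAvxUpperBytes : 0);
}
```

Under these figures the area grows from 576 to 832 bytes when AVX state is included, so any code that reserved room for only the older layout would come up 256 bytes short.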

There is a fix (well, a couple of workarounds)

The problem with correctly displaying the location of a crash only occurs if the first-chance exception handlers are allowed to run. First-chance exception handlers give a program a chance to take some action when a program crashes (such as saving a minidump, or translating raw exception numbers into something more readable). Programmatically saving minidumps is unnecessary and inadvisable when you are running under the debugger, so that's no loss. Translating raw exception numbers is valuable when debugging – I demonstrated it a few posts ago – but it's not valuable enough to justify the complexity caused by not knowing where you crashed. Other uses of first-chance exceptions – such as 'fixing' bugs so that you can continue executing – are morally bankrupt and will not be acknowledged further here.

Clearly what we want to do is to stop any exception handlers from running when our program crashes. We want the debugger to halt when an exception is thrown, instead of after it has complicated things by letting exception handlers run. This is actually the default behavior in windbg but in Visual Studio we have to change a setting. Go to the Debug menu, select Exceptions, and check the box beside Win32 Exceptions. In an ideal world this would be a global setting and we would be done with the problem, but alas this is a per-solution setting, so you may have to click this check box many times. It's a minor nuisance, and well worth it for the benefit of actually being able to debug your crashes.

Another workaround with a different set of tradeoffs was suggested by Michaln, author of the os2museum blog. He points out that you can disable AVX support and therefore avoid the problem. The obvious disadvantage is that you lose AVX support, which will eventually become unacceptable. The command below and a reboot will turn off AVX support.

bcdedit /set xsavedisable 1

I think that there are two changes which Microsoft should make. One is that Visual Studio should default to halting immediately when Win32 exceptions are thrown – that is a safer policy in general, and would have avoided most of the impact of this bug. The other change that Microsoft should make is to actually fix WoW64. I have reported this bug to Microsoft through informal channels, but I've heard no reply so far.
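As an aside on the "translating raw exception numbers" use of exception handlers mentioned earlier in this section, the core of such a translation can be as small as a lookup. This is a minimal sketch and not the code from the earlier post: the function name is hypothetical, and only a handful of the documented Win32 exception codes are covered. It is written as portable C++ so it does not depend on windows.h.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Maps a few well-known Win32 structured exception codes to their
// symbolic names. The numeric values are the documented SDK values;
// the function name is an assumption made for this sketch.
std::string ExceptionCodeName(std::uint32_t code) {
    switch (code) {
        case 0xC0000005u: return "EXCEPTION_ACCESS_VIOLATION";
        case 0xC00000FDu: return "EXCEPTION_STACK_OVERFLOW";
        case 0xC0000094u: return "EXCEPTION_INT_DIVIDE_BY_ZERO";
        case 0xC000001Du: return "EXCEPTION_ILLEGAL_INSTRUCTION";
        case 0x80000003u: return "EXCEPTION_BREAKPOINT";
        default:          return "unknown exception code";
    }
}
```

The Crash() function above writes through a null pointer, so the code a handler would see for it is 0xC0000005, which this lookup reports as EXCEPTION_ACCESS_VIOLATION.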

Failure to stop at all

An equally disturbing problem was introduced some years ago with 64-bit Windows and it causes some crashes to be silently ignored. Structured exception handling is the Windows system that underpins all exception handling (C++ exceptions are implemented using structured exception handling under the hood). Its full implementation relies on being able to unwind the stack (with or without calling destructors) in order to transfer execution from where an exception occurs to a catch/__except block.

The introduction of 64-bit Windows complicated this. On 64-bit Windows it is impossible to unwind the stack across the kernel boundary. That is, if your process calls into the kernel, and then the kernel calls back into your process, and an exception is thrown in the callback that is supposed to be handled on the other side of the kernel boundary, then Windows cannot handle this. This may seem a bit esoteric and unlikely – writing kernel callbacks seems like a rare activity – but it's actually quite common. In particular, a WindowProc is a callback, and it is often called by the kernel, as shown below:

If your code crashes in the user code on the right – called from the kernel – then Windows has a problem. Since Windows can't invoke your exception handlers in the box on the left, and it doesn't know what they would do, it has to make an executive decision about this exception. It can either crash the process, or it can silently ignore the exception, unwind the stack back to the kernel boundary, and then continue executing as if nothing happened.

Crashing the process may significantly inconvenience users, especially if there is a bug specific to 64-bit Windows in an unsupported product. But silently swallowing the exception means that many developers may be crashing in their WndProc without realizing it, leaving their process in an indeterminate state that may be causing future pain and suffering. Microsoft tries to err on the side of maximum compatibility and stability, but sometimes this just sweeps problems under the rug.

Triggering this behavior is easy. In a Project Wizard "Win32 Project" just drop a call to the Crash() function in the paint handler. To make this demo particularly dramatic be sure to put the Visual Studio exception settings back to normal. That is, make it so that Visual Studio does not stop when an exception is thrown – only when it is unhandled. Here's a sample of what the modified code could look like, complete with a new/delete pair that straddles the Crash() call:
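What follows is not the original Win32 listing but a portable model of the same situation, a sketch under stated assumptions. A C++ exception stands in for the access violation, a catch-all in a dispatcher stands in for Windows swallowing the exception at the kernel boundary, and every name in it (ModelCrash, ModelPaintHandler, ModelDispatch, SimulatePaints, PaintScratch) is hypothetical. The point it illustrates is the one made above: each swallowed crash skips the delete, and the program keeps running as if nothing happened.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical bookkeeping for the model; not part of any Windows API.
int g_liveAllocations = 0;  // objects created with new and not yet deleted
int g_swallowed = 0;        // exceptions silently eaten by the dispatcher

struct PaintScratch {
    PaintScratch()  { ++g_liveAllocations; }
    ~PaintScratch() { --g_liveAllocations; }
};

// Stand-in for Crash(): throws instead of writing through a null pointer.
void ModelCrash() { throw std::runtime_error("modeled access violation"); }

// Stand-in for the paint handler, with a new/delete pair around the crash.
void ModelPaintHandler() {
    PaintScratch* scratch = new PaintScratch;
    ModelCrash();
    delete scratch;  // never reached once ModelCrash() throws
}

// Stand-in for the kernel boundary: swallow the exception and carry on.
void ModelDispatch(void (*handler)()) {
    try {
        handler();
    } catch (...) {
        ++g_swallowed;
    }
}

// Delivers `count` paint messages and reports how many objects leaked.
int SimulatePaints(int count) {
    for (int i = 0; i < count; ++i) {
        ModelDispatch(ModelPaintHandler);
    }
    return g_liveAllocations;
}
```

Calling SimulatePaints(3) from a fresh process returns 3 in this model: three paints, three swallowed exceptions, three objects that were never deleted, and no visible sign that anything went wrong.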
