Physical Design of The Machinery

April 19, 2017

At Our Machinery we have one simple rule that governs the physical design of the code:

Header files (.h) cannot include other header files

 

In other words, header files cannot depend on other header files for the types they need. (The only exceptions are <inttypes.h> and <stdbool.h>, which we allow in order to get the standard uint32_t and bool types.)
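To make the rule concrete, here is a minimal sketch of what a header written this way might look like. The file names, types, and functions (texture.h, allocator_t, texture_create, and so on) are made up for illustration and are not taken from our codebase. The header and its source file are shown as one listing so that it compiles on its own.

```c
/* ---- texture.h (hypothetical header) ---- */
#include <inttypes.h>
#include <stdbool.h>

/* Forward declarations stand in for #include "allocator.h". The header only
   passes pointers around, so it never needs the full definitions. */
struct allocator_t;
struct texture_t;

struct texture_t *texture_create(struct allocator_t *a, uint32_t width, uint32_t height);
uint32_t texture_width(const struct texture_t *t);
bool texture_is_square(const struct texture_t *t);

/* ---- texture.c (hypothetical source file) ---- */
/* Source files may include whatever they need. Here the definitions are
   inlined so the sketch compiles as a single file. */
#include <stddef.h>

struct texture_t { uint32_t width, height; };

/* Toy allocator (an assumption for this sketch): hands out textures from a
   small fixed pool. */
struct allocator_t { struct texture_t pool[4]; uint32_t used; };

struct texture_t *texture_create(struct allocator_t *a, uint32_t width, uint32_t height)
{
    if (a->used >= 4)
        return NULL; /* pool exhausted */
    struct texture_t *t = &a->pool[a->used++];
    t->width = width;
    t->height = height;
    return t;
}

uint32_t texture_width(const struct texture_t *t)
{
    return t->width;
}

bool texture_is_square(const struct texture_t *t)
{
    return t->width == t->height;
}
```

The point of the sketch is that anyone including the header part pays only for two small standard headers, no matter how much the source file itself pulls in.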

This is a pretty extreme choice. Let’s unpack why we made this decision and how we make it work.

Physical design

To begin with, if you are not familiar with the term, the physical design of a C++ project refers to how the code is broken down into individual .h and .cpp files and how those files are organized on disk — which files include which others, etc. It is related to, but separate from, the logical design which refers to how the components of the system are connected and interact.

Just as with logical design, the primary goal of physical design is to reduce coupling. We want to minimize the dependencies, so that changes in one file don’t have a big impact on other files.

Bad physical design is easy to spot. It happens whenever .cpp files pull in a lot of .h files that they don’t need. Most of the time this happens indirectly. The .cpp file includes something it needs. Then that thing includes more stuff. And the stuff includes even more stuff — completely unrelated to the original problem. Expanding the #includes in such a project can be both scary and enlightening. A few innocent #includes sometimes expand to megabytes of header data once all the recursive inclusion is resolved. This has a number of bad effects:

  • Compiles are slow. (Modern compilers are smart about caching header data, but there are limits to what they can do.)

  • A change to a single header file can force the whole project to recompile.

  • You sometimes get weird and annoying errors because of name conflicts in the headers you’re unwittingly pulling in (#define min in <windows.h> is a classic).
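The name conflict in the last point can be sketched in a few lines. The macro below is a stand-in that we define ourselves to mimic a function-like min macro leaking out of a large platform header. It is not copied from <windows.h>, and smallest_of_three is a made-up function.

```c
/* Stand-in for a function-like macro leaking from a header we never meant
   to include (an assumption for this sketch). */
#define min(a, b) (((a) < (b)) ? (a) : (b))

/* A plain declaration such as
 *     int min(int a, int b);
 * would now be rewritten by the preprocessor and fail to compile. */

/* One workaround: a function-like macro is only expanded when its name is
   directly followed by '(' so parenthesizing the name suppresses it. */
static int (min)(int a, int b)
{
    return a < b ? a : b;
}

int smallest_of_three(int a, int b, int c)
{
    /* Every call site has to use the same trick, which is the annoying part. */
    return (min)((min)(a, b), c);
}
```

Workarounds like this one do function, but they spread through the code. It is much nicer to never see the macro in the first place.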

All of these things make it harder and more cumbersome to make code changes, which is the worst thing you can do to a software project. Software needs to be nimble to stay alive. If the code gets too heavy you need an army of programmers to move it along, which is neither efficient nor fun.

A large part of the reason why people are drawn to scripting languages is the ability to iterate on the code quickly, to make changes and immediately see the results. At Our Machinery, we believe that similar things are possible with high-performance static languages, if you just set things up the right way.

Fixing physical design

Bad physical design tends to be a creeping issue. In the beginning, when your project is small, compile times are good no matter what, so who cares. When the project gets so big that it starts to be annoying, it is often too late to do anything about it.

Fixing a bad physical design is hard, even if you have a tool that can pinpoint where you should focus your efforts. It typically goes something like this:

  • Remove a single #include statement.

  • Insert the forward declarations that might now be needed.

  • Insert new #include statements that might now be needed (because those files were previously included indirectly by the line you just removed).

  • If there are any dependencies that can’t be resolved with forward declarations — refactor the code (this could be an arbitrarily complex task, depending on how gnarly the code is).

  • Make sure the code still compiles.

  • Make sure the code still compiles on every supported platform and in every supported configuration.

  • If you had to refactor the code, run tests to make sure you didn’t introduce any new bugs.
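As an illustration of the refactoring step, here is one common case where a forward declaration alone is not enough: a struct that embeds another struct by value needs the full definition, and therefore the #include. One possible fix, sketched below with made-up names (entity_t, transform_t, entity_x), is to hold a pointer instead.

```c
#include <inttypes.h>

/* Before: embedding by value requires the complete type, so the header
 * must include the other header.
 *
 *     #include "transform.h"
 *     struct entity_t { uint32_t id; struct transform_t transform; };
 */

/* After: a pointer member only needs a forward declaration. */
struct transform_t;

struct entity_t {
    uint32_t id;
    struct transform_t *transform;
};

/* ---- source file side: the full definition is visible only here ---- */
struct transform_t { float x, y, z; };

float entity_x(const struct entity_t *e)
{
    return e->transform->x;
}
```

The trade-off is an extra indirection and a separate decision about who owns the transform, which is part of why this step can turn into real work rather than a mechanical edit.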

That’s a lot of work to remove one single #include statement. And of course, removing a single #include doesn’t have any measurable effect on the compile time or anything else that people care about. You have to repeat this, again and again, hundreds, maybe thousands of times before you start seeing results. It’s thankless, unrewarding grunt work. And if the physical design is already bad, all those compiles you need to do to check if things still work will take a really long time.

Also, while you are slaving away at this, the rest of the team is doing some other work, happily adding new #include statements everywhere. There is a reasonable chance that over the next few months, all the work you just did will be undone.

Physical design is really easy to screw up, but requires concentrated effort and hard work to fix. This means that time and human nature will pretty much guarantee that you are screwed. It’s simple thermodynamics.

So in a lot of large codebases, people essentially just give up and switch to unity builds or something. Once you are in that hole, it might even be the right decision, but that’s not where we want to end up with Our Machinery.

Necessity is the mother of prevention

Since the problem is so hard to fix, we need to stop it before it happens. We need rules that people can follow to avoid getting into this mess. But if you look online, most of the rules you find will boil down to something like:

Don’t include more header files than you need

That is good advice, but as a rule it sucks. It’s too vague. It doesn’t tell you what to do or how to solve problems you encounter. It’s not verifiable. A rule like that will not stop people from just adding #includes until the code compiles. After all, that’s what they “need”. We have:

Vague rules + no enforcement + path of least resistance + human nature → chaos

Rules should be clear enough that it is obvious if the code follows them or not. Even better, they should be machine verifiable. That way, we can just put the rule in the pre-commit hook and be certain that it is always followed.
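To show how little it takes to machine-verify a rule like "headers cannot include other headers", here is a rough sketch of a checker that a pre-commit hook could run over the text of each .h file. The function name count_forbidden_includes is made up for this example, the allow list is the two exceptions mentioned earlier, and the scan is deliberately naive: it does not understand comments or line continuations.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True if the include target (e.g. "<stdbool.h>") is on the allow list. */
static bool is_allowed_include(const char *target, size_t len)
{
    static const char *allowed[] = { "<inttypes.h>", "<stdbool.h>" };
    for (size_t i = 0; i < sizeof(allowed) / sizeof(allowed[0]); ++i) {
        if (strlen(allowed[i]) == len && strncmp(allowed[i], target, len) == 0)
            return true;
    }
    return false;
}

static const char *skip_blanks(const char *p)
{
    while (*p == ' ' || *p == '\t')
        ++p;
    return p;
}

/* Counts #include directives in the header text `src` that are not on the
   allow list. A result of zero means the header follows the rule. */
int count_forbidden_includes(const char *src)
{
    int count = 0;
    const char *line = src;
    while (*line) {
        const char *p = skip_blanks(line);
        if (*p == '#') {
            p = skip_blanks(p + 1);
            if (strncmp(p, "include", 7) == 0) {
                p = skip_blanks(p + 7);
                /* The target runs up to the next blank or end of line. */
                size_t len = 0;
                while (p[len] && p[len] != '\n' && p[len] != '\r' &&
                       p[len] != ' ' && p[len] != '\t')
                    ++len;
                if (!is_allowed_include(p, len))
                    ++count;
            }
        }
        const char *newline = strchr(line, '\n');
        if (!newline)
            break;
        line = newline + 1;
    }
    return count;
}
```

A hook built on something like this would reject the commit whenever any header returns a non-zero count, so nobody has to argue about what "need" means.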

This is why we use clang-format to format our code. It doesn’t do exactly what we want, but it is close enough that having automatic formatting is worth the trade-off.

(If you want to know — our main gripe with clang-format is that it insists on left flushing pre-processor directives instead of indenting them with the rest of the code. I find this a terrible, unreadable practice — especially when you have nested levels of #ifs and
